[v2,07/34] target/ppc: Implement cntlzdm

Series	PowerISA v3.1 instruction batch
[v2,00/34] PowerISA v3.1 instruction batch
[v2,01/34] target/ppc: introduce do_ea_calc
[v2,02/34] target/ppc: move resolve_PLS_D to translate.c
[v2,03/34] target/ppc: Move load and store floating point instructions to decodetree
[v2,04/34] target/ppc: Implement PLFS, PLFD, PSTFS and PSTFD instructions
[v2,05/34] target/ppc: Move LQ and STQ to decodetree
[v2,06/34] target/ppc: Implement PLQ and PSTQ
[v2,07/34] target/ppc: Implement cntlzdm
[v2,08/34] target/ppc: Implement cnttzdm
[v2,09/34] target/ppc: Implement pdepd instruction
[v2,10/34] target/ppc: Implement pextd instruction
[v2,11/34] target/ppc: Move vcfuged to vmx-impl.c.inc
[v2,12/34] target/ppc: Implement vclzdm/vctzdm instructions
[v2,13/34] target/ppc: Implement vpdepd/vpextd instruction
[v2,14/34] target/ppc: Implement vsldbi/vsrdbi instructions
[v2,15/34] target/ppc: Implement Vector Insert from GPR using GPR index insns
[v2,16/34] target/ppc: Implement Vector Insert Word from GPR using Immediate insns
[v2,17/34] target/ppc: Implement Vector Insert from VSR using GPR index insns
[v2,18/34] target/ppc: Move vinsertb/vinserth/vinsertw/vinsertd to decodetree
[v2,19/34] target/ppc: Implement Vector Extract Double to VSR using GPR index insns
[v2,20/34] target/ppc: Introduce REQUIRE_VSX macro
[v2,21/34] target/ppc: receive high/low as argument in get/set_cpu_vsr
[v2,22/34] target/ppc: moved stxv and lxv from legacy to decodtree
[v2,23/34] target/ppc: moved stxvx and lxvx from legacy to decodtree
[v2,24/34] target/ppc: added the instructions LXVP and STXVP
[v2,25/34] target/ppc: added the instructions LXVPX and STXVPX
[v2,26/34] target/ppc: added the instructions PLXV and PSTXV
[v2,27/34] target/ppc: added the instructions PLXVP and PSTXVP
[v2,28/34] target/ppc: moved XXSPLTW to using decodetree
[v2,29/34] target/ppc: moved XXSPLTIB to using decodetree
[v2,30/34] target/ppc: implemented XXSPLTI32DX
[v2,31/34] target/ppc: Implemented XXSPLTIW using decodetree
[v2,32/34] target/ppc: implemented XXSPLTIDP instruction
[v2,33/34] target/ppc: Implement xxblendvb/xxblendvh/xxblendvw/xxblendvd instructions
[v2,34/34] target/ppc: Implement lxvkq instruction

Message ID

20211029202424.175401-8-matheus.ferst@eldorado.org.br (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 195936103E
From: matheus.ferst@eldorado.org.br
To: qemu-devel@nongnu.org,
	qemu-ppc@nongnu.org
Subject: [PATCH v2 07/34] target/ppc: Implement cntlzdm
Date: Fri, 29 Oct 2021 17:23:57 -0300
Message-Id: <20211029202424.175401-8-matheus.ferst@eldorado.org.br>
In-Reply-To: <20211029202424.175401-1-matheus.ferst@eldorado.org.br>
References: <20211029202424.175401-1-matheus.ferst@eldorado.org.br>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=201.28.113.2;
 envelope-from=matheus.ferst@eldorado.org.br; helo=outlook.eldorado.org.br
X-Spam_score_int: -10
X-Spam_score: -1.1
X-Spam_bar: -
X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, PDS_HP_HELO_NORDNS=0.001,
 RDNS_NONE=0.793, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: lucas.castro@eldorado.org.br, richard.henderson@linaro.org,
 groug@kaod.org,
 luis.pires@eldorado.org.br, Matheus Ferst <matheus.ferst@eldorado.org.br>,
 david@gibson.dropbear.id.au
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>

Series

PowerISA v3.1 instruction batch

Commit Message

Matheus K. Ferst Oct. 29, 2021, 8:23 p.m. UTC

From: Luis Pires <luis.pires@eldorado.org.br>

Implement the following PowerISA v3.1 instruction:
cntlzdm: Count Leading Zeros Doubleword Under Bit Mask

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v2:
- Inline implementation of cntlzdm
---
 target/ppc/insn32.decode                   |  1 +
 target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++
 2 files changed, 37 insertions(+)

Comments

Richard Henderson Oct. 30, 2021, 9:17 p.m. UTC | #1

On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote:
> From: Luis Pires <luis.pires@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instruction:
> cntlzdm: Count Leading Zeros Doubleword Under Bit Mask
> 
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v2:
> - Inline implementation of cntlzdm
> ---
>   target/ppc/insn32.decode                   |  1 +
>   target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++
>   2 files changed, 37 insertions(+)
> 
> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> index 9cb9fc00b8..221cb00dd6 100644
> --- a/target/ppc/insn32.decode
> +++ b/target/ppc/insn32.decode
> @@ -203,6 +203,7 @@ ADDPCIS         010011 ..... ..... .......... 00010 .   @DX
>   ## Fixed-Point Logical Instructions
>   
>   CFUGED          011111 ..... ..... ..... 0011011100 -   @X
> +CNTLZDM         011111 ..... ..... ..... 0000111011 -   @X
>   
>   ### Float-Point Load Instructions
>   
> diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
> index 0d9c6e0996..c9e9ae35df 100644
> --- a/target/ppc/translate/fixedpoint-impl.c.inc
> +++ b/target/ppc/translate/fixedpoint-impl.c.inc
> @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X *a)
>   #endif
>       return true;
>   }
> +
> +#if defined(TARGET_PPC64)
> +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask)
> +{
> +    TCGv_i64 tmp;
> +    TCGLabel *l1;
> +
> +    tmp = tcg_temp_local_new_i64();
> +    l1 = gen_new_label();
> +
> +    tcg_gen_and_i64(tmp, src, mask);
> +    tcg_gen_clzi_i64(tmp, tmp, 64);
> +
> +    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1);
> +
> +    tcg_gen_subfi_i64(tmp, 64, tmp);
> +    tcg_gen_shr_i64(tmp, mask, tmp);
> +    tcg_gen_ctpop_i64(tmp, tmp);
> +
> +    gen_set_label(l1);
> +
> +    tcg_gen_mov_i64(dst, tmp);

This works, but a form without brcond would be better (due to how poorly tcg handles basic 
blocks).

How about

     tcg_gen_clzi_i64(tmp, tmp, 0);

     tcg_gen_xori_i64(tmp, tmp, 63);
     tcg_gen_shr_i64(tmp, mask, tmp);
     tcg_gen_shri_i64(tmp, tmp, 1);

     tcg_gen_ctpop_i64(dst, tmp);

The middle 3 operations perform a shift between [1-64], such that we are assured of 0 for 64.
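
The shift arithmetic above can be sketched on the host in plain C. This is only an illustrative model, not QEMU code: `model_suggested` and `self_check` are names made up here, the GCC/Clang builtins `__builtin_clzll` and `__builtin_popcountll` stand in for the TCG clz and ctpop ops, and the `? :` mirrors the third argument of `tcg_gen_clzi_i64` (the value produced for a zero input).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the suggested sequence (not a QEMU API). */
static unsigned model_suggested(uint64_t src, uint64_t mask)
{
    uint64_t tmp = src & mask;

    /* clzi with default 0: leading-zero count, or 0 when the input is zero */
    tmp = tmp ? (uint64_t)__builtin_clzll(tmp) : 0;

    tmp ^= 63;           /* 63 - clz, always in [0, 63] */
    tmp = mask >> tmp;   /* variable shift, count stays below 64 */
    tmp >>= 1;           /* total shift is 64 - clz, i.e. in [1, 64] */

    return (unsigned)__builtin_popcountll(tmp);
}

static void self_check(void)
{
    /* Top bit of src & mask is set: total shift is 64, so the count is 0. */
    assert(model_suggested(1ULL << 63, ~0ULL) == 0);
    /* Highest set bit of src & mask is bit 3; mask bits 4..7 lie above it. */
    assert(model_suggested(0x08, 0xFF) == 4);
}
```

Splitting the shift into a variable part of at most 63 and a fixed part of 1 is what makes a total of 64 reachable: a single shift by 64 is undefined in C, and TCG shift ops are likewise documented as unspecified for out-of-range counts.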

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

David Gibson Nov. 1, 2021, 12:16 a.m. UTC | #2

On Sat, Oct 30, 2021 at 02:17:07PM -0700, Richard Henderson wrote:
> On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote:
> > From: Luis Pires <luis.pires@eldorado.org.br>
> > 
> > Implement the following PowerISA v3.1 instruction:
> > cntlzdm: Count Leading Zeros Doubleword Under Bit Mask
> > 
> > Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> > Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
> > Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> > ---
> > v2:
> > - Inline implementation of cntlzdm
> > ---
> >   target/ppc/insn32.decode                   |  1 +
> >   target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++
> >   2 files changed, 37 insertions(+)
> > 
> > diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> > index 9cb9fc00b8..221cb00dd6 100644
> > --- a/target/ppc/insn32.decode
> > +++ b/target/ppc/insn32.decode
> > @@ -203,6 +203,7 @@ ADDPCIS         010011 ..... ..... .......... 00010 .   @DX
> >   ## Fixed-Point Logical Instructions
> >   CFUGED          011111 ..... ..... ..... 0011011100 -   @X
> > +CNTLZDM         011111 ..... ..... ..... 0000111011 -   @X
> >   ### Float-Point Load Instructions
> > diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
> > index 0d9c6e0996..c9e9ae35df 100644
> > --- a/target/ppc/translate/fixedpoint-impl.c.inc
> > +++ b/target/ppc/translate/fixedpoint-impl.c.inc
> > @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X *a)
> >   #endif
> >       return true;
> >   }
> > +
> > +#if defined(TARGET_PPC64)
> > +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask)
> > +{
> > +    TCGv_i64 tmp;
> > +    TCGLabel *l1;
> > +
> > +    tmp = tcg_temp_local_new_i64();
> > +    l1 = gen_new_label();
> > +
> > +    tcg_gen_and_i64(tmp, src, mask);
> > +    tcg_gen_clzi_i64(tmp, tmp, 64);
> > +
> > +    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1);
> > +
> > +    tcg_gen_subfi_i64(tmp, 64, tmp);
> > +    tcg_gen_shr_i64(tmp, mask, tmp);
> > +    tcg_gen_ctpop_i64(tmp, tmp);
> > +
> > +    gen_set_label(l1);
> > +
> > +    tcg_gen_mov_i64(dst, tmp);
> 
> This works, but a form without brcond would be better (due to how poorly tcg
> handles basic blocks).
> 
> How about
> 
>     tcg_gen_clzi_i64(tmp, tmp, 0);
> 
>     tcg_gen_xori_i64(tmp, tmp, 63);
>     tcg_gen_shr_i64(tmp, mask, tmp);
>     tcg_gen_shri_i64(tmp, tmp, 1);
> 
>     tcg_gen_ctpop_i64(dst, tmp);

I've applied this to ppc-for-6.2.  You can make this improvement as a
followup if you want.

> 
> The middle 3 operations perform a shift between [1-64], such that we are assured of 0 for 64.
> 
> Either way,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
>

Matheus K. Ferst Nov. 4, 2021, 11:37 a.m. UTC | #3

On 30/10/2021 18:17, Richard Henderson wrote:
> On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote:
>> From: Luis Pires <luis.pires@eldorado.org.br>
>>
>> Implement the following PowerISA v3.1 instruction:
>> cntlzdm: Count Leading Zeros Doubleword Under Bit Mask
>>
>> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> v2:
>> - Inline implementation of cntlzdm
>> ---
>>   target/ppc/insn32.decode                   |  1 +
>>   target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++
>>   2 files changed, 37 insertions(+)
>>
>> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
>> index 9cb9fc00b8..221cb00dd6 100644
>> --- a/target/ppc/insn32.decode
>> +++ b/target/ppc/insn32.decode
>> @@ -203,6 +203,7 @@ ADDPCIS         010011 ..... ..... .......... 00010 .   @DX
>>   ## Fixed-Point Logical Instructions
>>
>>   CFUGED          011111 ..... ..... ..... 0011011100 -   @X
>> +CNTLZDM         011111 ..... ..... ..... 0000111011 -   @X
>>
>>   ### Float-Point Load Instructions
>>
>> diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
>> index 0d9c6e0996..c9e9ae35df 100644
>> --- a/target/ppc/translate/fixedpoint-impl.c.inc
>> +++ b/target/ppc/translate/fixedpoint-impl.c.inc
>> @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X *a)
>>   #endif
>>       return true;
>>   }
>> +
>> +#if defined(TARGET_PPC64)
>> +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask)
>> +{
>> +    TCGv_i64 tmp;
>> +    TCGLabel *l1;
>> +
>> +    tmp = tcg_temp_local_new_i64();
>> +    l1 = gen_new_label();
>> +
>> +    tcg_gen_and_i64(tmp, src, mask);
>> +    tcg_gen_clzi_i64(tmp, tmp, 64);
>> +
>> +    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1);
>> +
>> +    tcg_gen_subfi_i64(tmp, 64, tmp);
>> +    tcg_gen_shr_i64(tmp, mask, tmp);
>> +    tcg_gen_ctpop_i64(tmp, tmp);
>> +
>> +    gen_set_label(l1);
>> +
>> +    tcg_gen_mov_i64(dst, tmp);
> 
> This works, but a form without brcond would be better (due to how poorly
> tcg handles basic blocks).
> 

I should've tried a little harder to get rid of this branch...

> How about
> 
>      tcg_gen_clzi_i64(tmp, tmp, 0);
> 
>      tcg_gen_xori_i64(tmp, tmp, 63);
>      tcg_gen_shr_i64(tmp, mask, tmp);
>      tcg_gen_shri_i64(tmp, tmp, 1);
> 
>      tcg_gen_ctpop_i64(dst, tmp);
> 
> The middle 3 operations perform a shift between [1-64], such that we are 
> assured of 0 for 64.

When (src & mask) == 0 we shouldn't shift mask at all (or rather, we should
shift it by zero bits), so I guess we can't have this unconditional shri.
Maybe something like

tcg_gen_and_i64(t0, src, mask);
tcg_gen_clzi_i64(t0, t0, -1);

tcg_gen_setcondi_i64(TCG_COND_NE, t1, t0, -1);
tcg_gen_andi_i64(t0, t0, 63);
tcg_gen_xori_i64(t0, t0, 63);

tcg_gen_shr_i64(t0, mask, t0);
tcg_gen_shr_i64(t0, t0, t1);

tcg_gen_ctpop_i64(dst, t0);

So we still shift 63+1 bits when there are no leading zeros and shift 0 
bits when it's all zeros.
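
The two cases described above can be checked on the host with a small C sketch. It is a model under stated assumptions rather than QEMU code: `ref_cntlzdm` is a hypothetical bit-by-bit reference written from the instruction's description (count mask-selected zero bits from the most significant end, stopping at the first mask-selected one bit), `model_branchless` mirrors the sequence above with GCC/Clang builtins in place of the TCG ops, and `self_check` is likewise a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: walk from the MSB down, looking only at mask bits. */
static unsigned ref_cntlzdm(uint64_t src, uint64_t mask)
{
    unsigned count = 0;

    for (int i = 63; i >= 0; i--) {
        uint64_t bit = 1ULL << i;

        if (!(mask & bit)) {
            continue;        /* bit not selected by the mask */
        }
        if (src & bit) {
            break;           /* first selected one bit ends the count */
        }
        count++;
    }
    return count;
}

/* Hypothetical model of the branchless sequence (t0/t1 as in the mail). */
static unsigned model_branchless(uint64_t src, uint64_t mask)
{
    uint64_t t0 = src & mask;
    uint64_t t1;

    /* clzi with default -1: a value no real leading-zero count can take */
    t0 = t0 ? (uint64_t)__builtin_clzll(t0) : UINT64_MAX;

    t1 = (t0 != UINT64_MAX); /* setcond NE: 1 unless src & mask was zero */
    t0 &= 63;
    t0 ^= 63;                /* 63 - clz, or 0 when src & mask was zero */

    t0 = mask >> t0;
    t0 >>= t1;               /* extra shift by 1 only in the non-zero case */

    return (unsigned)__builtin_popcountll(t0);
}

static void self_check(void)
{
    /* All zeros under the mask: no shift, every mask bit is counted. */
    assert(model_branchless(0, 0xFF) == 8);
    /* No leading zeros: shift of 63 + 1 bits leaves nothing to count. */
    assert(model_branchless(1ULL << 63, ~0ULL) == 0);
    /* Both models agree on a mixed case. */
    assert(model_branchless(0x0F00, 0xFF00) == ref_cntlzdm(0x0F00, 0xFF00));
}
```

Using -1 as the clz default lets a single setcond distinguish the all-zeros case, and masking with 63 then folds that sentinel into a shift count of zero.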

> 
> Either way,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Software Analyst
Legal Notice - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 9cb9fc00b8..221cb00dd6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -203,6 +203,7 @@  ADDPCIS         010011 ..... ..... .......... 00010 .   @DX
 ## Fixed-Point Logical Instructions
 
 CFUGED          011111 ..... ..... ..... 0011011100 -   @X
+CNTLZDM         011111 ..... ..... ..... 0000111011 -   @X
 
 ### Float-Point Load Instructions
 
diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
index 0d9c6e0996..c9e9ae35df 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -413,3 +413,39 @@  static bool trans_CFUGED(DisasContext *ctx, arg_X *a)
 #endif
     return true;
 }
+
+#if defined(TARGET_PPC64)
+static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask)
+{
+    TCGv_i64 tmp;
+    TCGLabel *l1;
+
+    tmp = tcg_temp_local_new_i64();
+    l1 = gen_new_label();
+
+    tcg_gen_and_i64(tmp, src, mask);
+    tcg_gen_clzi_i64(tmp, tmp, 64);
+
+    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1);
+
+    tcg_gen_subfi_i64(tmp, 64, tmp);
+    tcg_gen_shr_i64(tmp, mask, tmp);
+    tcg_gen_ctpop_i64(tmp, tmp);
+
+    gen_set_label(l1);
+
+    tcg_gen_mov_i64(dst, tmp);
+}
+#endif
+
+static bool trans_CNTLZDM(DisasContext *ctx, arg_X *a)
+{
+    REQUIRE_64BIT(ctx);
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+#if defined(TARGET_PPC64)
+    do_cntlzdm(cpu_gpr[a->ra], cpu_gpr[a->rt], cpu_gpr[a->rb]);
+#else
+    qemu_build_not_reached();
+#endif
+    return true;
+}
