Message ID | 20211029202424.175401-8-matheus.ferst@eldorado.org.br (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | PowerISA v3.1 instruction batch | expand |
On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote: > From: Luis Pires <luis.pires@eldorado.org.br> > > Implement the following PowerISA v3.1 instruction: > cntlzdm: Count Leading Zeros Doubleword Under Bit Mask > > Suggested-by: Richard Henderson <richard.henderson@linaro.org> > Signed-off-by: Luis Pires <luis.pires@eldorado.org.br> > Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> > --- > v2: > - Inline implementation of cntlzdm > --- > target/ppc/insn32.decode | 1 + > target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++ > 2 files changed, 37 insertions(+) > > diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode > index 9cb9fc00b8..221cb00dd6 100644 > --- a/target/ppc/insn32.decode > +++ b/target/ppc/insn32.decode > @@ -203,6 +203,7 @@ ADDPCIS 010011 ..... ..... .......... 00010 . @DX > ## Fixed-Point Logical Instructions > > CFUGED 011111 ..... ..... ..... 0011011100 - @X > +CNTLZDM 011111 ..... ..... ..... 0000111011 - @X > > ### Float-Point Load Instructions > > diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc > index 0d9c6e0996..c9e9ae35df 100644 > --- a/target/ppc/translate/fixedpoint-impl.c.inc > +++ b/target/ppc/translate/fixedpoint-impl.c.inc > @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X *a) > #endif > return true; > } > + > +#if defined(TARGET_PPC64) > +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask) > +{ > + TCGv_i64 tmp; > + TCGLabel *l1; > + > + tmp = tcg_temp_local_new_i64(); > + l1 = gen_new_label(); > + > + tcg_gen_and_i64(tmp, src, mask); > + tcg_gen_clzi_i64(tmp, tmp, 64); > + > + tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1); > + > + tcg_gen_subfi_i64(tmp, 64, tmp); > + tcg_gen_shr_i64(tmp, mask, tmp); > + tcg_gen_ctpop_i64(tmp, tmp); > + > + gen_set_label(l1); > + > + tcg_gen_mov_i64(dst, tmp); This works, but a form without brcond would be better (due to how poorly tcg handles basic blocks). How about tcg_gen_clzi_i64(tmp, tmp, 0); tcg_gen_xori_i64(tmp, tmp, 63); tcg_gen_shr_i64(tmp, mask, tmp); tcg_gen_shri_i64(tmp, tmp, 1); tcg_gen_ctpop_i64(dst, tmp); The middle 3 operations perform a shift between [1-64], such that we are assured of 0 for 64. Either way, Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~
On Sat, Oct 30, 2021 at 02:17:07PM -0700, Richard Henderson wrote: > On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote: > > From: Luis Pires <luis.pires@eldorado.org.br> > > > > Implement the following PowerISA v3.1 instruction: > > cntlzdm: Count Leading Zeros Doubleword Under Bit Mask > > > > Suggested-by: Richard Henderson <richard.henderson@linaro.org> > > Signed-off-by: Luis Pires <luis.pires@eldorado.org.br> > > Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> > > --- > > v2: > > - Inline implementation of cntlzdm > > --- > > target/ppc/insn32.decode | 1 + > > target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++ > > 2 files changed, 37 insertions(+) > > > > diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode > > index 9cb9fc00b8..221cb00dd6 100644 > > --- a/target/ppc/insn32.decode > > +++ b/target/ppc/insn32.decode > > @@ -203,6 +203,7 @@ ADDPCIS 010011 ..... ..... .......... 00010 . @DX > > ## Fixed-Point Logical Instructions > > CFUGED 011111 ..... ..... ..... 0011011100 - @X > > +CNTLZDM 011111 ..... ..... ..... 0000111011 - @X > > ### Float-Point Load Instructions > > diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc > > index 0d9c6e0996..c9e9ae35df 100644 > > --- a/target/ppc/translate/fixedpoint-impl.c.inc > > +++ b/target/ppc/translate/fixedpoint-impl.c.inc > > @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X *a) > > #endif > > return true; > > } > > + > > +#if defined(TARGET_PPC64) > > +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask) > > +{ > > + TCGv_i64 tmp; > > + TCGLabel *l1; > > + > > + tmp = tcg_temp_local_new_i64(); > > + l1 = gen_new_label(); > > + > > + tcg_gen_and_i64(tmp, src, mask); > > + tcg_gen_clzi_i64(tmp, tmp, 64); > > + > > + tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1); > > + > > + tcg_gen_subfi_i64(tmp, 64, tmp); > > + tcg_gen_shr_i64(tmp, mask, tmp); > > + tcg_gen_ctpop_i64(tmp, tmp); > > + > > + gen_set_label(l1); > > + > > + tcg_gen_mov_i64(dst, tmp); > > This works, but a form without brcond would be better (due to how poorly tcg > handles basic blocks). > > How about > > tcg_gen_clzi_i64(tmp, tmp, 0); > > tcg_gen_xori_i64(tmp, tmp, 63); > tcg_gen_shr_i64(tmp, mask, tmp); > tcg_gen_shri_i64(tmp, tmp, 1); > > tcg_gen_ctpop_i64(dst, tmp); I've applied this to ppc-for-6.2. You can make this improvement as a followup if you want. > > The middle 3 operations perform a shift between [1-64], such that we are assured of 0 for 64. > > Either way, > Reviewed-by: Richard Henderson <richard.henderson@linaro.org> > > > r~ >
On 30/10/2021 18:17, Richard Henderson wrote: > On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote: >> From: Luis Pires <luis.pires@eldorado.org.br> >> >> Implement the following PowerISA v3.1 instruction: >> cntlzdm: Count Leading Zeros Doubleword Under Bit Mask >> >> Suggested-by: Richard Henderson <richard.henderson@linaro.org> >> Signed-off-by: Luis Pires <luis.pires@eldorado.org.br> >> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> >> --- >> v2: >> - Inline implementation of cntlzdm >> --- >> target/ppc/insn32.decode | 1 + >> target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++ >> 2 files changed, 37 insertions(+) >> >> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode >> index 9cb9fc00b8..221cb00dd6 100644 >> --- a/target/ppc/insn32.decode >> +++ b/target/ppc/insn32.decode >> @@ -203,6 +203,7 @@ ADDPCIS 010011 ..... ..... .......... >> 00010 . @DX >> ## Fixed-Point Logical Instructions >> >> CFUGED 011111 ..... ..... ..... 0011011100 - @X >> +CNTLZDM 011111 ..... ..... ..... 0000111011 - @X >> >> ### Float-Point Load Instructions >> >> diff --git a/target/ppc/translate/fixedpoint-impl.c.inc >> b/target/ppc/translate/fixedpoint-impl.c.inc >> index 0d9c6e0996..c9e9ae35df 100644 >> --- a/target/ppc/translate/fixedpoint-impl.c.inc >> +++ b/target/ppc/translate/fixedpoint-impl.c.inc >> @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X >> *a) >> #endif >> return true; >> } >> + >> +#if defined(TARGET_PPC64) >> +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask) >> +{ >> + TCGv_i64 tmp; >> + TCGLabel *l1; >> + >> + tmp = tcg_temp_local_new_i64(); >> + l1 = gen_new_label(); >> + >> + tcg_gen_and_i64(tmp, src, mask); >> + tcg_gen_clzi_i64(tmp, tmp, 64); >> + >> + tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1); >> + >> + tcg_gen_subfi_i64(tmp, 64, tmp); >> + tcg_gen_shr_i64(tmp, mask, tmp); >> + tcg_gen_ctpop_i64(tmp, tmp); >> + >> + gen_set_label(l1); >> + >> + tcg_gen_mov_i64(dst, tmp); > > This works, but a form without brcond would be better (due to how poorly > tcg handles basic > blocks). > I should've tried a little harder to get rid of this branch... > How about > > tcg_gen_clzi_i64(tmp, tmp, 0); > > tcg_gen_xori_i64(tmp, tmp, 63); > tcg_gen_shr_i64(tmp, mask, tmp); > tcg_gen_shri_i64(tmp, tmp, 1); > > tcg_gen_ctpop_i64(dst, tmp); > > The middle 3 operations perform a shift between [1-64], such that we are > assured of 0 for 64. When src & mask == 0 we shouldn't shift mask (or shift it zero bits), so I guess we can't have this shri. Maybe something like tcg_gen_and_i64(t0, src, mask); tcg_gen_clzi_i64(t0, t0, -1); tcg_gen_setcondi_i64(TCG_COND_NE, t1, t0, -1); tcg_gen_andi_i64(t0, t0, 63); tcg_gen_xori_i64(t0, t0, 63); tcg_gen_shr_i64(t0, mask, t0); tcg_gen_shr_i64(t0, t0, t1); tcg_gen_ctpop_i64(dst, t0); So we still shift 63+1 bits when there are no leading zeros and shift 0 bits when it's all zeros. > > Either way, > Reviewed-by: Richard Henderson <richard.henderson@linaro.org> > Thanks, Matheus K. Ferst Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/> Analista de Software Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 9cb9fc00b8..221cb00dd6 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -203,6 +203,7 @@ ADDPCIS 010011 ..... ..... .......... 00010 . @DX ## Fixed-Point Logical Instructions CFUGED 011111 ..... ..... ..... 0011011100 - @X +CNTLZDM 011111 ..... ..... ..... 0000111011 - @X ### Float-Point Load Instructions diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc index 0d9c6e0996..c9e9ae35df 100644 --- a/target/ppc/translate/fixedpoint-impl.c.inc +++ b/target/ppc/translate/fixedpoint-impl.c.inc @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X *a) #endif return true; } + +#if defined(TARGET_PPC64) +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask) +{ + TCGv_i64 tmp; + TCGLabel *l1; + + tmp = tcg_temp_local_new_i64(); + l1 = gen_new_label(); + + tcg_gen_and_i64(tmp, src, mask); + tcg_gen_clzi_i64(tmp, tmp, 64); + + tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1); + + tcg_gen_subfi_i64(tmp, 64, tmp); + tcg_gen_shr_i64(tmp, mask, tmp); + tcg_gen_ctpop_i64(tmp, tmp); + + gen_set_label(l1); + + tcg_gen_mov_i64(dst, tmp); +} +#endif + +static bool trans_CNTLZDM(DisasContext *ctx, arg_X *a) +{ + REQUIRE_64BIT(ctx); + REQUIRE_INSNS_FLAGS2(ctx, ISA310); +#if defined(TARGET_PPC64) + do_cntlzdm(cpu_gpr[a->ra], cpu_gpr[a->rt], cpu_gpr[a->rb]); +#else + qemu_build_not_reached(); +#endif + return true; +}