Message ID | 20211013184125.2010897-1-philipp.tomsich@vrull.eu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/riscv: Fix orc.b implementation | expand |
On Wed, Oct 13, 2021 at 8:41 PM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote: > > The earlier implementation fell into a corner case for bytes that were > 0x01, giving a wrong result (but not affecting our application test > cases for strings, as an ASCII value 0x01 is rare in those...). > > This changes the algorithm to: > 1. Mask out the high-bit of each bytes (so that each byte is <= 127). > 2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow > into the highest bit of each byte). > 3. Bitwise-or the original value back in (to cover those cases where the > source byte was exactly 128) to saturate the high-bit. > 4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of > each byte into its LSB. > 5. Multiply with 0xff to fan out the LSB to all bits of each byte. > > Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci") > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > Reported-by: Vincent Palatin <vpalatin@rivosinc.com> > Tested-by: Vincent Palatin <vpalatin@rivosinc.com> > --- > > target/riscv/insn_trans/trans_rvb.c.inc | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc > index 185c3e9a60..3095624f32 100644 > --- a/target/riscv/insn_trans/trans_rvb.c.inc > +++ b/target/riscv/insn_trans/trans_rvb.c.inc > @@ -249,13 +249,16 @@ static bool trans_rev8_64(DisasContext *ctx, arg_rev8_64 *a) > static void gen_orc_b(TCGv ret, TCGv source1) > { > TCGv tmp = tcg_temp_new(); > - TCGv ones = tcg_constant_tl(dup_const_tl(MO_8, 0x01)); > + TCGv low7 = tcg_constant_tl(dup_const_tl(MO_8, 0x7f)); > > - /* Set lsb in each byte if the byte was zero. */ > - tcg_gen_sub_tl(tmp, source1, ones); > - tcg_gen_andc_tl(tmp, tmp, source1); > + /* Set msb in each byte if the byte was non-zero. */ > + tcg_gen_and_tl(tmp, source1, low7); > + tcg_gen_add_tl(tmp, tmp, low7); > + tcg_gen_or_tl(tmp, tmp, source1); > + > + /* Extract the msb to the lsb in each byte */ > + tcg_gen_andc_tl(tmp, tmp, low7); > tcg_gen_shri_tl(tmp, tmp, 7); > - tcg_gen_andc_tl(tmp, ones, tmp); > > /* Replicate the lsb of each byte across the byte. */ > tcg_gen_muli_tl(ret, tmp, 0xff); > -- > 2.25.1 >
On 10/13/21 11:41 AM, Philipp Tomsich wrote: > The earlier implementation fell into a corner case for bytes that were > 0x01, giving a wrong result (but not affecting our application test > cases for strings, as an ASCII value 0x01 is rare in those...). > > This changes the algorithm to: > 1. Mask out the high-bit of each bytes (so that each byte is <= 127). > 2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow > into the highest bit of each byte). > 3. Bitwise-or the original value back in (to cover those cases where the > source byte was exactly 128) to saturate the high-bit. > 4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of > each byte into its LSB. > 5. Multiply with 0xff to fan out the LSB to all bits of each byte. > > Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci") > > Signed-off-by: Philipp Tomsich<philipp.tomsich@vrull.eu> > Reported-by: Vincent Palatin<vpalatin@rivosinc.com> > > --- Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~
On Thu, Oct 14, 2021 at 4:43 AM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote: > > The earlier implementation fell into a corner case for bytes that were > 0x01, giving a wrong result (but not affecting our application test > cases for strings, as an ASCII value 0x01 is rare in those...). > > This changes the algorithm to: > 1. Mask out the high-bit of each bytes (so that each byte is <= 127). > 2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow > into the highest bit of each byte). > 3. Bitwise-or the original value back in (to cover those cases where the > source byte was exactly 128) to saturate the high-bit. > 4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of > each byte into its LSB. > 5. Multiply with 0xff to fan out the LSB to all bits of each byte. > > Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci") > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> > Reported-by: Vincent Palatin <vpalatin@rivosinc.com> Thanks! Applied to riscv-to-apply.next Alistair > > --- > > target/riscv/insn_trans/trans_rvb.c.inc | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc > index 185c3e9a60..3095624f32 100644 > --- a/target/riscv/insn_trans/trans_rvb.c.inc > +++ b/target/riscv/insn_trans/trans_rvb.c.inc > @@ -249,13 +249,16 @@ static bool trans_rev8_64(DisasContext *ctx, arg_rev8_64 *a) > static void gen_orc_b(TCGv ret, TCGv source1) > { > TCGv tmp = tcg_temp_new(); > - TCGv ones = tcg_constant_tl(dup_const_tl(MO_8, 0x01)); > + TCGv low7 = tcg_constant_tl(dup_const_tl(MO_8, 0x7f)); > > - /* Set lsb in each byte if the byte was zero. */ > - tcg_gen_sub_tl(tmp, source1, ones); > - tcg_gen_andc_tl(tmp, tmp, source1); > + /* Set msb in each byte if the byte was non-zero. */ > + tcg_gen_and_tl(tmp, source1, low7); > + tcg_gen_add_tl(tmp, tmp, low7); > + tcg_gen_or_tl(tmp, tmp, source1); > + > + /* Extract the msb to the lsb in each byte */ > + tcg_gen_andc_tl(tmp, tmp, low7); > tcg_gen_shri_tl(tmp, tmp, 7); > - tcg_gen_andc_tl(tmp, ones, tmp); > > /* Replicate the lsb of each byte across the byte. */ > tcg_gen_muli_tl(ret, tmp, 0xff); > -- > 2.25.1 > >
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc index 185c3e9a60..3095624f32 100644 --- a/target/riscv/insn_trans/trans_rvb.c.inc +++ b/target/riscv/insn_trans/trans_rvb.c.inc @@ -249,13 +249,16 @@ static bool trans_rev8_64(DisasContext *ctx, arg_rev8_64 *a) static void gen_orc_b(TCGv ret, TCGv source1) { TCGv tmp = tcg_temp_new(); - TCGv ones = tcg_constant_tl(dup_const_tl(MO_8, 0x01)); + TCGv low7 = tcg_constant_tl(dup_const_tl(MO_8, 0x7f)); - /* Set lsb in each byte if the byte was zero. */ - tcg_gen_sub_tl(tmp, source1, ones); - tcg_gen_andc_tl(tmp, tmp, source1); + /* Set msb in each byte if the byte was non-zero. */ + tcg_gen_and_tl(tmp, source1, low7); + tcg_gen_add_tl(tmp, tmp, low7); + tcg_gen_or_tl(tmp, tmp, source1); + + /* Extract the msb to the lsb in each byte */ + tcg_gen_andc_tl(tmp, tmp, low7); tcg_gen_shri_tl(tmp, tmp, 7); - tcg_gen_andc_tl(tmp, ones, tmp); /* Replicate the lsb of each byte across the byte. */ tcg_gen_muli_tl(ret, tmp, 0xff);
The earlier implementation fell into a corner case for bytes that were 0x01, giving a wrong result (but not affecting our application test cases for strings, as an ASCII value 0x01 is rare in those...). This changes the algorithm to: 1. Mask out the high-bit of each bytes (so that each byte is <= 127). 2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow into the highest bit of each byte). 3. Bitwise-or the original value back in (to cover those cases where the source byte was exactly 128) to saturate the high-bit. 4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of each byte into its LSB. 5. Multiply with 0xff to fan out the LSB to all bits of each byte. Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci") Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Reported-by: Vincent Palatin <vpalatin@rivosinc.com> --- target/riscv/insn_trans/trans_rvb.c.inc | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)