diff mbox series

target/riscv: Fix orc.b implementation

Message ID 20211013184125.2010897-1-philipp.tomsich@vrull.eu (mailing list archive)
State New, archived
Headers show
Series target/riscv: Fix orc.b implementation | expand

Commit Message

Philipp Tomsich Oct. 13, 2021, 6:41 p.m. UTC
The earlier implementation fell into a corner case for bytes that were
0x01, giving a wrong result (but not affecting our application test
cases for strings, as an ASCII value 0x01 is rare in those...).

This changes the algorithm to:
 1. Mask out the high-bit of each bytes (so that each byte is <= 127).
 2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow
    into the highest bit of each byte).
 3. Bitwise-or the original value back in (to cover those cases where the
    source byte was exactly 128) to saturate the high-bit.
 4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of
    each byte into its LSB.
 5. Multiply with 0xff to fan out the LSB to all bits of each byte.

Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci")

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Reported-by: Vincent Palatin <vpalatin@rivosinc.com>

---

 target/riscv/insn_trans/trans_rvb.c.inc | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Vincent Palatin Oct. 13, 2021, 7:12 p.m. UTC | #1
On Wed, Oct 13, 2021 at 8:41 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> The earlier implementation fell into a corner case for bytes that were
> 0x01, giving a wrong result (but not affecting our application test
> cases for strings, as an ASCII value 0x01 is rare in those...).
>
> This changes the algorithm to:
>  1. Mask out the high-bit of each bytes (so that each byte is <= 127).
>  2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow
>     into the highest bit of each byte).
>  3. Bitwise-or the original value back in (to cover those cases where the
>     source byte was exactly 128) to saturate the high-bit.
>  4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of
>     each byte into its LSB.
>  5. Multiply with 0xff to fan out the LSB to all bits of each byte.
>
> Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci")
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> Reported-by: Vincent Palatin <vpalatin@rivosinc.com>
>


Tested-by: Vincent Palatin <vpalatin@rivosinc.com>


> ---
>
>  target/riscv/insn_trans/trans_rvb.c.inc | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
> index 185c3e9a60..3095624f32 100644
> --- a/target/riscv/insn_trans/trans_rvb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvb.c.inc
> @@ -249,13 +249,16 @@ static bool trans_rev8_64(DisasContext *ctx, arg_rev8_64 *a)
>  static void gen_orc_b(TCGv ret, TCGv source1)
>  {
>      TCGv  tmp = tcg_temp_new();
> -    TCGv  ones = tcg_constant_tl(dup_const_tl(MO_8, 0x01));
> +    TCGv  low7 = tcg_constant_tl(dup_const_tl(MO_8, 0x7f));
>
> -    /* Set lsb in each byte if the byte was zero. */
> -    tcg_gen_sub_tl(tmp, source1, ones);
> -    tcg_gen_andc_tl(tmp, tmp, source1);
> +    /* Set msb in each byte if the byte was non-zero. */
> +    tcg_gen_and_tl(tmp, source1, low7);
> +    tcg_gen_add_tl(tmp, tmp, low7);
> +    tcg_gen_or_tl(tmp, tmp, source1);
> +
> +    /* Extract the msb to the lsb in each byte */
> +    tcg_gen_andc_tl(tmp, tmp, low7);
>      tcg_gen_shri_tl(tmp, tmp, 7);
> -    tcg_gen_andc_tl(tmp, ones, tmp);
>
>      /* Replicate the lsb of each byte across the byte. */
>      tcg_gen_muli_tl(ret, tmp, 0xff);
> --
> 2.25.1
>
Richard Henderson Oct. 13, 2021, 7:57 p.m. UTC | #2
On 10/13/21 11:41 AM, Philipp Tomsich wrote:
> The earlier implementation fell into a corner case for bytes that were
> 0x01, giving a wrong result (but not affecting our application test
> cases for strings, as an ASCII value 0x01 is rare in those...).
> 
> This changes the algorithm to:
>   1. Mask out the high-bit of each bytes (so that each byte is <= 127).
>   2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow
>      into the highest bit of each byte).
>   3. Bitwise-or the original value back in (to cover those cases where the
>      source byte was exactly 128) to saturate the high-bit.
>   4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of
>      each byte into its LSB.
>   5. Multiply with 0xff to fan out the LSB to all bits of each byte.
> 
> Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci")
> 
> Signed-off-by: Philipp Tomsich<philipp.tomsich@vrull.eu>
> Reported-by: Vincent Palatin<vpalatin@rivosinc.com>
> 
> ---

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
Alistair Francis Oct. 15, 2021, 5:28 a.m. UTC | #3
On Thu, Oct 14, 2021 at 4:43 AM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> The earlier implementation fell into a corner case for bytes that were
> 0x01, giving a wrong result (but not affecting our application test
> cases for strings, as an ASCII value 0x01 is rare in those...).
>
> This changes the algorithm to:
>  1. Mask out the high-bit of each bytes (so that each byte is <= 127).
>  2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow
>     into the highest bit of each byte).
>  3. Bitwise-or the original value back in (to cover those cases where the
>     source byte was exactly 128) to saturate the high-bit.
>  4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of
>     each byte into its LSB.
>  5. Multiply with 0xff to fan out the LSB to all bits of each byte.
>
> Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci")
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> Reported-by: Vincent Palatin <vpalatin@rivosinc.com>

Thanks!

Applied to riscv-to-apply.next

Alistair

>
> ---
>
>  target/riscv/insn_trans/trans_rvb.c.inc | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
> index 185c3e9a60..3095624f32 100644
> --- a/target/riscv/insn_trans/trans_rvb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvb.c.inc
> @@ -249,13 +249,16 @@ static bool trans_rev8_64(DisasContext *ctx, arg_rev8_64 *a)
>  static void gen_orc_b(TCGv ret, TCGv source1)
>  {
>      TCGv  tmp = tcg_temp_new();
> -    TCGv  ones = tcg_constant_tl(dup_const_tl(MO_8, 0x01));
> +    TCGv  low7 = tcg_constant_tl(dup_const_tl(MO_8, 0x7f));
>
> -    /* Set lsb in each byte if the byte was zero. */
> -    tcg_gen_sub_tl(tmp, source1, ones);
> -    tcg_gen_andc_tl(tmp, tmp, source1);
> +    /* Set msb in each byte if the byte was non-zero. */
> +    tcg_gen_and_tl(tmp, source1, low7);
> +    tcg_gen_add_tl(tmp, tmp, low7);
> +    tcg_gen_or_tl(tmp, tmp, source1);
> +
> +    /* Extract the msb to the lsb in each byte */
> +    tcg_gen_andc_tl(tmp, tmp, low7);
>      tcg_gen_shri_tl(tmp, tmp, 7);
> -    tcg_gen_andc_tl(tmp, ones, tmp);
>
>      /* Replicate the lsb of each byte across the byte. */
>      tcg_gen_muli_tl(ret, tmp, 0xff);
> --
> 2.25.1
>
>
diff mbox series

Patch

diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 185c3e9a60..3095624f32 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -249,13 +249,16 @@  static bool trans_rev8_64(DisasContext *ctx, arg_rev8_64 *a)
 static void gen_orc_b(TCGv ret, TCGv source1)
 {
     TCGv  tmp = tcg_temp_new();
-    TCGv  ones = tcg_constant_tl(dup_const_tl(MO_8, 0x01));
+    TCGv  low7 = tcg_constant_tl(dup_const_tl(MO_8, 0x7f));
 
-    /* Set lsb in each byte if the byte was zero. */
-    tcg_gen_sub_tl(tmp, source1, ones);
-    tcg_gen_andc_tl(tmp, tmp, source1);
+    /* Set msb in each byte if the byte was non-zero. */
+    tcg_gen_and_tl(tmp, source1, low7);
+    tcg_gen_add_tl(tmp, tmp, low7);
+    tcg_gen_or_tl(tmp, tmp, source1);
+
+    /* Extract the msb to the lsb in each byte */
+    tcg_gen_andc_tl(tmp, tmp, low7);
     tcg_gen_shri_tl(tmp, tmp, 7);
-    tcg_gen_andc_tl(tmp, ones, tmp);
 
     /* Replicate the lsb of each byte across the byte. */
     tcg_gen_muli_tl(ret, tmp, 0xff);