diff mbox series

tcg/optimize: Fix constant folding of setcond

Message ID 20241206095824.281216-1-wannacu2049@gmail.com (mailing list archive)
State New
Headers show
Series tcg/optimize: Fix constant folding of setcond | expand

Commit Message

wannacu Dec. 6, 2024, 9:58 a.m. UTC
The `z_mask` field of TCGTemp argument needs to be
properly set for the upcoming `fold_setcond_zmask` call

This patch resolves issues with running some x86_64
applications (e.g., FontForge, Krita) on riscv64

Signed-off-by: wannacu <wannacu2049@gmail.com>
---
 tcg/optimize.c                   |  3 +++
 tests/tcg/x86_64/Makefile.target |  1 +
 tests/tcg/x86_64/setcond.c       | 28 ++++++++++++++++++++++++++++
 3 files changed, 32 insertions(+)
 create mode 100644 tests/tcg/x86_64/setcond.c

Comments

Richard Henderson Dec. 6, 2024, 3:41 p.m. UTC | #1
On 12/6/24 03:58, wannacu wrote:
> The `z_mask` field of TCGTemp argument needs to be
> properly set for the upcoming `fold_setcond_zmask` call
> 
> This patch resolves issues with running some x86_64
> applications (e.g., FontForge, Krita) on riscv64
> 
> Signed-off-by: wannacu <wannacu2049@gmail.com>
> ---
>   tcg/optimize.c                   |  3 +++
>   tests/tcg/x86_64/Makefile.target |  1 +
>   tests/tcg/x86_64/setcond.c       | 28 ++++++++++++++++++++++++++++
>   3 files changed, 32 insertions(+)
>   create mode 100644 tests/tcg/x86_64/setcond.c
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index e9ef16b3c6..e580b8d8b1 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -834,6 +834,9 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
>                                ? INDEX_op_and_i32 : INDEX_op_and_i64);
>           TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
>           TCGArg tmp = arg_new_temp(ctx);
> +        /* Set z_mask for the follwing `fold_setcond_zmask` call. */
> +        arg_info(tmp)->z_mask = (ctx->type == TCG_TYPE_I32
> +                                      ? UINT32_MAX : UINT64_MAX);
>   
>           op2->args[0] = tmp;
>           op2->args[1] = *p1;

The problem is stale data in tmp's arg_info, correct?  Better would be adding 
fold_and(op2) a few lines later, after finishing setup of op2's arguments.

> +++ b/tests/tcg/x86_64/setcond.c
> @@ -0,0 +1,28 @@
> +#include <stdint.h>
> +#include <assert.h>
> +
> +uint8_t test(uint8_t a)
> +{
> +    uint8_t res = 0xff;
> +    asm(
> +        "lea -0x1160(%%edi), %%edx\n\t"
> +        "lea -0xd7b0(%%edi), %%ecx\n\t"
> +        "cmp $0x9f, %%edx\n\t"
> +        "setbe %%dl\n\t"
> +        "cmp $0x4f, %%ecx\n\t"
> +        "setbe %%cl\n\t"
> +        "or %%ecx, %%edx\n\t"
> +        "cmp $0x200b, %%edi\n\t"
> +        "sete %0\n\t"
> +        : "=r"(res)
> +    );
> +    return res;
> +}
> +
> +int main(void)
> +{
> +    for (uint8_t a = 0; a < 0xff; a++) {
> +        assert(test(a) == 0);
> +    }
> +    return 0;
> +}

(1) The inline assembly is incorrect, and (2) the test does not fail without your patch. 
So it is difficult for me to tell exactly what you're trying to test.

Problems with the asm:
   - using edi directly instead of having a as an input,
   - passing uint8_t a, but using all 32-bits of edi; upper 24 are logically garbage.
   - not adding edx, ecx as clobbers, or better as temp arguments.
   - initializing res, but not adding as an input, or in-out argument.

I *think* you want

unsigned char test(unsigned a)
{
     unsigned char res = 0xff;
     unsigned t1, t2;

     asm("lea -0x1160(%3), %1\n\t"
         "lea -0xd7b0(%3), %2\n\t"
         "cmp $0x9f, %1\n\t"
         "setbe %b1\n\t"
         "cmp $0x4f, %2\n\t"
         "setbe %b2\n\t"
         "or %2, %1\n\t"
         "cmp $0x200b, %3\n\t"
         "sete %0\n\t"
         : "+r"(res), "=&r"(t1), "=&r"(t2) : "r"(a));
     return res;
}

But as the test doesn't ever fail for me, I can't tell.


r~
wannacu Dec. 7, 2024, 8:29 a.m. UTC | #2
> The problem is stale data in tmp's arg_info, correct?  Better would be adding 
> fold_and(op2) a few lines later, after finishing setup of op2's arguments.

You're absolutely right, stale data in tmp's arg_info is the root issue here.

> (1) The inline assembly is incorrect, and (2) the test does not fail without your patch. 
> So it is difficult for me to tell exactly what you're trying to test.

> Problems with the asm:
>   - using edi directly instead of having a as an input,
>   - passing uint8_t a, but using all 32-bits of edi; upper 24 are logically garbage.
>   - not adding edx, ecx as clobbers, or better as temp arguments.
>   - initializing res, but not adding as an input, or in-out argument.

Thank you for the detailed feedback and for pointing out the issues with the inline assembly.
These points are all valid concerns, and I'll work on addressing them in the patch.
After reading your response, I tested it again on another riscv machine(licheepi 4a), and the
issue still persists.

In the error scenario, the `sete` instruction corresponds to the following op and op_opt:
op:
 setcond_i64 loc0,cc_dst,$0xffffffff,tsteq
op_opt:
 and_i64 tmp14,cc_dst,$0xffffffff         dead: 1 2  pref=0xffffffff
 xor_i64 tmp0,tmp14,$0x1                  dead: 1 2  pref=0xffffffff

> I *think* you want
>
> unsigned char test(unsigned a)
> {
>      unsigned char res = 0xff;
>      unsigned t1, t2;
>
>      asm("lea -0x1160(%3), %1\n\t"
>          "lea -0xd7b0(%3), %2\n\t"
>          "cmp $0x9f, %1\n\t"
>          "setbe %b1\n\t"
>          "cmp $0x4f, %2\n\t"
>          "setbe %b2\n\t"
>          "or %2, %1\n\t"
>          "cmp $0x200b, %3\n\t"
>          "sete %0\n\t"
>          : "+r"(res), "=&r"(t1), "=&r"(t2) : "r"(a));
>      return res;
> }
>
> But as the test doesn't ever fail for me, I can't tell.

I also tested the updated inline assembly, and the test failed as expected.
I'm not sure why you couldn't reproduce the issue on your side; perhaps it's easier to
reproduce on an RV64GC machine.


Thanks,
w
Richard Henderson Dec. 7, 2024, 6:27 p.m. UTC | #3
On 12/6/24 03:58, wannacu wrote:
> The `z_mask` field of TCGTemp argument needs to be
> properly set for the upcoming `fold_setcond_zmask` call
> 
> This patch resolves issues with running some x86_64
> applications (e.g., FontForge, Krita) on riscv64
> 
> Signed-off-by: wannacu <wannacu2049@gmail.com>
> ---
>   tcg/optimize.c                   |  3 +++
>   tests/tcg/x86_64/Makefile.target |  1 +
>   tests/tcg/x86_64/setcond.c       | 28 ++++++++++++++++++++++++++++
>   3 files changed, 32 insertions(+)
>   create mode 100644 tests/tcg/x86_64/setcond.c
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index e9ef16b3c6..e580b8d8b1 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -834,6 +834,9 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
>                                ? INDEX_op_and_i32 : INDEX_op_and_i64);
>           TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
>           TCGArg tmp = arg_new_temp(ctx);
> +        /* Set z_mask for the follwing `fold_setcond_zmask` call. */
> +        arg_info(tmp)->z_mask = (ctx->type == TCG_TYPE_I32
> +                                      ? UINT32_MAX : UINT64_MAX);

I was curious as to why this helped, when arg_new_temp was supposed to be doing exactly 
this.  It turns out we were incorrectly reusing an old temp, not allocating a new one.


r~
wannacu Dec. 9, 2024, 1:46 a.m. UTC | #4
> I was curious as to why this helped, when arg_new_temp was supposed to be doing exactly 
> this.  It turns out we were incorrectly reusing an old temp, not allocating a new one.

I appreciate your explanation of the issue. I had attempted to update the z_mask in 
init_ts_info when the temp arg was reused, but it didn't seem to have the desired effect.
Since I'm not deeply familiar with the codebase, I didn't dive into all the details.
I'm glad to see that your new patch resolves the issue so effectively. Great work!

Thanks,
w
diff mbox series

Patch

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e9ef16b3c6..e580b8d8b1 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -834,6 +834,9 @@  static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
                              ? INDEX_op_and_i32 : INDEX_op_and_i64);
         TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
         TCGArg tmp = arg_new_temp(ctx);
+        /* Set z_mask for the follwing `fold_setcond_zmask` call. */
+        arg_info(tmp)->z_mask = (ctx->type == TCG_TYPE_I32
+                                      ? UINT32_MAX : UINT64_MAX);
 
         op2->args[0] = tmp;
         op2->args[1] = *p1;
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index d6dff559c7..085efa01e1 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -9,6 +9,7 @@ 
 include $(SRC_PATH)/tests/tcg/i386/Makefile.target
 
 X86_64_TESTS += test-2413
+X86_64_TESTS += setcond.c
 
 ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET))
 X86_64_TESTS += vsyscall
diff --git a/tests/tcg/x86_64/setcond.c b/tests/tcg/x86_64/setcond.c
new file mode 100644
index 0000000000..317a7e74d2
--- /dev/null
+++ b/tests/tcg/x86_64/setcond.c
@@ -0,0 +1,28 @@ 
+#include <stdint.h>
+#include <assert.h>
+
+uint8_t test(uint8_t a)
+{
+    uint8_t res = 0xff;
+    asm(
+        "lea -0x1160(%%edi), %%edx\n\t"
+        "lea -0xd7b0(%%edi), %%ecx\n\t"
+        "cmp $0x9f, %%edx\n\t"
+        "setbe %%dl\n\t"
+        "cmp $0x4f, %%ecx\n\t"
+        "setbe %%cl\n\t"
+        "or %%ecx, %%edx\n\t"
+        "cmp $0x200b, %%edi\n\t"
+        "sete %0\n\t"
+        : "=r"(res)
+    );
+    return res;
+}
+
+int main(void)
+{
+    for (uint8_t a = 0; a < 0xff; a++) {
+        assert(test(a) == 0);
+    }
+    return 0;
+}