Message ID | 20241206095824.281216-1-wannacu2049@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | tcg/optimize: Fix constant folding of setcond | expand |
On 12/6/24 03:58, wannacu wrote: > The `z_mask` field of TCGTemp argument needs to be > properly set for the upcoming `fold_setcond_zmask` call > > This patch resolves issues with running some x86_64 > applications (e.g., FontForge, Krita) on riscv64 > > Signed-off-by: wannacu <wannacu2049@gmail.com> > --- > tcg/optimize.c | 3 +++ > tests/tcg/x86_64/Makefile.target | 1 + > tests/tcg/x86_64/setcond.c | 28 ++++++++++++++++++++++++++++ > 3 files changed, 32 insertions(+) > create mode 100644 tests/tcg/x86_64/setcond.c > > diff --git a/tcg/optimize.c b/tcg/optimize.c > index e9ef16b3c6..e580b8d8b1 100644 > --- a/tcg/optimize.c > +++ b/tcg/optimize.c > @@ -834,6 +834,9 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest, > ? INDEX_op_and_i32 : INDEX_op_and_i64); > TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3); > TCGArg tmp = arg_new_temp(ctx); > + /* Set z_mask for the follwing `fold_setcond_zmask` call. */ > + arg_info(tmp)->z_mask = (ctx->type == TCG_TYPE_I32 > + ? UINT32_MAX : UINT64_MAX); > > op2->args[0] = tmp; > op2->args[1] = *p1; The problem is stale data in tmp's arg_info, correct? Better would be adding fold_and(op2) a few lines later, after finishing setup of op2's arguments. > +++ b/tests/tcg/x86_64/setcond.c > @@ -0,0 +1,28 @@ > +#include <stdint.h> > +#include <assert.h> > + > +uint8_t test(uint8_t a) > +{ > + uint8_t res = 0xff; > + asm( > + "lea -0x1160(%%edi), %%edx\n\t" > + "lea -0xd7b0(%%edi), %%ecx\n\t" > + "cmp $0x9f, %%edx\n\t" > + "setbe %%dl\n\t" > + "cmp $0x4f, %%ecx\n\t" > + "setbe %%cl\n\t" > + "or %%ecx, %%edx\n\t" > + "cmp $0x200b, %%edi\n\t" > + "sete %0\n\t" > + : "=r"(res) > + ); > + return res; > +} > + > +int main(void) > +{ > + for (uint8_t a = 0; a < 0xff; a++) { > + assert(test(a) == 0); > + } > + return 0; > +} (1) The inline assembly is incorrect, and (2) the test does not fail without your patch. So it is difficult for me to tell exactly what you're trying to test. Problems with the asm: - using edi directly instead of having a as an input, - passing uint8_t a, but using all 32-bits of edi; upper 24 are logically garbage. - not adding edx, ecx as clobbers, or better as temp arguments. - initializing res, but not adding as an input, or in-out argument. I *think* you want unsigned char test(unsigned a) { unsigned char res = 0xff; unsigned t1, t2; asm("lea -0x1160(%3), %1\n\t" "lea -0xd7b0(%3), %2\n\t" "cmp $0x9f, %1\n\t" "setbe %b1\n\t" "cmp $0x4f, %2\n\t" "setbe %b2\n\t" "or %2, %1\n\t" "cmp $0x200b, %3\n\t" "sete %0\n\t" : "+r"(res), "=&r"(t1), "=&r"(t2) : "r"(a)); return res; } But as the test doesn't ever fail for me, I can't tell. r~
> The problem is stale data in tmp's arg_info, correct? Better would be adding > fold_and(op2) a few lines later, after finishing setup of op2's arguments. You're absolutely right, stale data in tmp's arg_info is the root issue here. > (1) The inline assembly is incorrect, and (2) the test does not fail without your patch. > So it is difficult for me to tell exactly what you're trying to test. > Problems with the asm: > - using edi directly instead of having a as an input, > - passing uint8_t a, but using all 32-bits of edi; upper 24 are logically garbage. > - not adding edx, ecx as clobbers, or better as temp arguments. > - initializing res, but not adding as an input, or in-out argument. Thank you for the detailed feedback and for pointing out the issues with the inline assembly. These points are all valid concerns, and I'll work on addressing them in the patch. After reading your response, I tested it again on another riscv machine(licheepi 4a), and the issue still persists. In the error scenario, the `sete` instruction corresponds to the following op and op_opt: op: setcond_i64 loc0,cc_dst,$0xffffffff,tsteq op_opt: and_i64 tmp14,cc_dst,$0xffffffff dead: 1 2 pref=0xffffffff xor_i64 tmp0,tmp14,$0x1 dead: 1 2 pref=0xffffffff > I *think* you want > > unsigned char test(unsigned a) > { > unsigned char res = 0xff; > unsigned t1, t2; > > asm("lea -0x1160(%3), %1\n\t" > "lea -0xd7b0(%3), %2\n\t" > "cmp $0x9f, %1\n\t" > "setbe %b1\n\t" > "cmp $0x4f, %2\n\t" > "setbe %b2\n\t" > "or %2, %1\n\t" > "cmp $0x200b, %3\n\t" > "sete %0\n\t" > : "+r"(res), "=&r"(t1), "=&r"(t2) : "r"(a)); > return res; > } > > But as the test doesn't ever fail for me, I can't tell. I also tested the updated inline assembly, and the test failed as expected. I'm not sure why you couldn't reproduce the issue on your side; perhaps it's easier to reproduce on an RV64GC machine. Thanks, w
On 12/6/24 03:58, wannacu wrote: > The `z_mask` field of TCGTemp argument needs to be > properly set for the upcoming `fold_setcond_zmask` call > > This patch resolves issues with running some x86_64 > applications (e.g., FontForge, Krita) on riscv64 > > Signed-off-by: wannacu <wannacu2049@gmail.com> > --- > tcg/optimize.c | 3 +++ > tests/tcg/x86_64/Makefile.target | 1 + > tests/tcg/x86_64/setcond.c | 28 ++++++++++++++++++++++++++++ > 3 files changed, 32 insertions(+) > create mode 100644 tests/tcg/x86_64/setcond.c > > diff --git a/tcg/optimize.c b/tcg/optimize.c > index e9ef16b3c6..e580b8d8b1 100644 > --- a/tcg/optimize.c > +++ b/tcg/optimize.c > @@ -834,6 +834,9 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest, > ? INDEX_op_and_i32 : INDEX_op_and_i64); > TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3); > TCGArg tmp = arg_new_temp(ctx); > + /* Set z_mask for the follwing `fold_setcond_zmask` call. */ > + arg_info(tmp)->z_mask = (ctx->type == TCG_TYPE_I32 > + ? UINT32_MAX : UINT64_MAX); I was curious as to why this helped, when arg_new_temp was supposed to be doing exactly this. It turns out we were incorrectly reusing an old temp, not allocating a new one. r~
> I was curious as to why this helped, when arg_new_temp was supposed to be doing exactly > this. It turns out we were incorrectly reusing an old temp, not allocating a new one. I appreciate your explanation of the issue. I had attempted to update the z_mask in init_ts_info when the temp arg was reused, but it didn't seem to have the desired effect. Since I'm not deeply familiar with the codebase, I didn't dive into all the details. I'm glad to see that your new patch resolves the issue so effectively. Great work! Thanks, w
diff --git a/tcg/optimize.c b/tcg/optimize.c index e9ef16b3c6..e580b8d8b1 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -834,6 +834,9 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest, ? INDEX_op_and_i32 : INDEX_op_and_i64); TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3); TCGArg tmp = arg_new_temp(ctx); + /* Set z_mask for the follwing `fold_setcond_zmask` call. */ + arg_info(tmp)->z_mask = (ctx->type == TCG_TYPE_I32 + ? UINT32_MAX : UINT64_MAX); op2->args[0] = tmp; op2->args[1] = *p1; diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target index d6dff559c7..085efa01e1 100644 --- a/tests/tcg/x86_64/Makefile.target +++ b/tests/tcg/x86_64/Makefile.target @@ -9,6 +9,7 @@ include $(SRC_PATH)/tests/tcg/i386/Makefile.target X86_64_TESTS += test-2413 +X86_64_TESTS += setcond.c ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET)) X86_64_TESTS += vsyscall diff --git a/tests/tcg/x86_64/setcond.c b/tests/tcg/x86_64/setcond.c new file mode 100644 index 0000000000..317a7e74d2 --- /dev/null +++ b/tests/tcg/x86_64/setcond.c @@ -0,0 +1,28 @@ +#include <stdint.h> +#include <assert.h> + +uint8_t test(uint8_t a) +{ + uint8_t res = 0xff; + asm( + "lea -0x1160(%%edi), %%edx\n\t" + "lea -0xd7b0(%%edi), %%ecx\n\t" + "cmp $0x9f, %%edx\n\t" + "setbe %%dl\n\t" + "cmp $0x4f, %%ecx\n\t" + "setbe %%cl\n\t" + "or %%ecx, %%edx\n\t" + "cmp $0x200b, %%edi\n\t" + "sete %0\n\t" + : "=r"(res) + ); + return res; +} + +int main(void) +{ + for (uint8_t a = 0; a < 0xff; a++) { + assert(test(a) == 0); + } + return 0; +}
The `z_mask` field of TCGTemp argument needs to be properly set for the upcoming `fold_setcond_zmask` call This patch resolves issues with running some x86_64 applications (e.g., FontForge, Krita) on riscv64 Signed-off-by: wannacu <wannacu2049@gmail.com> --- tcg/optimize.c | 3 +++ tests/tcg/x86_64/Makefile.target | 1 + tests/tcg/x86_64/setcond.c | 28 ++++++++++++++++++++++++++++ 3 files changed, 32 insertions(+) create mode 100644 tests/tcg/x86_64/setcond.c