tcg: Fix the overflow in indexing tcg_ctx->temps

Message ID	20240418102747.27703-1-jiangzw@tecorigin.com (mailing list archive)
State	New
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Zhiwei Jiang <jiangzw@tecorigin.com> To: qemu-devel@nongnu.org Cc: richard.henderson@linaro.org, pbonzini@redhat.com, Zhiwei Jiang <jiangzw@tecorigin.com> Subject: [PATCH] tcg: Fix the overflow in indexing tcg_ctx->temps Date: Thu, 18 Apr 2024 10:27:47 +0000 Message-Id: <20240418102747.27703-1-jiangzw@tecorigin.com> Received-SPF: pass client-ip=115.124.28.148; envelope-from=jiangzw@tecorigin.com; helo=out28-148.mail.aliyun.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	tcg: Fix the overflow in indexing tcg_ctx->temps \| expand tcg: Fix the overflow in indexing tcg_ctx->temps

姜智伟 April 18, 2024, 10:27 a.m. UTC

Sometimes, when the address of the passed TCGTemp *ts variable is the same as tcg_ctx,
the index calculated in the temp_idx function, i.e., ts - tcg_ctx->temps,
can result in a particularly large value, causing overflow in the subsequent array access.

0  0x00007f79590132ac in test_bit (addr=<optimized out>, nr=<optimized out>)
    at /data/system/jiangzw/release_version/qemu8.2/include/qemu/bitops.h:135
1  init_ts_info (ctx=ctx@entry=0x7f794bffe460, ts=0x7f76fc000e00) at ../tcg/optimize.c:148
2  0x00007f7959014b50 in init_arguments (nb_args=2, op=0x7f76fc0101f8, ctx=0x7f794bffe460) at ../tcg/optimize.c:792
3  fold_call (op=0x7f76fc0101f8, ctx=0x7f794bffe460) at ../tcg/optimize.c:1348
4  tcg_optimize (s=<optimized out>) at ../tcg/optimize.c:2369
5  0x00007f7958ffa136 in tcg_gen_code (s=0x7f76fc000e00, tb=0x7f7904202380, pc_start=140741246462840) at ../tcg/tcg.c:6066

Signed-off-by: Zhiwei Jiang <jiangzw@tecorigin.com>
---
 include/tcg/tcg.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Richard Henderson April 18, 2024, 2:58 p.m. UTC | #1

On 4/18/24 03:27, Zhiwei Jiang wrote:
> Sometimes, when the address of the passed TCGTemp *ts variable is the same as tcg_ctx,

Pardon?  When would TCGTemp *ts == TCGContext *tcg_ctx?


> the index calculated in the temp_idx function, i.e., ts - tcg_ctx->temps,
> can result in a particularly large value, causing overflow in the subsequent array access.

Or, assert:

size_t temp_idx(TCGTemp *ts)
{
     ptrdiff_t n = ts - tcg_ctx->temps;
     assert(n >= 0 && n < tcg_ctx->nb_temps);
     return n;
}

>   static inline TCGTemp *tcgv_i32_temp(TCGv_i32 v)
>   {
> -    return (void *)tcg_ctx + (uintptr_t)v;
> +    return (void *)tcg_ctx->temps + (uintptr_t)v;
>   }

This will generate 0 for the first temp, which will test as NULL.



r~

姜智伟 April 19, 2024, 3:48 a.m. UTC | #2

> On 4/18/24 03:27, Zhiwei Jiang wrote:
> > Sometimes, when the address of the passed TCGTemp *ts variable is the same as tcg_ctx,
> 
> Pardon?  When would TCGTemp *ts == TCGContext *tcg_ctx?
> 
> 
> > the index calculated in the temp_idx function, i.e., ts - tcg_ctx->temps,
> > can result in a particularly large value, causing overflow in the subsequent array access.
> 
> Or, assert:
> 
> size_t temp_idx(TCGTemp *ts)
> {
>      ptrdiff_t n = ts - tcg_ctx->temps;
>      assert(n >= 0 && n < tcg_ctx->nb_temps);
>      return n;
> }
> 
> >   static inline TCGTemp *tcgv_i32_temp(TCGv_i32 v)
> >   {
> > -    return (void *)tcg_ctx + (uintptr_t)v;
> > +    return (void *)tcg_ctx->temps + (uintptr_t)v;
> >   }
> 
> This will generate 0 for the first temp, which will test as NULL.

Hi Richard:
You can reproduce this issue on the latest upstream QEMU version. Using the RISC-V QEMU version, 
if we compile a test program with the first instruction being '.insn r 0xf, 2, 0, x0, x0, x0',that is a RISC-V CBO instruction, 
qemu will crash with a segmentation fault upon execution.

When the first instruction in the program is a CBO instruction,  temp_idx in init_ts_info func returns a very large value, 
causing the subsequent test_bit function to access out-of-bounds memory.

static void init_ts_info(OptContext *ctx, TCGTemp *ts)
{
    size_t idx = temp_idx(ts);
    TempOptInfo *ti;

    if (test_bit(idx, ctx->temps_used.l)) {
        return;
    }
...
I can fix this segmentation fault by applying the modification above, 
and it seems more logical in terms of code logic to match the allocation and indexing of TCGTemp.

Ths

Peter Maydell April 19, 2024, 9:19 a.m. UTC | #3

On Fri, 19 Apr 2024 at 04:49, 姜智伟 <jiangzw@tecorigin.com> wrote:
>
> > On 4/18/24 03:27, Zhiwei Jiang wrote:
> > > Sometimes, when the address of the passed TCGTemp *ts variable is the same as tcg_ctx,
> >
> > Pardon?  When would TCGTemp *ts == TCGContext *tcg_ctx?
> >
> >
> > > the index calculated in the temp_idx function, i.e., ts - tcg_ctx->temps,
> > > can result in a particularly large value, causing overflow in the subsequent array access.
> >
> > Or, assert:
> >
> > size_t temp_idx(TCGTemp *ts)
> > {
> >      ptrdiff_t n = ts - tcg_ctx->temps;
> >      assert(n >= 0 && n < tcg_ctx->nb_temps);
> >      return n;
> > }
> >
> > >   static inline TCGTemp *tcgv_i32_temp(TCGv_i32 v)
> > >   {
> > > -    return (void *)tcg_ctx + (uintptr_t)v;
> > > +    return (void *)tcg_ctx->temps + (uintptr_t)v;
> > >   }
> >
> > This will generate 0 for the first temp, which will test as NULL.
>
> Hi Richard:
> You can reproduce this issue on the latest upstream QEMU version. Using the RISC-V QEMU version,
> if we compile a test program with the first instruction being '.insn r 0xf, 2, 0, x0, x0, x0',that is a RISC-V CBO instruction,
> qemu will crash with a segmentation fault upon execution.
>
> When the first instruction in the program is a CBO instruction,  temp_idx in init_ts_info func returns a very large value,
> causing the subsequent test_bit function to access out-of-bounds memory.

I feel like this might be a bug elsewhere. Can you provide
a repro binary and command line?

thanks
-- PMM

姜智伟 April 19, 2024, 9:37 a.m. UTC | #4

> > > On 4/18/24 03:27, Zhiwei Jiang wrote:
> > > > Sometimes, when the address of the passed TCGTemp *ts variable is the same as tcg_ctx,
> > >
> > > Pardon?  When would TCGTemp *ts == TCGContext *tcg_ctx?
> > >
> > >
> > > > the index calculated in the temp_idx function, i.e., ts - tcg_ctx->temps,
> > > > can result in a particularly large value, causing overflow in the subsequent array access.
> > >
> > > Or, assert:
> > >
> > > size_t temp_idx(TCGTemp *ts)
> > > {
> > >      ptrdiff_t n = ts - tcg_ctx->temps;
> > >      assert(n >= 0 && n < tcg_ctx->nb_temps);
> > >      return n;
> > > }
> > >
> > > >   static inline TCGTemp *tcgv_i32_temp(TCGv_i32 v)
> > > >   {
> > > > -    return (void *)tcg_ctx + (uintptr_t)v;
> > > > +    return (void *)tcg_ctx->temps + (uintptr_t)v;
> > > >   }
> > >
> > > This will generate 0 for the first temp, which will test as NULL.
> >
> > Hi Richard:
> > You can reproduce this issue on the latest upstream QEMU version. Using the RISC-V QEMU version,
> > if we compile a test program with the first instruction being '.insn r 0xf, 2, 0, x0, x0, x0',that is a RISC-V CBO instruction,
> > qemu will crash with a segmentation fault upon execution.
> >
> > When the first instruction in the program is a CBO instruction,  temp_idx in init_ts_info func returns a very large value,
> > causing the subsequent test_bit function to access out-of-bounds memory.
> 
> I feel like this might be a bug elsewhere. Can you provide
> a repro binary and command line?

The test file has been attached with RISCV CBO instruction as the first instruction to execute, with command-line arguments as 
./build/qemu-system-riscv64 -M virt -smp 1 -nographic -bios crash_test.bin

Peter Maydell April 19, 2024, 10:21 a.m. UTC | #5

On Fri, 19 Apr 2024 at 10:37, 姜智伟 <jiangzw@tecorigin.com> wrote:
> Peter Maydell wrote:
> > I feel like this might be a bug elsewhere. Can you provide
> > a repro binary and command line?
>
> The test file has been attached with RISCV CBO instruction as the first instruction to execute, with command-line arguments as
> ./build/qemu-system-riscv64 -M virt -smp 1 -nographic -bios crash_test.bin

It looks like you're building without --enable-debug. If you do
that then you'll find that we hit an assert in the debug version
of the function, which your patch doesn't touch:

#6  0x00007ffff4b90e96 in __GI___assert_fail
    (assertion=0x55555639a508 "o < sizeof(TCGTemp) *
tcg_ctx->nb_temps", file=0x5555563995d5 "../../tcg/tcg.c", line=1940,
function=0x55555639c000 <__PRETTY_FUNCTION__.60> "tcgv_i32_temp") at
./assert/assert.c:101
#7  0x000055555613104c in tcgv_i32_temp (v=0x0) at ../../tcg/tcg.c:1940
#8  0x0000555555d0882b in tcgv_i64_temp (v=0x0) at
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/tcg/tcg.h:638
#9  0x0000555555d0c17b in gen_helper_cbo_inval (arg1=0x2a8, arg2=0x0)
at ../../target/riscv/helper.h:121
#10 0x0000555555d7be65 in trans_cbo_inval (ctx=0x7fffef1c8e50,
a=0x7fffef1c8cf0) at
../../target/riscv/insn_trans/trans_rvzicbo.c.inc:48
#11 0x0000555555d41f4f in decode_insn32 (ctx=0x7fffef1c8e50,
insn=8207) at libqemu-riscv64-softmmu.fa.p/decode-insn32.c.inc:2332
#12 0x0000555555d925f1 in decode_opc (env=0x555556d53e30,
ctx=0x7fffef1c8e50, opcode=8207) at
../../target/riscv/translate.c:1165
#13 0x0000555555d92ab4 in riscv_tr_translate_insn
(dcbase=0x7fffef1c8e50, cpu=0x555556d51670) at
../../target/riscv/translate.c:1236

This happens because we've been passed in 0 as our TCGv,
which isn't valid. That in turn is because trans_cbo_inval()
does:
           gen_helper_cbo_inval(tcg_env, cpu_gpr[a->rs1]);
but a->rs1 is 0.

The comment in riscv_translate_init() says:
    /*
     * cpu_gpr[0] is a placeholder for the zero register. Do not use it.
     * Use the gen_set_gpr and get_gpr helper functions when accessing regs,
     * unless you specifically block reads/writes to reg 0.
     */

trans_cbo_inval() doesn't do either of those things, so that is
where your bug is.

thanks
-- PMM

Philippe Mathieu-Daudé April 19, 2024, 11:02 a.m. UTC | #6

On 19/4/24 12:21, Peter Maydell wrote:
> On Fri, 19 Apr 2024 at 10:37, 姜智伟 <jiangzw@tecorigin.com> wrote:
>> Peter Maydell wrote:
>>> I feel like this might be a bug elsewhere. Can you provide
>>> a repro binary and command line?
>>
>> The test file has been attached with RISCV CBO instruction as the first instruction to execute, with command-line arguments as
>> ./build/qemu-system-riscv64 -M virt -smp 1 -nographic -bios crash_test.bin
> 
> It looks like you're building without --enable-debug. If you do
> that then you'll find that we hit an assert in the debug version
> of the function, which your patch doesn't touch:
> 
> #6  0x00007ffff4b90e96 in __GI___assert_fail
>      (assertion=0x55555639a508 "o < sizeof(TCGTemp) *
> tcg_ctx->nb_temps", file=0x5555563995d5 "../../tcg/tcg.c", line=1940,
> function=0x55555639c000 <__PRETTY_FUNCTION__.60> "tcgv_i32_temp") at
> ./assert/assert.c:101
> #7  0x000055555613104c in tcgv_i32_temp (v=0x0) at ../../tcg/tcg.c:1940
> #8  0x0000555555d0882b in tcgv_i64_temp (v=0x0) at
> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/tcg/tcg.h:638
> #9  0x0000555555d0c17b in gen_helper_cbo_inval (arg1=0x2a8, arg2=0x0)
> at ../../target/riscv/helper.h:121
> #10 0x0000555555d7be65 in trans_cbo_inval (ctx=0x7fffef1c8e50,
> a=0x7fffef1c8cf0) at
> ../../target/riscv/insn_trans/trans_rvzicbo.c.inc:48
> #11 0x0000555555d41f4f in decode_insn32 (ctx=0x7fffef1c8e50,
> insn=8207) at libqemu-riscv64-softmmu.fa.p/decode-insn32.c.inc:2332
> #12 0x0000555555d925f1 in decode_opc (env=0x555556d53e30,
> ctx=0x7fffef1c8e50, opcode=8207) at
> ../../target/riscv/translate.c:1165
> #13 0x0000555555d92ab4 in riscv_tr_translate_insn
> (dcbase=0x7fffef1c8e50, cpu=0x555556d51670) at
> ../../target/riscv/translate.c:1236
> 
> This happens because we've been passed in 0 as our TCGv,
> which isn't valid. That in turn is because trans_cbo_inval()
> does:
>             gen_helper_cbo_inval(tcg_env, cpu_gpr[a->rs1]);
> but a->rs1 is 0.
> 
> The comment in riscv_translate_init() says:
>      /*
>       * cpu_gpr[0] is a placeholder for the zero register. Do not use it.
>       * Use the gen_set_gpr and get_gpr helper functions when accessing regs,
>       * unless you specifically block reads/writes to reg 0.
>       */
> 
> trans_cbo_inval() doesn't do either of those things, so that is
> where your bug is.

Our minds crossed =)

We need to use get_address() to get an address from cpu_gpr[],
since $zero is "special" (NULL).

I'm about to post this fix:

-- >8 --
diff --git a/target/riscv/insn_trans/trans_rvzicbo.c.inc 
b/target/riscv/insn_trans/trans_rvzicbo.c.inc
index d5d7095903..6f6b29598d 100644
--- a/target/riscv/insn_trans/trans_rvzicbo.c.inc
+++ b/target/riscv/insn_trans/trans_rvzicbo.c.inc
@@ -31,27 +31,27 @@
  static bool trans_cbo_clean(DisasContext *ctx, arg_cbo_clean *a)
  {
      REQUIRE_ZICBOM(ctx);
-    gen_helper_cbo_clean_flush(tcg_env, cpu_gpr[a->rs1]);
+    gen_helper_cbo_clean_flush(tcg_env, get_address(ctx, a->rs1, 0));
      return true;
  }

  static bool trans_cbo_flush(DisasContext *ctx, arg_cbo_flush *a)
  {
      REQUIRE_ZICBOM(ctx);
-    gen_helper_cbo_clean_flush(tcg_env, cpu_gpr[a->rs1]);
+    gen_helper_cbo_clean_flush(tcg_env, get_address(ctx, a->rs1, 0));
      return true;
  }

  static bool trans_cbo_inval(DisasContext *ctx, arg_cbo_inval *a)
  {
      REQUIRE_ZICBOM(ctx);
-    gen_helper_cbo_inval(tcg_env, cpu_gpr[a->rs1]);
+    gen_helper_cbo_inval(tcg_env, get_address(ctx, a->rs1, 0));
      return true;
  }

  static bool trans_cbo_zero(DisasContext *ctx, arg_cbo_zero *a)
  {
      REQUIRE_ZICBOZ(ctx);
-    gen_helper_cbo_zero(tcg_env, cpu_gpr[a->rs1]);
+    gen_helper_cbo_zero(tcg_env, get_address(ctx, a->rs1, 0));
      return true;
  }
---

tcg: Fix the overflow in indexing tcg_ctx->temps

Commit Message

Comments

Patch