diff mbox series

[PULL,18/23] accel/tcg: re-factor non-RAM execution code

Message ID 20210218094706.23038-19-alex.bennee@linaro.org (mailing list archive)
State New, archived
Headers show
Series [PULL,01/23] hw/virtio/pci: include vdev name in registered PCI sections | expand

Commit Message

Alex Bennée Feb. 18, 2021, 9:47 a.m. UTC
There is no real need to use CF_NOCACHE here. As long as the TB isn't
linked to other TBs or included in the QHT or jump cache then it will
only get executed once.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>

Comments

Peter Maydell April 15, 2021, 1:18 p.m. UTC | #1
On Thu, 18 Feb 2021 at 09:47, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> There is no real need to use CF_NOCACHE here. As long as the TB isn't
> linked to other TBs or included in the QHT or jump cache then it will
> only get executed once.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>

Hi; I've just noticed that this commit seems to break the case of:
 * execution of code not from a RAM block
 * when icount is enabled
 * and an instruction is an IO insn that triggers io-recompile

because:

> @@ -2097,6 +2086,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          tb_reset_jump(tb, 1);
>      }
>
> +    /*
> +     * If the TB is not associated with a physical RAM page then
> +     * it must be a temporary one-insn TB, and we have nothing to do
> +     * except fill in the page_addr[] fields. Return early before
> +     * attempting to link to other TBs or add to the lookup table.
> +     */
> +    if (phys_pc == -1) {
> +        tb->page_addr[0] = tb->page_addr[1] = -1;
> +        return tb;
> +    }

we used to fall through here, which meant we called
tcg_tb_insert(tb). No we no longer do. That's bad, because
cpu_io_recompile() does:

    tb = tcg_tb_lookup(retaddr);
    if (!tb) {
        cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
                  (void *)retaddr);
    }

and since it can no longer find the TB, QEMU aborts.

thanks
-- PMM
Peter Maydell April 15, 2021, 1:37 p.m. UTC | #2
On Thu, 15 Apr 2021 at 14:18, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 18 Feb 2021 at 09:47, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> > There is no real need to use CF_NOCACHE here. As long as the TB isn't
> > linked to other TBs or included in the QHT or jump cache then it will
> > only get executed once.
> >
> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> > Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>
>
> Hi; I've just noticed that this commit seems to break the case of:
>  * execution of code not from a RAM block
>  * when icount is enabled
>  * and an instruction is an IO insn that triggers io-recompile
>
> because:
>
> > @@ -2097,6 +2086,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >          tb_reset_jump(tb, 1);
> >      }
> >
> > +    /*
> > +     * If the TB is not associated with a physical RAM page then
> > +     * it must be a temporary one-insn TB, and we have nothing to do
> > +     * except fill in the page_addr[] fields. Return early before
> > +     * attempting to link to other TBs or add to the lookup table.
> > +     */
> > +    if (phys_pc == -1) {
> > +        tb->page_addr[0] = tb->page_addr[1] = -1;
> > +        return tb;
> > +    }
>
> we used to fall through here, which meant we called
> tcg_tb_insert(tb). No we no longer do. That's bad, because
> cpu_io_recompile() does:
>
>     tb = tcg_tb_lookup(retaddr);
>     if (!tb) {
>         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
>                   (void *)retaddr);
>     }
>
> and since it can no longer find the TB, QEMU aborts.

Adding the tcg_tb_insert() call to the early exit path:

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ba6ab09790e..6014285e4dc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2081,6 +2081,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      */
     if (phys_pc == -1) {
         tb->page_addr[0] = tb->page_addr[1] = -1;
+        tcg_tb_insert(tb);
         return tb;
     }

seems to fix my test case, but I don't know enough about the new
design here to know if that has undesirable side effects.

-- PMM
Alex Bennée April 15, 2021, 2:31 p.m. UTC | #3
--8<---------------cut here---------------start------------->8---

Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 15 Apr 2021 at 14:18, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Thu, 18 Feb 2021 at 09:47, Alex Bennée <alex.bennee@linaro.org> wrote:
>> >
>> > There is no real need to use CF_NOCACHE here. As long as the TB isn't
>> > linked to other TBs or included in the QHT or jump cache then it will
>> > only get executed once.
>> >
>> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> > Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>
>>
>> Hi; I've just noticed that this commit seems to break the case of:
>>  * execution of code not from a RAM block
>>  * when icount is enabled
>>  * and an instruction is an IO insn that triggers io-recompile
>>
>> because:
>>
>> > @@ -2097,6 +2086,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>> >          tb_reset_jump(tb, 1);
>> >      }
>> >
>> > +    /*
>> > +     * If the TB is not associated with a physical RAM page then
>> > +     * it must be a temporary one-insn TB, and we have nothing to do
>> > +     * except fill in the page_addr[] fields. Return early before
>> > +     * attempting to link to other TBs or add to the lookup table.
>> > +     */
>> > +    if (phys_pc == -1) {
>> > +        tb->page_addr[0] = tb->page_addr[1] = -1;
>> > +        return tb;
>> > +    }
>>
>> we used to fall through here, which meant we called
>> tcg_tb_insert(tb). No we no longer do. That's bad, because
>> cpu_io_recompile() does:
>>
>>     tb = tcg_tb_lookup(retaddr);
>>     if (!tb) {
>>         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
>>                   (void *)retaddr);
>>     }
>>
>> and since it can no longer find the TB, QEMU aborts.
>
> Adding the tcg_tb_insert() call to the early exit path:
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index ba6ab09790e..6014285e4dc 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -2081,6 +2081,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>       */
>      if (phys_pc == -1) {
>          tb->page_addr[0] = tb->page_addr[1] = -1;
> +        tcg_tb_insert(tb);
>          return tb;
>      }
>
> seems to fix my test case, but I don't know enough about the new
> design here to know if that has undesirable side effects.

No we don't want to do that as the comment says above. However as it's a
single instruction block it can do IO so could you try this instead
please:

--8<---------------cut here---------------start------------->8---
accel/tcg: avoid re-translating one-shot instructions

By definition a single instruction is capable of being an IO
instruction. This avoids a problem of triggering a cpu_io_recompile on
a non-cached translation which would only do exactly this anyway.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

1 file changed, 1 insertion(+), 1 deletion(-)
accel/tcg/translate-all.c | 2 +-

modified   accel/tcg/translate-all.c
@@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
     if (phys_pc == -1) {
         /* Generate a one-shot TB with 1 insn in it */
-        cflags = (cflags & ~CF_COUNT_MASK) | 1;
+        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
     }
 
     max_insns = cflags & CF_COUNT_MASK;
--8<---------------cut here---------------end--------------->8---
Peter Maydell April 15, 2021, 2:54 p.m. UTC | #4
On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
> --8<---------------cut here---------------start------------->8---
> accel/tcg: avoid re-translating one-shot instructions
>
> By definition a single instruction is capable of being an IO
> instruction. This avoids a problem of triggering a cpu_io_recompile on
> a non-cached translation which would only do exactly this anyway.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> 1 file changed, 1 insertion(+), 1 deletion(-)
> accel/tcg/translate-all.c | 2 +-
>
> modified   accel/tcg/translate-all.c
> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>
>      if (phys_pc == -1) {
>          /* Generate a one-shot TB with 1 insn in it */
> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
>      }
>
>      max_insns = cflags & CF_COUNT_MASK;
> --8<---------------cut here---------------end--------------->8---

Yes, this fixes the problem. Do we want to put this in for 6.0? My
feeling is that executing from non-RAM is pretty niche, so maybe
if we need an rc4 anyway, but this isn't important enough to cause an
rc4 itself.

-- PMM
Philippe Mathieu-Daudé April 15, 2021, 3:55 p.m. UTC | #5
On 4/15/21 4:54 PM, Peter Maydell wrote:
> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
>> --8<---------------cut here---------------start------------->8---
>> accel/tcg: avoid re-translating one-shot instructions
>>
>> By definition a single instruction is capable of being an IO
>> instruction. This avoids a problem of triggering a cpu_io_recompile on
>> a non-cached translation which would only do exactly this anyway.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> accel/tcg/translate-all.c | 2 +-
>>
>> modified   accel/tcg/translate-all.c
>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>
>>      if (phys_pc == -1) {
>>          /* Generate a one-shot TB with 1 insn in it */
>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
>>      }
>>
>>      max_insns = cflags & CF_COUNT_MASK;
>> --8<---------------cut here---------------end--------------->8---
> 
> Yes, this fixes the problem. Do we want to put this in for 6.0? My
> feeling is that executing from non-RAM is pretty niche, so maybe
> if we need an rc4 anyway, but this isn't important enough to cause an
> rc4 itself.

Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).
Cédric Le Goater April 15, 2021, 5:18 p.m. UTC | #6
On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:
> On 4/15/21 4:54 PM, Peter Maydell wrote:
>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
>>> --8<---------------cut here---------------start------------->8---
>>> accel/tcg: avoid re-translating one-shot instructions
>>>
>>> By definition a single instruction is capable of being an IO
>>> instruction. This avoids a problem of triggering a cpu_io_recompile on
>>> a non-cached translation which would only do exactly this anyway.
>>>
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> accel/tcg/translate-all.c | 2 +-
>>>
>>> modified   accel/tcg/translate-all.c
>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>>
>>>      if (phys_pc == -1) {
>>>          /* Generate a one-shot TB with 1 insn in it */
>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
>>>      }
>>>
>>>      max_insns = cflags & CF_COUNT_MASK;
>>> --8<---------------cut here---------------end--------------->8---
>>
>> Yes, this fixes the problem. Do we want to put this in for 6.0? My
>> feeling is that executing from non-RAM is pretty niche, so maybe
>> if we need an rc4 anyway, but this isn't important enough to cause an
>> rc4 itself.
> 
> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).

You need to set the 'execute-in-place' machine option to load/execute the
instructions from the AHB window of CE0. It's not on by default because 
boot can be really slow with some recent u-boot which heavily trash the TBs.

But this seems to work fine with -rc3. 

C.
Peter Maydell April 15, 2021, 5:34 p.m. UTC | #7
On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:
>
> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:
> > On 4/15/21 4:54 PM, Peter Maydell wrote:
> >> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
> >>> --8<---------------cut here---------------start------------->8---
> >>> accel/tcg: avoid re-translating one-shot instructions
> >>>
> >>> By definition a single instruction is capable of being an IO
> >>> instruction. This avoids a problem of triggering a cpu_io_recompile on
> >>> a non-cached translation which would only do exactly this anyway.
> >>>
> >>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> >>>
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>> accel/tcg/translate-all.c | 2 +-
> >>>
> >>> modified   accel/tcg/translate-all.c
> >>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >>>
> >>>      if (phys_pc == -1) {
> >>>          /* Generate a one-shot TB with 1 insn in it */
> >>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
> >>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
> >>>      }
> >>>
> >>>      max_insns = cflags & CF_COUNT_MASK;
> >>> --8<---------------cut here---------------end--------------->8---
> >>
> >> Yes, this fixes the problem. Do we want to put this in for 6.0? My
> >> feeling is that executing from non-RAM is pretty niche, so maybe
> >> if we need an rc4 anyway, but this isn't important enough to cause an
> >> rc4 itself.
> >
> > Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).
>
> You need to set the 'execute-in-place' machine option to load/execute the
> instructions from the AHB window of CE0. It's not on by default because
> boot can be really slow with some recent u-boot which heavily trash the TBs.
>
> But this seems to work fine with -rc3.

Triggering the bug requires both execute-in-place and -icount -- did
you test with -icount enabled?

thanks
-- PMM
Cédric Le Goater April 16, 2021, 7:55 a.m. UTC | #8
On 4/15/21 7:34 PM, Peter Maydell wrote:
> On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:
>>
>> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:
>>> On 4/15/21 4:54 PM, Peter Maydell wrote:
>>>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>> --8<---------------cut here---------------start------------->8---
>>>>> accel/tcg: avoid re-translating one-shot instructions
>>>>>
>>>>> By definition a single instruction is capable of being an IO
>>>>> instruction. This avoids a problem of triggering a cpu_io_recompile on
>>>>> a non-cached translation which would only do exactly this anyway.
>>>>>
>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>>>
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> accel/tcg/translate-all.c | 2 +-
>>>>>
>>>>> modified   accel/tcg/translate-all.c
>>>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>>>>
>>>>>      if (phys_pc == -1) {
>>>>>          /* Generate a one-shot TB with 1 insn in it */
>>>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
>>>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
>>>>>      }
>>>>>
>>>>>      max_insns = cflags & CF_COUNT_MASK;
>>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>> Yes, this fixes the problem. Do we want to put this in for 6.0? My
>>>> feeling is that executing from non-RAM is pretty niche, so maybe
>>>> if we need an rc4 anyway, but this isn't important enough to cause an
>>>> rc4 itself.
>>>
>>> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).
>>
>> You need to set the 'execute-in-place' machine option to load/execute the
>> instructions from the AHB window of CE0. It's not on by default because
>> boot can be really slow with some recent u-boot which heavily trash the TBs.
>>
>> But this seems to work fine with -rc3.
> 
> Triggering the bug requires both execute-in-place and -icount -- did
> you test with -icount enabled?

It crashes.

Thanks,

C. 

$ qemu-system-arm -M romulus-bmc,execute-in-place=true -icount auto -drive file=./flash-romulus,format=raw,if=mtd  -serial mon:stdio
qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7efbcc001992
R00=0005107a R01=00000000 R02=00000000 R03=00000000
R04=00000350 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=00000000 R14=00000350 R15=00000c70
PSR=400001d3 -Z-- A S svc32
s00=00000000 s01=00000000 d00=0000000000000000
s02=00000000 s03=00000000 d01=0000000000000000
s04=00000000 s05=00000000 d02=0000000000000000
s06=00000000 s07=00000000 d03=0000000000000000
s08=00000000 s09=00000000 d04=0000000000000000
s10=00000000 s11=00000000 d05=0000000000000000
s12=00000000 s13=00000000 d06=0000000000000000
s14=00000000 s15=00000000 d07=0000000000000000
s16=00000000 s17=00000000 d08=0000000000000000
s18=00000000 s19=00000000 d09=0000000000000000
s20=00000000 s21=00000000 d10=0000000000000000
s22=00000000 s23=00000000 d11=0000000000000000
s24=00000000 s25=00000000 d12=0000000000000000
s26=00000000 s27=00000000 d13=0000000000000000
s28=00000000 s29=00000000 d14=0000000000000000
s30=00000000 s31=00000000 d15=0000000000000000
FPSCR: 00000000
Aborted (core dumped)
Alex Bennée April 16, 2021, 9:14 a.m. UTC | #9
Cédric Le Goater <clg@kaod.org> writes:

> On 4/15/21 7:34 PM, Peter Maydell wrote:
>> On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:
>>>
>>> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:
>>>> On 4/15/21 4:54 PM, Peter Maydell wrote:
>>>>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>> --8<---------------cut here---------------start------------->8---
>>>>>> accel/tcg: avoid re-translating one-shot instructions
>>>>>>
>>>>>> By definition a single instruction is capable of being an IO
>>>>>> instruction. This avoids a problem of triggering a cpu_io_recompile on
>>>>>> a non-cached translation which would only do exactly this anyway.
>>>>>>
>>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>>>>
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>> accel/tcg/translate-all.c | 2 +-
>>>>>>
>>>>>> modified   accel/tcg/translate-all.c
>>>>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>>>>>
>>>>>>      if (phys_pc == -1) {
>>>>>>          /* Generate a one-shot TB with 1 insn in it */
>>>>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
>>>>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
>>>>>>      }
>>>>>>
>>>>>>      max_insns = cflags & CF_COUNT_MASK;
>>>>>> --8<---------------cut here---------------end--------------->8---
>>>>>
>>>>> Yes, this fixes the problem. Do we want to put this in for 6.0? My
>>>>> feeling is that executing from non-RAM is pretty niche, so maybe
>>>>> if we need an rc4 anyway, but this isn't important enough to cause an
>>>>> rc4 itself.
>>>>
>>>> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).
>>>
>>> You need to set the 'execute-in-place' machine option to load/execute the
>>> instructions from the AHB window of CE0. It's not on by default because
>>> boot can be really slow with some recent u-boot which heavily trash the TBs.
>>>
>>> But this seems to work fine with -rc3.
>> 
>> Triggering the bug requires both execute-in-place and -icount -- did
>> you test with -icount enabled?
>
> It crashes.


Without the above patch? I've re-posted as a proper patch here:

  Subject: [RFC PATCH] accel/tcg: avoid re-translating one-shot instructions
  Date: Thu, 15 Apr 2021 17:24:53 +0100
  Message-Id: <20210415162454.22056-1-alex.bennee@linaro.org>
Cédric Le Goater April 16, 2021, 10:14 a.m. UTC | #10
On 4/16/21 11:14 AM, Alex Bennée wrote:
> 
> Cédric Le Goater <clg@kaod.org> writes:
> 
>> On 4/15/21 7:34 PM, Peter Maydell wrote:
>>> On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:
>>>>
>>>> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:
>>>>> On 4/15/21 4:54 PM, Peter Maydell wrote:
>>>>>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>> --8<---------------cut here---------------start------------->8---
>>>>>>> accel/tcg: avoid re-translating one-shot instructions
>>>>>>>
>>>>>>> By definition a single instruction is capable of being an IO
>>>>>>> instruction. This avoids a problem of triggering a cpu_io_recompile on
>>>>>>> a non-cached translation which would only do exactly this anyway.
>>>>>>>
>>>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>>>>>
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>> accel/tcg/translate-all.c | 2 +-
>>>>>>>
>>>>>>> modified   accel/tcg/translate-all.c
>>>>>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>>>>>>
>>>>>>>      if (phys_pc == -1) {
>>>>>>>          /* Generate a one-shot TB with 1 insn in it */
>>>>>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;
>>>>>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
>>>>>>>      }
>>>>>>>
>>>>>>>      max_insns = cflags & CF_COUNT_MASK;
>>>>>>> --8<---------------cut here---------------end--------------->8---
>>>>>>
>>>>>> Yes, this fixes the problem. Do we want to put this in for 6.0? My
>>>>>> feeling is that executing from non-RAM is pretty niche, so maybe
>>>>>> if we need an rc4 anyway, but this isn't important enough to cause an
>>>>>> rc4 itself.
>>>>>
>>>>> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).
>>>>
>>>> You need to set the 'execute-in-place' machine option to load/execute the
>>>> instructions from the AHB window of CE0. It's not on by default because
>>>> boot can be really slow with some recent u-boot which heavily trash the TBs.
>>>>
>>>> But this seems to work fine with -rc3.
>>>
>>> Triggering the bug requires both execute-in-place and -icount -- did
>>> you test with -icount enabled?
>>
>> It crashes.
> 
> 
> Without the above patch? I've re-posted as a proper patch here:
> 
>   Subject: [RFC PATCH] accel/tcg: avoid re-translating one-shot instructions
>   Date: Thu, 15 Apr 2021 17:24:53 +0100
>   Message-Id: <20210415162454.22056-1-alex.bennee@linaro.org>
> 


This patch does not fix the crash for the aspeed machines.

C.
diff mbox series

Patch

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c0b98e76b9..72b3c663c5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1779,7 +1779,8 @@  static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
 #endif
 }
 
-/* add a new TB and link it to the physical page tables. phys_page2 is
+/*
+ * Add a new TB and link it to the physical page tables. phys_page2 is
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
@@ -1798,17 +1799,6 @@  tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 
     assert_memory_lock();
 
-    if (phys_pc == -1) {
-        /*
-         * If the TB is not associated with a physical RAM page then
-         * it must be a temporary one-insn TB, and we have nothing to do
-         * except fill in the page_addr[] fields.
-         */
-        assert(tb->cflags & CF_NOCACHE);
-        tb->page_addr[0] = tb->page_addr[1] = -1;
-        return tb;
-    }
-
     /*
      * Add the TB to the page list, acquiring first the pages's locks.
      * We keep the locks held until after inserting the TB in the hash table,
@@ -1881,9 +1871,8 @@  TranslationBlock *tb_gen_code(CPUState *cpu,
     phys_pc = get_page_addr_code(env, pc);
 
     if (phys_pc == -1) {
-        /* Generate a temporary TB with 1 insn in it */
-        cflags &= ~CF_COUNT_MASK;
-        cflags |= CF_NOCACHE | 1;
+        /* Generate a one-shot TB with 1 insn in it */
+        cflags = (cflags & ~CF_COUNT_MASK) | 1;
     }
 
     cflags &= ~CF_CLUSTER_MASK;
@@ -2097,6 +2086,17 @@  TranslationBlock *tb_gen_code(CPUState *cpu,
         tb_reset_jump(tb, 1);
     }
 
+    /*
+     * If the TB is not associated with a physical RAM page then
+     * it must be a temporary one-insn TB, and we have nothing to do
+     * except fill in the page_addr[] fields. Return early before
+     * attempting to link to other TBs or add to the lookup table.
+     */
+    if (phys_pc == -1) {
+        tb->page_addr[0] = tb->page_addr[1] = -1;
+        return tb;
+    }
+
     /* check next page if needed */
     virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;
     phys_page2 = -1;