[PULL,22/26] target/aarch64: optimize indirect branches
diff mbox

Message ID 20170605165233.4135-23-rth@twiddle.net
State New
Headers show

Commit Message

Richard Henderson June 5, 2017, 4:52 p.m. UTC
From: "Emilio G. Cota" <cota@braap.org>

Measurements:

[Baseline performance is that before applying this and the previous commit]

-                                    NBench, aarch64-softmmu. Host: Intel i7-4790K @ 4.00GHz

 1.7x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |   cross                                                                                                          |
 1.6x +cross+jr.................................................####...................................................+-+
      |                                                         #++#                                                     |
      |                                                         #  #                                                     |
 1.5x +-+...................................................*****..#...................................................+-+
      |                                                     *+++*  #                                                     |
      |                                                     *   *  #                                                     |
 1.4x +-+...................................................*...*..#...................................................+-+
      |                                                     *   *  #                                                     |
      |                                     #####           *   *  #                                                     |
 1.3x +-+................................****+++#...........*...*..#...................................................+-+
      |                                  *++*   #           *   *  #                                                     |
      |                                  *  *   #           *   *  #                                                     |
 1.2x +-+................................*..*...#...........*...*..#...................................................+-+
      |                                  *  *   #           *   *  #                                                     |
      |                            ####  *  *   #           *   *  #                                                     |
 1.1x +-+.......................+++#..#..*..*...#...........*...*..#...................................................+-+
      |                         ****  #  *  *   #           *   *  #                                        ****####     |
      |                         *  *  #  *  *   #           *   *  #  ****###   +++####            ****###  *  *   #     |
   1x +-++-++++++-++++****###++-*++*++#++*++*+-+#++****+++++*+++*++#++*++*-+#++*****++#++****###-++*++*-+#++*+-*+++#+-++-+
      |     *****###  *  *  #   *  *  #  *  *   #  *++*###  *   *  #  *  *  #  *   *  #  *  *++#   *  *  #  *  *   #     |
      |     *   *++#  *  *  #   *  *  #  *  *   #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #   *  *  #  *  *   #     |
 0.9x +-+---*****###--****###---****###--****####--****###--*****###--****###--*****###--****###---****###--****####---+-+
      ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONNEURAL NUMERIC SORSTRING SORT    hmean
  png: http://imgur.com/qO9ubtk
NB. cross here represents the previous commit.

-                            SPECint06 (test set), aarch64-linux-user. Host: Intel i7-4790K @ 4.00GHz

 1.5x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                       *****                                      |
      |                                                                       *+++*                           jr         |
      |                                                                       *   *                                      |
 1.4x +-+.....................................................................*...*.....................+++............+-+
      |                                                                       *   *                      |               |
      |                                      *****                            *   *                      |               |
      |                                      *   *                            *   *                    *****             |
 1.3x +-+....................................*...*............................*...*....................*.|.*...........+-+
      |                       +++            *   *                            *   *                    * | *             |
      |                      *****           *   *                            *   *                    *+++*             |
      |                      *   *           *   *                            *   *                    *   *             |
 1.2x +-+....................*...*...........*...*............................*...*...........*****....*...*...........+-+
      |     *****            *   *           *   *                            *   *           *   *    *   *    +++      |
      |     *   *            *   *           *   *                            *   *           *   *    *   *   *****     |
      |     *   *            *   *   *****   *   *                            *   *           *   *    *   *   *   *     |
 1.1x +-+...*...*............*...*...*...*...*...*............................*...*....+++....*...*....*...*...*...*...+-+
      |     *   *            *   *   *   *   *   *                            *   *   *****   *   *    *   *   *   *     |
      |     *   *            *   *   *   *   *   *   *****                    *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *****    *   *   *   *   *   *   *   *   ******           *   *   *   *   *   *    *   *   *   *     |
   1x +-++-+*+++*-++*+++*++++*+-+*+++*-++*+++*-++*+++*+++*++-*++++*-++*****+++*++-*+++*++-*+++*+-+*++++*+++*++-*+++*+-++-+
      |     *   *   *   *    *   *   *   *   *   *   *   *   *    *   *+++*   *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *     |
 0.9x +-+---*****---*****----*****---*****---*****---*****---******---*****---*****---*****---*****----*****---*****---+-+
         astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
  png: http://imgur.com/3Dp4vvq

-                           SPECint06 (train set), aarch64-linux-user. Host: Intel i7-4790K @ 4.00GHz

 1.7x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |                                                                                                       jr         |
 1.6x +-+...............................................................................................+++............+-+
      |                                                                                                *****             |
      |                                                                                                *+++*             |
      |                                                                                                *   *             |
 1.5x +-+..............................................................................................*...*...........+-+
      |                                                                        +++                     *   *             |
      |                                                                       *****                    *   *             |
 1.4x +-+.....................................................................*+++*....................*...*...........+-+
      |                                                                       *   *                    *   *             |
      |                                      *****                            *   *                    *   *             |
      |                                      *   *                            *   *   *****            *   *             |
 1.3x +-+....................................*...*............................*...*...*...*............*...*...........+-+
      |                       +++            *   *                            *   *   *   *            *   *             |
      |                      *****           *   *                            *   *   *   *   *****    *   *             |
 1.2x +-+....................*...*...........*...*............................*...*...*...*...*+++*....*...*...*****...+-+
      |                      *   *           *   *                            *   *   *   *   *   *    *   *   *+++*     |
      |     *****            *   *   *****   *   *                            *   *   *   *   *   *    *   *   *   *     |
      |     *   *            *   *   *+++*   *   *                            *   *   *   *   *   *    *   *   *   *     |
 1.1x +-+...*...*............*...*...*...*...*...*............................*...*...*...*...*...*....*...*...*...*...+-+
      |     *   *   *****    *   *   *   *   *   *                    *****   *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *   *    *   *   *   *   *   *    +++    ******   *+++*   *   *   *   *   *   *    *   *   *   *     |
   1x +-+---*****---*****----*****---*****---*****---*****---******---*****---*****---*****---*****----*****---*****---+-+
         astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
  png: http://imgur.com/vRrdc9j

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/arm/translate-a64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Alex Bennée June 7, 2017, 2:11 p.m. UTC | #1
Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Measurements:
>
> [Baseline performance is that before applying this and the previous
> commit]

Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
masked by an unrelated assertion breakage which I had to fix. However
with this patch my boot hangs spinning all 4 threads. Once reverted
things work again.

My command line:

  timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on

My tree with fix and revert:

  https://github.com/stsquad/qemu/tree/debug/aarch64-hang

I'm investigating now.

--
Alex Bennée
Alex Bennée June 7, 2017, 2:22 p.m. UTC | #2
Alex Bennée <alex.bennee@linaro.org> writes:

> Richard Henderson <rth@twiddle.net> writes:
>
>> From: "Emilio G. Cota" <cota@braap.org>
>>
>> Measurements:
>>
>> [Baseline performance is that before applying this and the previous
>> commit]
>
> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
> masked by an unrelated assertion breakage which I had to fix. However
> with this patch my boot hangs spinning all 4 threads. Once reverted
> things work again.
>
> My command line:
>
>   timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on
>
> My tree with fix and revert:
>
>   https://github.com/stsquad/qemu/tree/debug/aarch64-hang
>
> I'm investigating now.

Well this seems to be a case of hangs with -smp > 1 (which I guess was
obvious seeing as the TCG threads seem to be spinning against each
other).

--
Alex Bennée
Alex Bennée June 7, 2017, 3:19 p.m. UTC | #3
Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> From: "Emilio G. Cota" <cota@braap.org>
>>>
>>> Measurements:
>>>
>>> [Baseline performance is that before applying this and the previous
>>> commit]
>>
>> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
>> masked by an unrelated assertion breakage which I had to fix. However
>> with this patch my boot hangs spinning all 4 threads. Once reverted
>> things work again.
>>
>> My command line:
>>
>>   timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on
>>
>> My tree with fix and revert:
>>
>>   https://github.com/stsquad/qemu/tree/debug/aarch64-hang
>>
>> I'm investigating now.
>
> Well this seems to be a case of hangs with -smp > 1 (which I guess was
> obvious seeing as the TCG threads seem to be spinning against each
> other).

So the minimum fix I've found to get things working again is:

void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
{
    CPUState *cpu = ENV_GET_CPU(env);
    TranslationBlock *tb;
    target_ulong cs_base, pc;
    uint32_t flags;

    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
    if (likely(tb)) {
        cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
        if (likely(tb->pc == addr && tb->cs_base == cs_base &&
                   tb->flags == flags)) {
            goto found;
        }
        tb = tb_htable_lookup(cpu, addr, cs_base, flags);
        if (likely(tb)) {
            atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb);
            /* goto found; */
        }
    }
    return tcg_ctx.code_gen_epilogue;
 found:
    qemu_log_mask_and_addr(CPU_LOG_EXEC, addr,
                           "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
                           tb->tc_ptr, cpu->cpu_index, addr,
                           lookup_symbol(addr));
    return tb->tc_ptr;
}

Which I find very odd. It would imply that tb_htable_lookup is giving us
a bad result, except I would find it very unlikely what ever funky value
we stored in the jmp cache would never get hit again.

/me is very confused.

--
Alex Bennée
Alex Bennée June 7, 2017, 3:52 p.m. UTC | #4
Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Richard Henderson <rth@twiddle.net> writes:
>>>
>>>> From: "Emilio G. Cota" <cota@braap.org>
>>>>
>>>> Measurements:
>>>>
>>>> [Baseline performance is that before applying this and the previous
>>>> commit]
>>>
>>> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
>>> masked by an unrelated assertion breakage which I had to fix. However
>>> with this patch my boot hangs spinning all 4 threads. Once reverted
>>> things work again.
>>>
>>> My command line:
>>>
>>>   timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on
>>>
>>> My tree with fix and revert:
>>>
>>>   https://github.com/stsquad/qemu/tree/debug/aarch64-hang
>>>
>>> I'm investigating now.
>>
>> Well this seems to be a case of hangs with -smp > 1 (which I guess was
>> obvious seeing as the TCG threads seem to be spinning against each
>> other).
>
> So the minimum fix I've found to get things working again is:
>
> void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
> {
>     CPUState *cpu = ENV_GET_CPU(env);
>     TranslationBlock *tb;
>     target_ulong cs_base, pc;
>     uint32_t flags;
>
>     tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>     if (likely(tb)) {
>         cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>         if (likely(tb->pc == addr && tb->cs_base == cs_base &&
>                    tb->flags == flags)) {
>             goto found;
>         }
>         tb = tb_htable_lookup(cpu, addr, cs_base, flags);
>         if (likely(tb)) {
>             atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb);
>             /* goto found; */
>         }
>     }
>     return tcg_ctx.code_gen_epilogue;
>  found:
>     qemu_log_mask_and_addr(CPU_LOG_EXEC, addr,
>                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
>                            tb->tc_ptr, cpu->cpu_index, addr,
>                            lookup_symbol(addr));
>     return tb->tc_ptr;
> }
>
> Which I find very odd. It would imply that tb_htable_lookup is giving us
> a bad result, except I would find it very unlikely what ever funky value
> we stored in the jmp cache would never get hit again.
>
> /me is very confused.

OK I think we need to think about when we can run code, c.f. comment in
tb_gen_code:

    /* As long as consistency of the TB stuff is provided by tb_lock in user
     * mode and is implicit in single-threaded softmmu emulation, no explicit
     * memory barrier is required before tb_link_page() makes the TB visible
     * through the physical hash table and physical page list.
     */
    tb_link_page(tb, phys_pc, phys_page2);

As tb_link_page does the qht_insert I think what is happening is we are
attempting to run code that has just been generated but is not
completely visible to other cores yet. By commenting out the goto found
we slip it enough that next time around it is complete.

However I'm still confused because qht_insert and qht_lookup have
implied barriers in them that should mean we don't pull a partially
complete TB out of the hash.

I did add some debug that saved the tb_htable_lookup's in an cpu indexed
array and they certainly get hits on the tb_jmp_cache lookup so I don't
think my commenting of found means those htable looked up TBs never get
executed.

--
Alex Bennée
Emilio G. Cota June 7, 2017, 8:22 p.m. UTC | #5
On Wed, Jun 07, 2017 at 15:22:48 +0100, Alex Bennée wrote:
> 
> Alex Bennée <alex.bennee@linaro.org> writes:
> 
> > Richard Henderson <rth@twiddle.net> writes:
> >
> >> From: "Emilio G. Cota" <cota@braap.org>
> >>
> >> Measurements:
> >>
> >> [Baseline performance is that before applying this and the previous
> >> commit]
> >
> > Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
> > masked by an unrelated assertion breakage which I had to fix. However
> > with this patch my boot hangs spinning all 4 threads. Once reverted
> > things work again.
> >
> > My command line:
> >
> >   timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on
> >
> > My tree with fix and revert:
> >
> >   https://github.com/stsquad/qemu/tree/debug/aarch64-hang
> >
> > I'm investigating now.
> 
> Well this seems to be a case of hangs with -smp > 1 (which I guess was
> obvious seeing as the TCG threads seem to be spinning against each
> other).

Not sure the problem is in MTTCG; I cannot reproduce with -smp > 1 and
thread=single; can you? [ I get no output whatsoever when trying to boot ]

Note that you'll need to revert bde4d9205ee to get thread=foo to
work again -- see [1].

Thanks,

		E.

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01980.html
Richard Henderson June 7, 2017, 8:38 p.m. UTC | #6
On 06/07/2017 07:11 AM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> From: "Emilio G. Cota" <cota@braap.org>
>>
>> Measurements:
>>
>> [Baseline performance is that before applying this and the previous
>> commit]
> 
> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
> masked by an unrelated assertion breakage which I had to fix. However
> with this patch my boot hangs spinning all 4 threads. Once reverted
> things work again.
> 
> My command line:
> 
>    timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on

If you ignore the timeout, does it complete eventually?

It may not be the same thing, but that pull left out the final patch for 
target/alpha that enabled goto_ptr.  But as I hadn't seen a similar problem 
with other guests, I was assuming that the problem was in hw/alpha/, or some 
other bit of the interrupt chain.


r~
Alex Bennée June 8, 2017, 8:38 a.m. UTC | #7
Richard Henderson <rth@twiddle.net> writes:

> On 06/07/2017 07:11 AM, Alex Bennée wrote:
>>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> From: "Emilio G. Cota" <cota@braap.org>
>>>
>>> Measurements:
>>>
>>> [Baseline performance is that before applying this and the previous
>>> commit]
>>
>> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
>> masked by an unrelated assertion breakage which I had to fix. However
>> with this patch my boot hangs spinning all 4 threads. Once reverted
>> things work again.
>>
>> My command line:
>>
>>    timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on
>
> If you ignore the timeout, does it complete eventually?

It seems to hang indefinitely. The difference is between no output and
the usual kernel output.

> It may not be the same thing, but that pull left out the final patch
> for target/alpha that enabled goto_ptr.  But as I hadn't seen a
> similar problem with other guests, I was assuming that the problem was
> in hw/alpha/, or some other bit of the interrupt chain.

Given output is being suppressed I wonder if there is some funkyness
with IRQs going on here?

>
>
> r~


--
Alex Bennée
Alex Bennée June 8, 2017, 10:48 a.m. UTC | #8
Emilio G. Cota <cota@braap.org> writes:

> On Wed, Jun 07, 2017 at 15:22:48 +0100, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>> > Richard Henderson <rth@twiddle.net> writes:
>> >
>> >> From: "Emilio G. Cota" <cota@braap.org>
>> >>
>> >> Measurements:
>> >>
>> >> [Baseline performance is that before applying this and the previous
>> >> commit]
>> >
>> > Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
>> > masked by an unrelated assertion breakage which I had to fix. However
>> > with this patch my boot hangs spinning all 4 threads. Once reverted
>> > things work again.
>> >
>> > My command line:
>> >
>> >   timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on
>> >
>> > My tree with fix and revert:
>> >
>> >   https://github.com/stsquad/qemu/tree/debug/aarch64-hang
>> >
>> > I'm investigating now.
>>
>> Well this seems to be a case of hangs with -smp > 1 (which I guess was
>> obvious seeing as the TCG threads seem to be spinning against each
>> other).

*sigh* OK I can replicate with thread=single so I guess that rules out
race conditions. So that leaves us with some TCG state that is
different when we chain compared to returning to the epilogue. It's odd
that it doesn't trigger for cache lookups though.

I tried this:

        tb = tb_htable_lookup(cpu, addr, cs_base, flags);
        if (likely(tb)) {
            atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb);
            if (cpu->icount_decr.u32) {
                goto found;
            }
        }

So we only chain htable looksups when an IRQ is happening and it gets
further but still hangs. It doesn't rule out IRQ handling but I can't
image a mechanism that breaks it.

Time to re-review the code I guess!

>
> Not sure the problem is in MTTCG; I cannot reproduce with -smp > 1 and
> thread=single; can you? [ I get no output whatsoever when trying to boot ]
>
> Note that you'll need to revert bde4d9205ee to get thread=foo to
> work again -- see [1].
>
> Thanks,
>
> 		E.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01980.html


--
Alex Bennée

Patch
diff mbox

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index ab61d96..860e279 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -11367,8 +11367,7 @@  void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
             gen_a64_set_pc_im(dc->pc);
             /* fall through */
         case DISAS_JUMP:
-            /* indicate that the hash table must be used to find the next TB */
-            tcg_gen_exit_tb(0);
+            tcg_gen_lookup_and_goto_ptr(cpu_pc);
             break;
         case DISAS_TB_JUMP:
         case DISAS_EXC: