Message ID | 20170605165233.4135-23-rth@twiddle.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Richard Henderson <rth@twiddle.net> writes: > From: "Emilio G. Cota" <cota@braap.org> > > Measurements: > > [Baseline performance is that before applying this and the previous > commit] Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly masked by an unrelated assertion breakage which I had to fix. However with this patch my boot hangs spinning all 4 threads. Once reverted things work again. My command line: timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on My tree with fix and revert: https://github.com/stsquad/qemu/tree/debug/aarch64-hang I'm investigating now. -- Alex Bennée
Alex Bennée <alex.bennee@linaro.org> writes: > Richard Henderson <rth@twiddle.net> writes: > >> From: "Emilio G. Cota" <cota@braap.org> >> >> Measurements: >> >> [Baseline performance is that before applying this and the previous >> commit] > > Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly > masked by an unrelated assertion breakage which I had to fix. However > with this patch my boot hangs spinning all 4 threads. Once reverted > things work again. > > My command line: > > timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on > > My tree with fix and revert: > > https://github.com/stsquad/qemu/tree/debug/aarch64-hang > > I'm investigating now. Well this seems to be a case of hangs with -smp > 1 (which I guess was obvious seeing as the TCG threads seem to be spinning against each other). -- Alex Bennée
Alex Bennée <alex.bennee@linaro.org> writes: > Alex Bennée <alex.bennee@linaro.org> writes: > >> Richard Henderson <rth@twiddle.net> writes: >> >>> From: "Emilio G. Cota" <cota@braap.org> >>> >>> Measurements: >>> >>> [Baseline performance is that before applying this and the previous >>> commit] >> >> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly >> masked by an unrelated assertion breakage which I had to fix. However >> with this patch my boot hangs spinning all 4 threads. Once reverted >> things work again. >> >> My command line: >> >> timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on >> >> My tree with fix and revert: >> >> https://github.com/stsquad/qemu/tree/debug/aarch64-hang >> >> I'm investigating now. > > Well this seems to be a case of hangs with -smp > 1 (which I guess was > obvious seeing as the TCG threads seem to be spinning against each > other). So the minimum fix I've found to get things working again is: void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr) { CPUState *cpu = ENV_GET_CPU(env); TranslationBlock *tb; target_ulong cs_base, pc; uint32_t flags; tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]); if (likely(tb)) { cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); if (likely(tb->pc == addr && tb->cs_base == cs_base && tb->flags == flags)) { goto found; } tb = tb_htable_lookup(cpu, addr, cs_base, flags); if (likely(tb)) { atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb); /* goto found; */ } } return tcg_ctx.code_gen_epilogue; found: qemu_log_mask_and_addr(CPU_LOG_EXEC, addr, "Chain %p [%d: " TARGET_FMT_lx "] %s\n", tb->tc_ptr, cpu->cpu_index, addr, lookup_symbol(addr)); return tb->tc_ptr; } Which I find very odd. It would imply that tb_htable_lookup is giving us a bad result, except I would find it very unlikely what ever funky value we stored in the jmp cache would never get hit again. /me is very confused. -- Alex Bennée
Alex Bennée <alex.bennee@linaro.org> writes: > Alex Bennée <alex.bennee@linaro.org> writes: > >> Alex Bennée <alex.bennee@linaro.org> writes: >> >>> Richard Henderson <rth@twiddle.net> writes: >>> >>>> From: "Emilio G. Cota" <cota@braap.org> >>>> >>>> Measurements: >>>> >>>> [Baseline performance is that before applying this and the previous >>>> commit] >>> >>> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly >>> masked by an unrelated assertion breakage which I had to fix. However >>> with this patch my boot hangs spinning all 4 threads. Once reverted >>> things work again. >>> >>> My command line: >>> >>> timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on >>> >>> My tree with fix and revert: >>> >>> https://github.com/stsquad/qemu/tree/debug/aarch64-hang >>> >>> I'm investigating now. >> >> Well this seems to be a case of hangs with -smp > 1 (which I guess was >> obvious seeing as the TCG threads seem to be spinning against each >> other). > > So the minimum fix I've found to get things working again is: > > void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr) > { > CPUState *cpu = ENV_GET_CPU(env); > TranslationBlock *tb; > target_ulong cs_base, pc; > uint32_t flags; > > tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]); > if (likely(tb)) { > cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); > if (likely(tb->pc == addr && tb->cs_base == cs_base && > tb->flags == flags)) { > goto found; > } > tb = tb_htable_lookup(cpu, addr, cs_base, flags); > if (likely(tb)) { > atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb); > /* goto found; */ > } > } > return tcg_ctx.code_gen_epilogue; > found: > qemu_log_mask_and_addr(CPU_LOG_EXEC, addr, > "Chain %p [%d: " TARGET_FMT_lx "] %s\n", > tb->tc_ptr, cpu->cpu_index, addr, > lookup_symbol(addr)); > return tb->tc_ptr; > } > > Which I find very odd. It would imply that tb_htable_lookup is giving us > a bad result, except I would find it very unlikely what ever funky value > we stored in the jmp cache would never get hit again. > > /me is very confused. OK I think we need to think about when we can run code, c.f. comment in tb_gen_code: /* As long as consistency of the TB stuff is provided by tb_lock in user * mode and is implicit in single-threaded softmmu emulation, no explicit * memory barrier is required before tb_link_page() makes the TB visible * through the physical hash table and physical page list. */ tb_link_page(tb, phys_pc, phys_page2); As tb_link_page does the qht_insert I think what is happening is we are attempting to run code that has just been generated but is not completely visible to other cores yet. By commenting out the goto found we slip it enough that next time around it is complete. However I'm still confused because qht_insert and qht_lookup have implied barriers in them that should mean we don't pull a partially complete TB out of the hash. I did add some debug that saved the tb_htable_lookup's in an cpu indexed array and they certainly get hits on the tb_jmp_cache lookup so I don't think my commenting of found means those htable looked up TBs never get executed. -- Alex Bennée
On Wed, Jun 07, 2017 at 15:22:48 +0100, Alex Bennée wrote: > > Alex Bennée <alex.bennee@linaro.org> writes: > > > Richard Henderson <rth@twiddle.net> writes: > > > >> From: "Emilio G. Cota" <cota@braap.org> > >> > >> Measurements: > >> > >> [Baseline performance is that before applying this and the previous > >> commit] > > > > Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly > > masked by an unrelated assertion breakage which I had to fix. However > > with this patch my boot hangs spinning all 4 threads. Once reverted > > things work again. > > > > My command line: > > > > timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on > > > > My tree with fix and revert: > > > > https://github.com/stsquad/qemu/tree/debug/aarch64-hang > > > > I'm investigating now. > > Well this seems to be a case of hangs with -smp > 1 (which I guess was > obvious seeing as the TCG threads seem to be spinning against each > other). Not sure the problem is in MTTCG; I cannot reproduce with -smp > 1 and thread=single; can you? [ I get no output whatsoever when trying to boot ] Note that you'll need to revert bde4d9205ee to get thread=foo to work again -- see [1]. Thanks, E. [1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01980.html
On 06/07/2017 07:11 AM, Alex Bennée wrote: > > Richard Henderson <rth@twiddle.net> writes: > >> From: "Emilio G. Cota" <cota@braap.org> >> >> Measurements: >> >> [Baseline performance is that before applying this and the previous >> commit] > > Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly > masked by an unrelated assertion breakage which I had to fix. However > with this patch my boot hangs spinning all 4 threads. Once reverted > things work again. > > My command line: > > timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on If you ignore the timeout, does it complete eventually? It may not be the same thing, but that pull left out the final patch for target/alpha that enabled goto_ptr. But as I hadn't seen a similar problem with other guests, I was assuming that the problem was in hw/alpha/, or some other bit of the interrupt chain. r~
Richard Henderson <rth@twiddle.net> writes: > On 06/07/2017 07:11 AM, Alex Bennée wrote: >> >> Richard Henderson <rth@twiddle.net> writes: >> >>> From: "Emilio G. Cota" <cota@braap.org> >>> >>> Measurements: >>> >>> [Baseline performance is that before applying this and the previous >>> commit] >> >> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly >> masked by an unrelated assertion breakage which I had to fix. However >> with this patch my boot hangs spinning all 4 threads. Once reverted >> things work again. >> >> My command line: >> >> timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on > > If you ignore the timeout, does it complete eventually? It seems to hang indefinitely. The difference is between no output and the usual kernel output. > It may not be the same thing, but that pull left out the final patch > for target/alpha that enabled goto_ptr. But as I hadn't seen a > similar problem with other guests, I was assuming that the problem was > in hw/alpha/, or some other bit of the interrupt chain. Given output is being suppressed I wonder if there is some funkyness with IRQs going on here? > > > r~ -- Alex Bennée
Emilio G. Cota <cota@braap.org> writes: > On Wed, Jun 07, 2017 at 15:22:48 +0100, Alex Bennée wrote: >> >> Alex Bennée <alex.bennee@linaro.org> writes: >> >> > Richard Henderson <rth@twiddle.net> writes: >> > >> >> From: "Emilio G. Cota" <cota@braap.org> >> >> >> >> Measurements: >> >> >> >> [Baseline performance is that before applying this and the previous >> >> commit] >> > >> > Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly >> > masked by an unrelated assertion breakage which I had to fix. However >> > with this patch my boot hangs spinning all 4 threads. Once reverted >> > things work again. >> > >> > My command line: >> > >> > timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 -machine type=virt -display none -m 4096 -cpu cortex-a57 -serial mon:stdio -netdev user,id=unet -device virtio-net-device,netdev=unet -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -smp 4 -machine gic-version=3 -machine virtualization=true -name debug-threads=on >> > >> > My tree with fix and revert: >> > >> > https://github.com/stsquad/qemu/tree/debug/aarch64-hang >> > >> > I'm investigating now. >> >> Well this seems to be a case of hangs with -smp > 1 (which I guess was >> obvious seeing as the TCG threads seem to be spinning against each >> other). *sigh* OK I can replicate with thread=single so I guess that rules out race conditions. So that leaves us with some TCG state that is different when we chain compared to returning to the epilogue. It's odd that it doesn't trigger for cache lookups though. I tried this: tb = tb_htable_lookup(cpu, addr, cs_base, flags); if (likely(tb)) { atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb); if (cpu->icount_decr.u32) { goto found; } } So we only chain htable looksups when an IRQ is happening and it gets further but still hangs. It doesn't rule out IRQ handling but I can't image a mechanism that breaks it. Time to re-review the code I guess! > > Not sure the problem is in MTTCG; I cannot reproduce with -smp > 1 and > thread=single; can you? [ I get no output whatsoever when trying to boot ] > > Note that you'll need to revert bde4d9205ee to get thread=foo to > work again -- see [1]. > > Thanks, > > E. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01980.html -- Alex Bennée
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index ab61d96..860e279 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -11367,8 +11367,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb) gen_a64_set_pc_im(dc->pc); /* fall through */ case DISAS_JUMP: - /* indicate that the hash table must be used to find the next TB */ - tcg_gen_exit_tb(0); + tcg_gen_lookup_and_goto_ptr(cpu_pc); break; case DISAS_TB_JUMP: case DISAS_EXC: