Message ID | ZMakYpOgco2Ihg0G@p100 (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb() | expand |
On 7/30/23 20:02, Richard Henderson wrote: > On 7/30/23 11:01, Richard Henderson wrote: >> On 7/30/23 10:56, Helge Deller wrote: >>> +++ b/accel/tcg/translator.c >>> @@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr dest) >>> return false; >>> } >>> >>> +#ifndef CONFIG_USER_ONLY >>> /* Check for the dest on the same page as the start of the TB. */ >>> return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0; >>> +#else >>> + /* linux-user doesn't need to fear pagefaults for exec swap-in */ >>> + return false; >>> +#endif >>> } >> >> No, we still need to stop at pages for breakpoints. > ... and munmap for e.g. dlclose. Ok, so we can't optimize for linux-user. But what about my other question: Shouldn't it be return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) != 0; ? helge
On 7/30/23 10:56, Helge Deller wrote: > I'm quite unclear about translator_use_goto_tb() for qemu-user > emulation....(and in general). > > Based on the function name, the function translator_use_goto_tb() shall > help to decide if a program should use goto_tb() and exit_tb() to jump > to the next instruction. > > Currently, if the destination is on the same page, it returns true. > I wonder, if it shouldn't return false in this case instead, because > arches have code like this: (taken from target/hppa/translate.c): > if (... && translator_use_goto_tb(ctx, f)) { > tcg_gen_goto_tb(which); > tcg_gen_movi_reg(cpu_iaoq_f, f); > tcg_gen_movi_reg(cpu_iaoq_b, b); > tcg_gen_exit_tb(ctx->base.tb, which); > } else { > copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b); > copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var); > tcg_gen_lookup_and_goto_ptr(); > } > > Shouldn't, if the destination is on the same page, the (faster?) > path with tcg_gen_lookup_and_goto_ptr() be taken instead? No, because tcg_gen_lookup_and_goto_ptr is not the faster path. That always involves a lookup, then an indirect branch. The goto_tb path is linked, so only requires a lookup once, and the branch may be direct (depending on the host architecture). r~
On 7/30/23 22:03, Richard Henderson wrote: > On 7/30/23 10:56, Helge Deller wrote: >> I'm quite unclear about translator_use_goto_tb() for qemu-user >> emulation....(and in general). >> >> Based on the function name, the function translator_use_goto_tb() shall >> help to decide if a program should use goto_tb() and exit_tb() to jump >> to the next instruction. >> >> Currently, if the destination is on the same page, it returns true. >> I wonder, if it shouldn't return false in this case instead, because >> arches have code like this: (taken from target/hppa/translate.c): >> if (... && translator_use_goto_tb(ctx, f)) { >> tcg_gen_goto_tb(which); >> tcg_gen_movi_reg(cpu_iaoq_f, f); >> tcg_gen_movi_reg(cpu_iaoq_b, b); >> tcg_gen_exit_tb(ctx->base.tb, which); >> } else { >> copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b); >> copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var); >> tcg_gen_lookup_and_goto_ptr(); >> } >> >> Shouldn't, if the destination is on the same page, the (faster?) >> path with tcg_gen_lookup_and_goto_ptr() be taken instead? > > No, because tcg_gen_lookup_and_goto_ptr is not the faster path. > That always involves a lookup, then an indirect branch. Ah, ok. So my assumption was wrong, and this explains it. > The goto_tb path is linked, so only requires a lookup once, and the > branch may be direct (depending on the host architecture). Probably the last question in this regard: This code: IN: 0x00010c98: cmpib,<>,n 0,r19,0x10c98 generates "nop/jmp" in the code: the tcg_gen_goto_tb() branch: OUT: 0x7fd7e400070e: 85 db testl %ebx, %ebx 0x7fd7e4000710: 0f 85 20 00 00 00 jne 0x7fd7e4000736 0x7fd7e4000716: 90 nop <- from "tcg_gen_op1i(INDEX_op_goto_tb, idx)" in tcg_gen_goto_tb() 0x7fd7e4000717: e9 00 00 00 00 jmp 0x7fd7e400071c <- jump is effective useless. 0x7fd7e400071c: c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp) 0x7fd7e4000723: c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp) 0x7fd7e400072a: 48 8d 05 0f ff ff ff leaq -0xf1(%rip), %rax 0x7fd7e4000731: e9 e2 f8 ff ff jmp 0x7fd7e4000018 0x7fd7e4000736: 90 nop <- here too. 0x7fd7e4000737: e9 00 00 00 00 jmp 0x7fd7e400073c 0x7fd7e400073c: c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp) 0x7fd7e4000743: c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp) 0x7fd7e400074a: 48 8d 05 f0 fe ff ff leaq -0x110(%rip), %rax 0x7fd7e4000751: e9 c2 f8 ff ff jmp 0x7fd7e4000018 I assume those nops/jmp+0 is to be able to insert breakpoints? But those nops/jmps are never in the tcg_gen_lookup_and_goto_ptr() branch, probably because breakpoints are checked in HELPER(lookup_tb_ptr), right? 0x7ff47c0006d0: 0f 85 18 00 00 00 jne 0x7ff47c0006ee 0x7ff47c0006d6: c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp) 0x7ff47c0006dd: c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp) 0x7ff47c0006e4: 48 8b fd movq %rbp, %rdi 0x7ff47c0006e7: e8 34 bf 42 0f callq 0x7ff48b42c620 0x7ff47c0006ec: ff e0 jmpq *%rax 0x7ff47c0006ee: c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp) 0x7ff47c0006f5: c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp) 0x7ff47c0006fc: 48 8b fd movq %rbp, %rdi 0x7ff47c0006ff: e8 1c bf 42 0f callq 0x7ff48b42c620 0x7ff47c000704: ff e0 jmpq *%rax Question: Couldn't those nops be avoided in the tcg_gen_goto_tb() branch too, e.g. if breakpoints are checked when returning through the prologue exit path? Thanks! Helge
On 7/30/23 13:37, Helge Deller wrote: > On 7/30/23 22:03, Richard Henderson wrote: >> On 7/30/23 10:56, Helge Deller wrote: >>> I'm quite unclear about translator_use_goto_tb() for qemu-user >>> emulation....(and in general). >>> >>> Based on the function name, the function translator_use_goto_tb() shall >>> help to decide if a program should use goto_tb() and exit_tb() to jump >>> to the next instruction. >>> >>> Currently, if the destination is on the same page, it returns true. >>> I wonder, if it shouldn't return false in this case instead, because >>> arches have code like this: (taken from target/hppa/translate.c): >>> if (... && translator_use_goto_tb(ctx, f)) { >>> tcg_gen_goto_tb(which); >>> tcg_gen_movi_reg(cpu_iaoq_f, f); >>> tcg_gen_movi_reg(cpu_iaoq_b, b); >>> tcg_gen_exit_tb(ctx->base.tb, which); >>> } else { >>> copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b); >>> copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var); >>> tcg_gen_lookup_and_goto_ptr(); >>> } >>> >>> Shouldn't, if the destination is on the same page, the (faster?) >>> path with tcg_gen_lookup_and_goto_ptr() be taken instead? >> >> No, because tcg_gen_lookup_and_goto_ptr is not the faster path. >> That always involves a lookup, then an indirect branch. > > Ah, ok. So my assumption was wrong, and this explains it. > >> The goto_tb path is linked, so only requires a lookup once, and the >> branch may be direct (depending on the host architecture). > Probably the last question in this regard: > > This code: > IN: > 0x00010c98: cmpib,<>,n 0,r19,0x10c98 > > generates "nop/jmp" in the code: > > the tcg_gen_goto_tb() branch: > OUT: > 0x7fd7e400070e: 85 db testl %ebx, %ebx > 0x7fd7e4000710: 0f 85 20 00 00 00 jne 0x7fd7e4000736 > 0x7fd7e4000716: 90 nop <- from > "tcg_gen_op1i(INDEX_op_goto_tb, idx)" in tcg_gen_goto_tb() > 0x7fd7e4000717: e9 00 00 00 00 jmp 0x7fd7e400071c <- jump is effective > useless. > 0x7fd7e400071c: c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp) > 0x7fd7e4000723: c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp) > 0x7fd7e400072a: 48 8d 05 0f ff ff ff leaq -0xf1(%rip), %rax > 0x7fd7e4000731: e9 e2 f8 ff ff jmp 0x7fd7e4000018 > 0x7fd7e4000736: 90 nop <- here too. > 0x7fd7e4000737: e9 00 00 00 00 jmp 0x7fd7e400073c > 0x7fd7e400073c: c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp) > 0x7fd7e4000743: c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp) > 0x7fd7e400074a: 48 8d 05 f0 fe ff ff leaq -0x110(%rip), %rax > 0x7fd7e4000751: e9 c2 f8 ff ff jmp 0x7fd7e4000018 > > I assume those nops/jmp+0 is to be able to insert breakpoints? No. The destination of the jmp is patched by tb_target_set_jmp_target, which happens some time after this disassembly. The nop is present to ensure that the patch point is aligned, so that it is one 4-byte atomic store. r~
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c index 1a6a5448c8..07224a7f83 100644 --- a/accel/tcg/translator.c +++ b/accel/tcg/translator.c @@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr dest) return false; } +#ifndef CONFIG_USER_ONLY /* Check for the dest on the same page as the start of the TB. */ return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0; +#else + /* linux-user doesn't need to fear pagefaults for exec swap-in */ + return false; +#endif } void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
I'm quite unclear about translator_use_goto_tb() for qemu-user emulation....(and in general). Based on the function name, the function translator_use_goto_tb() shall help to decide if a program should use goto_tb() and exit_tb() to jump to the next instruction. Currently, if the destination is on the same page, it returns true. I wonder, if it shouldn't return false in this case instead, because arches have code like this: (taken from target/hppa/translate.c): if (... && translator_use_goto_tb(ctx, f)) { tcg_gen_goto_tb(which); tcg_gen_movi_reg(cpu_iaoq_f, f); tcg_gen_movi_reg(cpu_iaoq_b, b); tcg_gen_exit_tb(ctx->base.tb, which); } else { copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b); copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var); tcg_gen_lookup_and_goto_ptr(); } Shouldn't, if the destination is on the same page, the (faster?) path with tcg_gen_lookup_and_goto_ptr() be taken instead? Now, for user-mode emulation page faults can't happen at all. Shouldn't in this case the tcg_gen_lookup_and_goto_ptr() path been taken unconditionally, as shown in the patch below? Signed-off-by: Helge Deller <deller@gmx.de>