diff mbox

tcg/tci: do not use ldst label (never implemented)

Message ID 20170911022839.23231-1-f4bug@amsat.org (mailing list archive)
State New, archived
Headers show

Commit Message

Philippe Mathieu-Daudé Sept. 11, 2017, 2:28 a.m. UTC
changed in 659ef5cbb893, this fixes building with --enable-tcg-interpreter:

/home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
 static bool tcg_out_ldst_finalize(TCGContext *s);
              ^

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
see https://travis-ci.org/qemu/qemu/jobs/273351287

 tcg/tci/tcg-target.h | 4 ----
 1 file changed, 4 deletions(-)

Comments

Stefan Weil Sept. 11, 2017, 5:18 a.m. UTC | #1
Am 11.09.2017 um 04:28 schrieb Philippe Mathieu-Daudé:
> changed in 659ef5cbb893, this fixes building with --enable-tcg-interpreter:
> 
> /home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
>  static bool tcg_out_ldst_finalize(TCGContext *s);
>               ^
> 
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> see https://travis-ci.org/qemu/qemu/jobs/273351287
> 
>  tcg/tci/tcg-target.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 5d692e1f4b..26140d78cb 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -206,8 +206,4 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
>      /* no need to flush icache explicitly */
>  }
>  
> -#ifdef CONFIG_SOFTMMU
> -#define TCG_TARGET_NEED_LDST_LABELS
> -#endif
> -
>  #endif /* TCG_TARGET_H */
> 

Reviewed-by: Stefan Weil <sw@weilnetz.de>

Thank you.

Stefan
Richard Henderson Sept. 11, 2017, 6:01 p.m. UTC | #2
On 09/10/2017 07:28 PM, Philippe Mathieu-Daudé wrote:
> changed in 659ef5cbb893, this fixes building with --enable-tcg-interpreter:
> 
> /home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
>  static bool tcg_out_ldst_finalize(TCGContext *s);
>               ^
> 
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

Oops.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Philippe Mathieu-Daudé Sept. 11, 2017, 6:13 p.m. UTC | #3
On 09/10/2017 11:28 PM, Philippe Mathieu-Daudé wrote:
> changed in 659ef5cbb893, this fixes building with --enable-tcg-interpreter:
> 
> /home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
>   static bool tcg_out_ldst_finalize(TCGContext *s);
>                ^
> 
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

I'm not sure I can sign for Jincheng, but since his patch has the same 
content you could add:

Signed-off-by: Jincheng Miao <jincheng.miao@gmail.com>

(https://patchwork.kernel.org/patch/9947709/)

> ---
> see https://travis-ci.org/qemu/qemu/jobs/273351287
> 
>   tcg/tci/tcg-target.h | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 5d692e1f4b..26140d78cb 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -206,8 +206,4 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
>       /* no need to flush icache explicitly */
>   }
>   
> -#ifdef CONFIG_SOFTMMU
> -#define TCG_TARGET_NEED_LDST_LABELS
> -#endif
> -
>   #endif /* TCG_TARGET_H */
>
Peter Maydell Sept. 11, 2017, 6:14 p.m. UTC | #4
On 11 September 2017 at 19:13, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> On 09/10/2017 11:28 PM, Philippe Mathieu-Daudé wrote:
>>
>> changed in 659ef5cbb893, this fixes building with
>> --enable-tcg-interpreter:
>>
>> /home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error:
>> ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
>>   static bool tcg_out_ldst_finalize(TCGContext *s);
>>                ^
>>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
>
> I'm not sure I can sign for Jincheng, but since his patch has the same
> content you could add:
>
> Signed-off-by: Jincheng Miao <jincheng.miao@gmail.com>

In that sort of case we generally just say "first person's
patch won". Extra signed-off-by tags are for where there's
really multiple peoples' work in the patch.

thanks
-- PMM
Peter Maydell Sept. 11, 2017, 6:24 p.m. UTC | #5
On 11 September 2017 at 19:01, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 09/10/2017 07:28 PM, Philippe Mathieu-Daudé wrote:
>> changed in 659ef5cbb893, this fixes building with --enable-tcg-interpreter:
>>
>> /home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
>>  static bool tcg_out_ldst_finalize(TCGContext *s);
>>               ^
>>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
> Oops.
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Applied to master as a buildfix, thanks.

I've also turned on a tci compile check on my pre-merge tests.
(It doesn't pass "make check" for me, though...)

thanks
-- PMM
Stefan Weil Sept. 11, 2017, 7:39 p.m. UTC | #6
Am 11.09.2017 um 20:24 schrieb Peter Maydell:
> On 11 September 2017 at 19:01, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> On 09/10/2017 07:28 PM, Philippe Mathieu-Daudé wrote:
>>> changed in 659ef5cbb893, this fixes building with --enable-tcg-interpreter:
>>>
>>> /home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
>>>  static bool tcg_out_ldst_finalize(TCGContext *s);
>>>               ^
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>
>> Oops.
>>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> Applied to master as a buildfix, thanks.

Thank you, Peter.

> I've also turned on a tci compile check on my pre-merge tests.
> (It doesn't pass "make check" for me, though...)

I just tested it myself. "make check" passed, but printed some
KVM related messages which could be suppressed by fixing the
test code:

[...]
  GTESTER tests/test-qht-par
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
[...]
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
make: Verzeichnis „/qemu/bin/“ wird verlassen

What failures do you get?

Regards,
Stefan
Philippe Mathieu-Daudé Sept. 11, 2017, 7:47 p.m. UTC | #7
Hi Peter, Stefan,

> Am 11.09.2017 um 20:24 schrieb Peter Maydell:
>> I've also turned on a tci compile check on my pre-merge tests.
>> (It doesn't pass "make check" for me, though...)

Peter you might want to restrict it to X86...

--target-list=subdir-i386-softmmu,x86_64-softmmu,x86_64-linux-user

> 
> I just tested it myself. "make check" passed, but printed some
> KVM related messages which could be suppressed by fixing the
> test code:
> 
> [...]
>    GTESTER tests/test-qht-par
> Could not access KVM kernel module: No such file or directory
> qemu-system-x86_64: failed to initialize KVM: No such file or directory
> qemu-system-x86_64: Back to tcg accelerator
> Could not access KVM kernel module: No such file or directory
> qemu-system-x86_64: failed to initialize KVM: No such file or directory
> [...]
> Could not access KVM kernel module: No such file or directory
> qemu-system-x86_64: failed to initialize KVM: No such file or directory
> qemu-system-x86_64: Back to tcg accelerator
> make: Verzeichnis „/qemu/bin/“ wird verlassen

Stefan are you testing no-X86 targets?

Regards,

Phil.
Stefan Weil Sept. 11, 2017, 7:54 p.m. UTC | #8
Am 11.09.2017 um 21:47 schrieb Philippe Mathieu-Daudé:
[...]> Stefan are you testing no-X86 targets?

I only tested x86_64:

./configure --enable-debug --enable-tcg-interpreter \
  --target-list=x86_64-linux-user,x86_64-softmmu

If other targets fail, I'll have a look to find the cause.

Stefan
Stefan Weil Sept. 11, 2017, 8:26 p.m. UTC | #9
Am 11.09.2017 um 20:24 schrieb Peter Maydell:
> I've also turned on a tci compile check on my pre-merge tests.
> (It doesn't pass "make check" for me, though...)

Did you run "make" before running "make check"?

I had an assertion because of a missing qemu-ga
when I‌ only had run "make check", so there is a
dependency (perhaps check: tools) missing.

Now "make check" is running fine with all targets
(at least until check-qtest-ppc64). Sorry, I have
to finish for today.

Stefan
Philippe Mathieu-Daudé Sept. 11, 2017, 9:06 p.m. UTC | #10
Hi Stefan,

On 09/11/2017 05:26 PM, Stefan Weil wrote:> I had an assertion because 
of a missing qemu-ga
> when I‌ only had run "make check", so there is a
> dependency (perhaps check: tools) missing.
> 
> Now "make check" is running fine with all targets
> (at least until check-qtest-ppc64). Sorry, I have
> to finish for today.

I had this patch in my travis-ci queue, here extracted alone:
https://patchwork.kernel.org/patch/9948117/

Regards,

Phil.
Stefan Weil Sept. 12, 2017, 8:52 a.m. UTC | #11
Am 11.09.2017 um 20:24 schrieb Peter Maydell:
> I've also turned on a tci compile check on my pre-merge tests.
> (It doesn't pass "make check" for me, though...) thanks -- PMM

"make check-qtest-ppc64" fails for me, too.

Thomas, this seems to be again the well known timing problem
in tests/prom-env-test.c. The time for the test had been
changedfrom 30 s to 10 s to 120 s in the past.

For TCI, even that latest value isnot sufficient when
testing with pseries. Of course that also depends on other
parameters (speed of test machine, compilerflags).

In my test pseries took nearly 5 minutes, so the test passes
when the loop upper limit is increased to 30000.

Is there a better way to handle this test? Why does pseries
still need much more time than the other machines
(not only with TCI)?

Regards,
Stefan
Stefan Weil Sept. 12, 2017, 9:20 a.m. UTC | #12
Am 12.09.2017 um 10:52 schrieb Stefan Weil:
> Am 11.09.2017 um 20:24 schrieb Peter Maydell:
>> I've also turned on a tci compile check on my pre-merge tests.
>> (It doesn't pass "make check" for me, though...) thanks -- PMM
> 
> "make check-qtest-ppc64" fails for me, too.
> 
> Thomas, this seems to be again the well known timing problem
> in tests/prom-env-test.c. The time for the test had been
> changedfrom 30 s to 10 s to 120 s in the past.

... changed from 10 s to 30 s to 120 s ...

> For TCI, even that latest value is not sufficient when
> testing with pseries. Of course that also depends on other
> parameters (speed of test machine, compiler flags).
> 
> In my test pseries took nearly 5 minutes, so the test passes
> when the loop upper limit is increased to 30000.

Timing data for prom-env-test with TCI on another test machine:

mac99:   78 s
g3beige: 74 s
pseries: 477 s

> 
> Is there a better way to handle this test? Why does pseries
> still need much more time than the other machines
> (not only with TCI)?
> 
> Regards,
> Stefan
Thomas Huth Sept. 12, 2017, 2:56 p.m. UTC | #13
On 12.09.2017 11:20, Stefan Weil wrote:
> Am 12.09.2017 um 10:52 schrieb Stefan Weil:
>> Am 11.09.2017 um 20:24 schrieb Peter Maydell:
>>> I've also turned on a tci compile check on my pre-merge tests.
>>> (It doesn't pass "make check" for me, though...) thanks -- PMM
>>
>> "make check-qtest-ppc64" fails for me, too.
>>
>> Thomas, this seems to be again the well known timing problem
>> in tests/prom-env-test.c. The time for the test had been
>> changedfrom 30 s to 10 s to 120 s in the past.
> 
> ... changed from 10 s to 30 s to 120 s ...
> 
>> For TCI, even that latest value is not sufficient when
>> testing with pseries. Of course that also depends on other
>> parameters (speed of test machine, compiler flags).
>>
>> In my test pseries took nearly 5 minutes, so the test passes
>> when the loop upper limit is increased to 30000.
> 
> Timing data for prom-env-test with TCI on another test machine:
> 
> mac99:   78 s
> g3beige: 74 s
> pseries: 477 s

How fast is your host machine? For me the whole prom-env-test finishes
within 52 seconds (my host machine has 3.2 GHz) in TCI mode, and there
are no errors reported during "make check-qtest-ppc64".

Did you compile your QEMU with --enable-debug by accident? I think that
could explain the bad performance here - TCI with --enable-debug is not
just slow, but rather unusable slow already...

>> Is there a better way to handle this test? Why does pseries
>> still need much more time than the other machines
>> (not only with TCI)?

The problem is that the SLOF firmware just performs very badly with TCG
(it's fine on real hardware). It executes a lot of Forth code, and the
Forth interpreter uses things like computed gotos or other tricks that
basically prevent proper JIT operation here. I've done quite a bit of
optimizations in SLOF in the past already, but I've got hardly any ideas
left how to fix that further.

So I hope the problem is just the "--enable-debug" here and we could run
the test with TCI in normal builds? I'm also fine if we increase the
timeout to 5 minutes instead - it should not affect the normal users
(i.e. those who don't use TCI) and ease this situation with TCI a little
bit.

 Thomas
Paolo Bonzini Sept. 12, 2017, 3:13 p.m. UTC | #14
On 12/09/2017 16:56, Thomas Huth wrote:
> The problem is that the SLOF firmware just performs very badly with TCG
> (it's fine on real hardware). It executes a lot of Forth code, and the
> Forth interpreter uses things like computed gotos or other tricks that
> basically prevent proper JIT operation here. I've done quite a bit of
> optimizations in SLOF in the past already, but I've got hardly any ideas
> left how to fix that further.

Two ideas for QEMU based on a quick "perf record" test:

- 25% of the time is spent in cpu_exec.  PPC doesn't use
tcg_gen_lookup_and_goto_ptr.  The main thing to be careful about is
that, whenever an interrupt is pending (e.g. after enabling them) you
need to force an exit to the loop.  See for example commits b29fd33db5
("target/arm: use DISAS_EXIT for eret handling", 2017-07-17) and
b74cddcbf6 ("target/mips: Use BS_EXCP where interrupts are expected",
2017-08-02).  On PPC this mostly means SPRs and env->msr writes.  Apart
from this, however, it shouldn't be hard to do.

- 8% of the time is spend in cpu_exec's call to
object_class_dynamic_cast_assert aka this line

    CPUClass *cc = CPU_GET_CLASS(cpu);

This maybe could avoid the dynamic cast.  But it's also possible that
fixing the first gets rid of this one too.

Thanks,

Paolo
Thomas Huth Sept. 21, 2017, 9:59 a.m. UTC | #15
On 12.09.2017 17:13, Paolo Bonzini wrote:
> On 12/09/2017 16:56, Thomas Huth wrote:
>> The problem is that the SLOF firmware just performs very badly with TCG
>> (it's fine on real hardware). It executes a lot of Forth code, and the
>> Forth interpreter uses things like computed gotos or other tricks that
>> basically prevent proper JIT operation here. I've done quite a bit of
>> optimizations in SLOF in the past already, but I've got hardly any ideas
>> left how to fix that further.
> 
> Two ideas for QEMU based on a quick "perf record" test:
> 
> - 25% of the time is spent in cpu_exec.  PPC doesn't use
> tcg_gen_lookup_and_goto_ptr.

I just realized that Richard recently already posted a patch for this:

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg07124.html

I've applied it locally, and indeed, it speeds up a simple test with
-prom-env by factor two. Before the change:

$ time ppc64-softmmu/qemu-system-ppc64 -nographic -vga none -prom-env
'use-nvramrc?=true' -prom-env 'nvramrc=power-off'
[...]
real	0m28.784s
user	0m28.700s
sys	0m0.031s

After the change:

$ time ppc64-softmmu/qemu-system-ppc64 -nographic -vga none -prom-env
'use-nvramrc?=true' -prom-env 'nvramrc=power-off'
[...]
real	0m13.953s
user	0m13.904s
sys	0m0.046s

That's impressive! Richard, may I ask what's the current state of this?
Do you plan to merge this soon, or are there still issues (like the ones
that Paolo mentioned)?

However, I only see that speed-up with the normal x86 backend. I've also
tried it with TCI, but I hardly saw any improvements there ... is there
still something missing in the TCI backend that is required for the
tcg_gen_lookup_and_goto_ptr feature?

 Thomas
Richard Henderson Sept. 22, 2017, 1:31 p.m. UTC | #16
On 09/21/2017 04:59 AM, Thomas Huth wrote:
> $ time ppc64-softmmu/qemu-system-ppc64 -nographic -vga none -prom-env
> 'use-nvramrc?=true' -prom-env 'nvramrc=power-off'
> [...]
> real	0m13.953s
> user	0m13.904s
> sys	0m0.046s
> 
> That's impressive! Richard, may I ask what's the current state of this?
> Do you plan to merge this soon, or are there still issues (like the ones
> that Paolo mentioned)?

I had been assuming that it would go in via the normal ppc tree.
If I should just merge it with tcg-next, I can do so...

> However, I only see that speed-up with the normal x86 backend. I've also
> tried it with TCI, but I hardly saw any improvements there ... is there
> still something missing in the TCI backend that is required for the
> tcg_gen_lookup_and_goto_ptr feature?

Yes.

r~
diff mbox

Patch

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 5d692e1f4b..26140d78cb 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -206,8 +206,4 @@  static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
     /* no need to flush icache explicitly */
 }
 
-#ifdef CONFIG_SOFTMMU
-#define TCG_TARGET_NEED_LDST_LABELS
-#endif
-
 #endif /* TCG_TARGET_H */