configure: Automatically fall back to TCI on non-release architectures

Message ID 20190404185730.GA22512@ls3530.dellerweb.de
State New, archived
Series configure: Automatically fall back to TCI on non-release architectures

Commit Message

Helge Deller April 4, 2019, 6:57 p.m. UTC
If a non-release architecture is found, and it's known that there is no
native TCG support for that CPU, automatically fall back to the TCI
implementation instead of requesting the user to run configure again
with the --enable-tcg-interpreter option.

This change simplifies building qemu in automatic build environments
(like in my case the debian buildds) because one does not need to
special case on the architectures.

Signed-off-by: Helge Deller <deller@gmx.de>

Comments

Philippe Mathieu-Daudé April 4, 2019, 7:24 p.m. UTC | #1
Hi Helge,

On 4/4/19 8:57 PM, Helge Deller wrote:
> If a non-release architecture is found, and it's known that there is no
> native TCG support for that CPU, automatically fall back to the TCI
> implementation instead of requesting the user to run configure again
> with the --enable-tcg-interpreter option.
> 
> This change simplifies building qemu in automatic build environments
> (like in my case the debian buildds) because one does not need to
> special case on the architectures.
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> 
> diff --git a/configure b/configure
> index 1c563a7027..8fe4fc84d8 100755
> --- a/configure
> +++ b/configure
> @@ -769,8 +769,10 @@ case "$cpu" in
>      cross_cc_sparc=$host_cc
>    ;;
>    *)
> -    # This will result in either an error or falling back to TCI later
> +    # Fall back to TCI on unsupported CPUs
>      ARCH=unknown
> +    echo "Unsupported '$cpu' CPU found. Will use TCG with TCI (experimental)."
> +    tcg_interpreter="yes"

I'm not comfortable with this change... I'd prefer to keep the explicit
flag. Is your debian buildd targeting mainstream QEMU or the Debian
package? Can't you manage this as a Debian-specific patch?

Else, why not add a "hppa" case selecting TCI in the switch? This would
be less invasive.
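
Concretely, that would mean something like the following in configure's
CPU switch (just a sketch, not a tested patch, reusing the ARCH and
tcg_interpreter variables from the hunk above):

  hppa)
    # no native TCG backend for this host; fall back to the interpreter
    ARCH=unknown
    tcg_interpreter="yes"
  ;;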

>    ;;
>  esac
>  if test -z "$ARCH"; then
> @@ -1855,16 +1857,6 @@ if ! compile_prog ; then
>      error_exit "\"$cc\" cannot build an executable (is your linker broken?)"
>  fi
> 
> -# Now we have handled --enable-tcg-interpreter and know we're not just
> -# printing the help message, bail out if the host CPU isn't supported.
> -if test "$ARCH" = "unknown"; then
> -    if test "$tcg_interpreter" = "yes" ; then
> -        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
> -    else
> -        error_exit "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
> -    fi
> -fi
> -
>  # Consult white-list to determine whether to enable werror
>  # by default.  Only enable by default for git builds
>  if test -z "$werror" ; then
>
Peter Maydell April 5, 2019, 1:34 a.m. UTC | #2
On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
> If a non-release architecture is found, and it's known that there is no
> native TCG support for that CPU, automatically fall back to the TCI
> implementation instead of requesting the user to run configure again
> with the --enable-tcg-interpreter option.
>
> This change simplifies building qemu in automatic build environments
> (like in my case the debian buildds) because one does not need to
> special case on the architectures.

I don't think we should do this. TCI is unmaintained, has several
known flaws, does not provide the level of performance that
people expect from QEMU, and we've talked about removing it
altogether. In particular, distros should not automatically ship
a TCI QEMU -- it's something that a user can use if they
explicitly opt into but which I don't think we want to surprise
anybody with.

If we care about a host architecture we should support it
with a proper TCG backend. If we don't care that much we
shouldn't support it.

thanks
-- PMM
Helge Deller April 5, 2019, 7:14 a.m. UTC | #3
Hi Philippe,

On 04.04.19 21:24, Philippe Mathieu-Daudé wrote:
> Hi Helge,
>
> On 4/4/19 8:57 PM, Helge Deller wrote:
>> If a non-release architecture is found, and it's known that there is no
>> native TCG support for that CPU, automatically fall back to the TCI
>> implementation instead of requesting the user to run configure again
>> with the --enable-tcg-interpreter option.
>>
>> This change simplifies building qemu in automatic build environments
>> (like in my case the debian buildds) because one does not need to
>> special case on the architectures.
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>>
>> diff --git a/configure b/configure
>> index 1c563a7027..8fe4fc84d8 100755
>> --- a/configure
>> +++ b/configure
>> @@ -769,8 +769,10 @@ case "$cpu" in
>>      cross_cc_sparc=$host_cc
>>    ;;
>>    *)
>> -    # This will result in either an error or falling back to TCI later
>> +    # Fall back to TCI on unsupported CPUs
>>      ARCH=unknown
>> +    echo "Unsupported '$cpu' CPU found. Will use TCG with TCI (experimental)."
>> +    tcg_interpreter="yes"
>
> I'm not comfortable with this change... I'd prefer to keep the explicit
> flag. Is your debian buildd targeting mainstream QEMU or the Debian
> package? Can't you manage this as a Debian-specific patch?

I tried to get it added as a Debian-specific patch a few times.

> Else, why not add a "hppa" case selecting TCI in the switch? This would
> be less invasive.

If that would be acceptable, I'm happy to send a new patch.

Helge

>>    ;;
>>  esac
>>  if test -z "$ARCH"; then
>> @@ -1855,16 +1857,6 @@ if ! compile_prog ; then
>>      error_exit "\"$cc\" cannot build an executable (is your linker broken?)"
>>  fi
>>
>> -# Now we have handled --enable-tcg-interpreter and know we're not just
>> -# printing the help message, bail out if the host CPU isn't supported.
>> -if test "$ARCH" = "unknown"; then
>> -    if test "$tcg_interpreter" = "yes" ; then
>> -        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
>> -    else
>> -        error_exit "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
>> -    fi
>> -fi
>> -
>>  # Consult white-list to determine whether to enable werror
>>  # by default.  Only enable by default for git builds
>>  if test -z "$werror" ; then
>>
Helge Deller April 5, 2019, 7:56 a.m. UTC | #4
Hi Peter,

On 05.04.19 03:34, Peter Maydell wrote:
> On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
>> If a non-release architecture is found, and it's known that there is no
>> native TCG support for that CPU, automatically fall back to the TCI
>> implementation instead of requesting the user to run configure again
>> with the --enable-tcg-interpreter option.
>>
>> This change simplifies building qemu in automatic build environments
>> (like in my case the debian buildds) because one does not need to
>> special case on the architectures.
>
> I don't think we should do this. TCI is unmaintained, has several
> known flaws,

Just out of interest: Is there a list with those flaws somewhere?

> does not provide the level of performance that
> people expect from QEMU, and we've talked about removing it
> altogether. In particular, distros should not automatically ship
> a TCI QEMU -- it's something that a user can use if they
> explicitly opt into but which I don't think we want to surprise
> anybody with.

I won't argue against your points. They are all valid.

> If we care about a host architecture we should support it
> with a proper TCG backend. If we don't care that much we
> shouldn't support it.

Looking just at some of the debian "ports" (non-release) architectures:
alpha, hppa, ia64, m68k, powerpc, sh4, sparc64
Most likely nobody will ever care enough to write a TCG backend for those,
and it doesn't even make sense because they are so much slower than
the currently supported TCG backend architectures. So why should one want
to emulate a fast x86 machine on slow m68k for example?

The only reason when it *can* make sense is, when you need "basic"
emulation or availability of binaries for cross-dependencies in distros,
e.g. to build other packages or to be able to install other packages.
As one example, many packages depend on libvirt (frontend tools, monitoring
stuff, ...) and libvirt again depends on some qemu binaries.
So, having qemu buildable (even if the result is slow) makes life easier.

I know, my point above now turns into a "distro packaging" issue.
That's why I'm questioning, why should distros (or even direct end-users)
be forced to distinguish between architectures?
Shouldn't running "./configure" find out what's available and then
auto-configure itself to the *best* of what's possible?

And, my patch still prints the warning that that's an unsupported
architecture, at the end of configure there is still the big warning
about "*this architecture may go away*", and if someone complains about
performance on such an architecture you can still happily invite him to
write the TCG backend.

Anyway, if you think Philippe's suggestion of "why not add a "hppa" case selecting
TCI in the switch?" is the better solution, then I'm happy to send such a patch
instead.

Helge
Daniel P. Berrangé April 5, 2019, 8:26 a.m. UTC | #5
On Fri, Apr 05, 2019 at 09:56:46AM +0200, Helge Deller wrote:
> Hi Peter,
> 
> On 05.04.19 03:34, Peter Maydell wrote:
> > On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
> >> If a non-release architecture is found, and it's known that there is no
> >> native TCG support for that CPU, automatically fall back to the TCI
> >> implementation instead of requesting the user to run configure again
> >> with the --enable-tcg-interpreter option.
> >>
> >> This change simplifies building qemu in automatic build environments
> >> (like in my case the debian buildds) because one does not need to
> >> special case on the architectures.
> >
> > I don't think we should do this. TCI is unmaintained, has several
> > known flaws,
> 
> Just out of interest: Is there a list with those flaws somewhere?
> 
> > does not provide the level of performance that
> > people expect from QEMU, and we've talked about removing it
> > altogether. In particular, distros should not automatically ship
> > a TCI QEMU -- it's something that a user can use if they
> > explicitly opt into but which I don't think we want to surprise
> > anybody with.
> 
> I won't argue against your points. They are all valid.
> 
> > If we care about a host architecture we should support it
> > with a proper TCG backend. If we don't care that much we
> > shouldn't support it.
> 
> Looking just at some of the debian "ports" (non-release) architectures:
> alpha, hppa, ia64, m68k, powerpc, sh4, sparc64
> Most likely nobody will ever care enough to write a TCG backend for those,
> and it doesn't even make sense because they are so much slower than
> the currently supported TCG backend architectures. So why should one want
> to emulate a fast x86 machine on slow m68k for example?
> 
> The only reason when it *can* make sense is, when you need "basic"
> emulation or availability of binaries for cross-dependecies in distros,
> e.g. to build other packages or to be able to install other packages.
> As one example, many packages depend on libvirt (frontend tools, monitoring
> stuff, ...) and libvirt again depends on some qemu binaries.
> So, having qemu buildable (even if the result is slow) makes life easier.

If it is just a matter of satisfying dependencies, it is easy to build
a minimal libvirt that is essentially just the client libraries and not
the QEMU driver that would in turn allow other packages to build. Alternatively
it should be possible to build the downstream apps with their libvirt dep
disabled avoiding pulling in the whole virtualization chain as a build
pre-req.

> I know, my point above now turns into a "distro packaging" issue.
> That's why I'm questioning, why should distros (or even direct end-users)
> be forced to distinguish between architectures?
> Shouldn't running "./configure" find out what's available and then
> auto-configure itself to the *best* of what's possible?

"best" depends on the POV. You want configure to enable support on
hppa by default since you have an interest in that architecture, which
is fine. The QEMU maintainers, however, have a POV that considers
raising an error by default to be the "best" solution for architectures
that are considered unsupported, requiring an explicit arg to opt in to use
unsupported features.

Regards,
Daniel
Philippe Mathieu-Daudé April 5, 2019, 8:47 a.m. UTC | #6
Hi Helge,

On 4/5/19 9:56 AM, Helge Deller wrote:
> On 05.04.19 03:34, Peter Maydell wrote:
>> On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
>>> If a non-release architecture is found, and it's known that there is no
>>> native TCG support for that CPU, automatically fall back to the TCI
>>> implementation instead of requesting the user to run configure again
>>> with the --enable-tcg-interpreter option.
>>>
>>> This change simplifies building qemu in automatic build environments
>>> (like in my case the debian buildds) because one does not need to
>>> special case on the architectures.
>>
>> I don't think we should do this. TCI is unmaintained, has several
>> known flaws,
> 
> Just out of interest: Is there a list with those flaws somewhere?

The last big discussion is in this thread:
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06528.html

>> does not provide the level of performance that
>> people expect from QEMU, and we've talked about removing it
>> altogether. In particular, distros should not automatically ship
>> a TCI QEMU -- it's something that a user can use if they
>> explicitly opt into but which I don't think we want to surprise
>> anybody with.
> 
> I won't argue against your points. They are all valid.
> 
>> If we care about a host architecture we should support it
>> with a proper TCG backend. If we don't care that much we
>> shouldn't support it.
> 
> Looking just at some of the debian "ports" (non-release) architectures:
> alpha, hppa, ia64, m68k, powerpc, sh4, sparc64
> Most likely nobody will ever care enough to write a TCG backend for those,
> and it doesn't even make sense because they are so much slower than
> the currently supported TCG backend architectures. So why should one want
> to emulate a fast x86 machine on slow m68k for example?

From experience: when an m68k compiler is only provided as an x86 binary
and you want to recompile m68k code on an embedded m68k with no network
access.

Another example appeared last year on the list, when using a binary
compiled for a more recent ISA on an outdated ISA.

> The only reason when it *can* make sense is, when you need "basic"
> emulation or availability of binaries for cross-dependecies in distros,
> e.g. to build other packages or to be able to install other packages.
> As one example, many packages depend on libvirt (frontend tools, monitoring
> stuff, ...) and libvirt again depends on some qemu binaries.
> So, having qemu buildable (even if the result is slow) makes life easier.
> 
> I know, my point above now turns into a "distro packaging" issue.
> That's why I'm questioning, why should distros (or even direct end-users)
> be forced to distinguish between architectures?
> Shouldn't running "./configure" find out what's available and then
> auto-configure itself to the *best* of what's possible?
> 
> And, my patch still prints the warning that that's an unsupported
> architecture, at the end of configure there is still the big warning
> about "*this architecture may go away*", and if someone complains about
> performance on such an architecture you can still happily invite him to
> write the TCG backend.
> 
> Anyway, if you think Philippe's suggestion of "why not add a "hppa" case selecting
> TCI in the switch?" is the better solution, then I'm happy to send such a patch
> instead.

I suggested that as a "less worthy" approach, but I'm 100% with Peter on
this.
I understand your use case and still think this should be handled by Debian
for its "ports" (non-release) architectures, as an explicitly selected
option on the list you mentioned (alpha, hppa, ia64, m68k, powerpc, sh4,
sparc64).
Daniel P. Berrangé April 5, 2019, 9:02 a.m. UTC | #7
On Fri, Apr 05, 2019 at 10:47:54AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Helge,
> 
> On 4/5/19 9:56 AM, Helge Deller wrote:
> > On 05.04.19 03:34, Peter Maydell wrote:
> >> On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
> >>> If a non-release architecture is found, and it's known that there is no
> >>> native TCG support for that CPU, automatically fall back to the TCI
> >>> implementation instead of requesting the user to run configure again
> >>> with the --enable-tcg-interpreter option.
> >>>
> >>> This change simplifies building qemu in automatic build environments
> >>> (like in my case the debian buildds) because one does not need to
> >>> special case on the architectures.
> >>
> >> I don't think we should do this. TCI is unmaintained, has several
> >> known flaws,
> > 
> > Just out of interest: Is there a list with those flaws somewhere?
> 
> The last big discussion is in this thread:
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06528.html

Do the various crashes that you illustrate in that cover letter
still exist today? If so, 2 years of continued brokenness with no
fixes would reinforce the view that it is time to remove TCI
from the codebase.

Regards,
Daniel
Helge Deller April 5, 2019, 9:02 a.m. UTC | #8
On 05.04.19 10:26, Daniel P. Berrangé wrote:
> On Fri, Apr 05, 2019 at 09:56:46AM +0200, Helge Deller wrote:
>> Hi Peter,
>>
>> On 05.04.19 03:34, Peter Maydell wrote:
>>> On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
>>>> If a non-release architecture is found, and it's known that there is no
>>>> native TCG support for that CPU, automatically fall back to the TCI
>>>> implementation instead of requesting the user to run configure again
>>>> with the --enable-tcg-interpreter option.
>>>>
>>>> This change simplifies building qemu in automatic build environments
>>>> (like in my case the debian buildds) because one does not need to
>>>> special case on the architectures.
>>>
>>> I don't think we should do this. TCI is unmaintained, has several
>>> known flaws,
>>
>> Just out of interest: Is there a list with those flaws somewhere?
>>
>>> does not provide the level of performance that
>>> people expect from QEMU, and we've talked about removing it
>>> altogether. In particular, distros should not automatically ship
>>> a TCI QEMU -- it's something that a user can use if they
>>> explicitly opt into but which I don't think we want to surprise
>>> anybody with.
>>
>> I won't argue against your points. They are all valid.
>>
>>> If we care about a host architecture we should support it
>>> with a proper TCG backend. If we don't care that much we
>>> shouldn't support it.
>>
>> Looking just at some of the debian "ports" (non-release) architectures:
>> alpha, hppa, ia64, m68k, powerpc, sh4, sparc64
>> Most likely nobody will ever care enough to write a TCG backend for those,
>> and it doesn't even make sense because they are so much slower than
>> the currently supported TCG backend architectures. So why should one want
>> to emulate a fast x86 machine on slow m68k for example?
>>
>> The only reason when it *can* make sense is, when you need "basic"
>> emulation or availability of binaries for cross-dependecies in distros,
>> e.g. to build other packages or to be able to install other packages.
>> As one example, many packages depend on libvirt (frontend tools, monitoring
>> stuff, ...) and libvirt again depends on some qemu binaries.
>> So, having qemu buildable (even if the result is slow) makes life easier.
>
> If it is just a matter of satisfying dependencies, it is easy to build
> a minimal libvirt that is essentially just the client libraries and not
> the QEMU driver that would in turn allow other packages to build. Alternatively
> it should be possible to build the downstream apps with their libvirt dep
> disabled avoiding pulling in the whole virtualization chain as a build
> pre-req.

libvirt was just one example, and yes, it's possible to work around it in
multiple ways.

As another example, even if I only want to build "qemu-img", I still need
to manually give the --enable-tcg-interpreter configure option.

>> I know, my point above now turns into a "distro packaging" issue.
>> That's why I'm questioning, why should distros (or even direct end-users)
>> be forced to distinguish between architectures?
>> Shouldn't running "./configure" find out what's available and then
>> auto-configure itself to the *best* of what's possible?
>
> "best" depends on the POV. You want configure to enable support on
> hppa by default since you have an interest in that architecture, which
> is fine. The QEMU maintainers, however, have a POV that considers
> raising an error by default to be the "best" solution for architectures
> that are considered unsupported, requiring an explicit arg to opt in to use
> unsupported features.

Yes, fair point.
Sadly such special treatment by projects makes life for me
as an architecture maintainer much harder :-(

Helge
Philippe Mathieu-Daudé April 5, 2019, 9:10 a.m. UTC | #9
Cc'ing qemu-block.

On 4/5/19 11:02 AM, Helge Deller wrote:
[...]
> As another example, even if I only want to build "qemu-img", I still need
> to manually give the --enable-tcg-interpreter configure option.

You found a bug :)
Peter Maydell April 5, 2019, 9:13 a.m. UTC | #10
On Fri, 5 Apr 2019 at 16:02, Helge Deller <deller@gmx.de> wrote:
> Sadly such special treatment by projects makes life for me
> as an architecture maintainer much harder :-(

It's kind of inevitable for programs that aren't straightforwardly
architecture-agnostic. gcc doesn't work for architectures which
don't have implemented backends either...

thanks
-- PMM
Daniel P. Berrangé April 5, 2019, 9:15 a.m. UTC | #11
On Fri, Apr 05, 2019 at 11:02:52AM +0200, Helge Deller wrote:
> On 05.04.19 10:26, Daniel P. Berrangé wrote:
> > On Fri, Apr 05, 2019 at 09:56:46AM +0200, Helge Deller wrote:
> >> Hi Peter,
> >>
> >> On 05.04.19 03:34, Peter Maydell wrote:
> >>> On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
> >>>> If a non-release architecture is found, and it's known that there is no
> >>>> native TCG support for that CPU, automatically fall back to the TCI
> >>>> implementation instead of requesting the user to run configure again
> >>>> with the --enable-tcg-interpreter option.
> >>>>
> >>>> This change simplifies building qemu in automatic build environments
> >>>> (like in my case the debian buildds) because one does not need to
> >>>> special case on the architectures.
> >>>
> >>> I don't think we should do this. TCI is unmaintained, has several
> >>> known flaws,
> >>
> >> Just out of interest: Is there a list with those flaws somewhere?
> >>
> >>> does not provide the level of performance that
> >>> people expect from QEMU, and we've talked about removing it
> >>> altogether. In particular, distros should not automatically ship
> >>> a TCI QEMU -- it's something that a user can use if they
> >>> explicitly opt into but which I don't think we want to surprise
> >>> anybody with.
> >>
> >> I won't argue against your points. They are all valid.
> >>
> >>> If we care about a host architecture we should support it
> >>> with a proper TCG backend. If we don't care that much we
> >>> shouldn't support it.
> >>
> >> Looking just at some of the debian "ports" (non-release) architectures:
> >> alpha, hppa, ia64, m68k, powerpc, sh4, sparc64
> >> Most likely nobody will ever care enough to write a TCG backend for those,
> >> and it doesn't even make sense because they are so much slower than
> >> the currently supported TCG backend architectures. So why should one want
> >> to emulate a fast x86 machine on slow m68k for example?
> >>
> >> The only reason when it *can* make sense is, when you need "basic"
> >> emulation or availability of binaries for cross-dependencies in distros,
> >> e.g. to build other packages or to be able to install other packages.
> >> As one example, many packages depend on libvirt (frontend tools, monitoring
> >> stuff, ...) and libvirt again depends on some qemu binaries.
> >> So, having qemu buildable (even if the result is slow) makes life easier.
> >
> > If it is just a matter of satisfying dependencies, it is easy to build
> > a minimal libvirt that is essentially just the client libraries and not
> > the QEMU driver that would in turn allow other packages to build. Alternatively
> > it should be possible to build the downstream apps with their libvirt dep
> > disabled avoiding pulling in the whole virtualization chain as a build
> > pre-req.
> 
> libvirt was just one example, and yes, it's possible to work around it in
> multiple ways.
> 
> As another example, even if I only want to build "qemu-img", I still need
> to manually give the --enable-tcg-interpreter configure option.

That is something I would consider a valid bug to fix. We should permit
people to do  "./configure --disable-system --disable-user --enable-tools".
Requiring --enable-tcg-interpreter in that scenario is clearly wrong,
as we're not even building it.
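
Ideally something like the following would then just work, without any
TCI flag (a sketch; the configure flags are the ones mentioned above,
and the qemu-img make target is assumed from the pre-meson build
system):

  # tools-only build: no system or user-mode emulators, hence no TCG
  ./configure --disable-system --disable-user --enable-tools
  make qemu-img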

> >> I know, my point above now turns into a "distro packaging" issue.
> >> That's why I'm questioning, why should distros (or even direct end-users)
> >> be forced to distinguish between architectures?
> >> Shouldn't running "./configure" find out what's available and then
> >> auto-configure itself to the *best* of what's possible?
> >
> > "best" depends on the POV. You want configure to enable support on
> > hppa by default since you have an interest in that architecture, which
> > is fine. The QEMU maintainers, however, have a POV that considers
> > raising an error by default to be the "best" solution for architectures
> > that are considered unsupported, requiring an explicit arg to opt in to use
> > unsupported features.
> 
> Yes, fair point.
> Sadly such special treatment by projects makes life for me
> as an architecture maintainer much harder :-(

FWIW I like to see code portability that is as broad as is practical.
At the same time though, essentially every open source project is
massively overloaded with work they want to accomplish vs number of
contributors available.

There has to be a tradeoff in where effort is spent and usually that
tradeoff ends up prioritizing areas that are going to benefit the
largest number of users. In recent times QEMU has been trying to
formalize what it considers supported vs unsupported so that we have
a better plan for where we spend our limited contributor resources.
The intent is that this will help us improve overall project quality.
That has meant dropping support for many older distros, and marking
some OSes, distros & architectures as unsupported with a view to fully
dropping ability to build on them in future.

Regards,
Daniel
Philippe Mathieu-Daudé April 5, 2019, 9:16 a.m. UTC | #12
On 4/5/19 11:02 AM, Daniel P. Berrangé wrote:
> On Fri, Apr 05, 2019 at 10:47:54AM +0200, Philippe Mathieu-Daudé wrote:
>> Hi Helge,
>>
>> On 4/5/19 9:56 AM, Helge Deller wrote:
>>> On 05.04.19 03:34, Peter Maydell wrote:
>>>> On Fri, 5 Apr 2019 at 01:59, Helge Deller <deller@gmx.de> wrote:
>>>>> If a non-release architecture is found, and it's known that there is no
>>>>> native TCG support for that CPU, automatically fall back to the TCI
>>>>> implementation instead of requesting the user to run configure again
>>>>> with the --enable-tcg-interpreter option.
>>>>>
>>>>> This change simplifies building qemu in automatic build environments
>>>>> (like in my case the debian buildds) because one does not need to
>>>>> special case on the architectures.
>>>>
>>>> I don't think we should do this. TCI is unmaintained, has several
>>>> known flaws,
>>>
>>> Just out of interest: Is there a list with those flaws somewhere?
>>
>> The last big discussion is in this thread:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06528.html

I just noticed this link doesn't show the full list of thread replies, and
want to point out Stefan's reply:

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06841.html

I seem to remember someone (Thomas? you, Daniel?) posting a link to 3 forks
using TCI for reverse engineering, but I can't find it to check whether they
are still active.

> Do the various crashes that you illustrate in that cover letter
> still exist today? If so, 2 years of continued brokenness with no
> fixes would reinforce the view that it is time to remove TCI
> from the codebase.

Or find a maintainer and add tests...
Richard Henderson April 6, 2019, 8:59 a.m. UTC | #13
On 4/5/19 2:56 PM, Helge Deller wrote:
> Looking just at some of the debian "ports" (non-release) architectures:
> alpha, hppa, ia64, m68k, powerpc, sh4, sparc64

FWIW:

sparc64 does have a tcg backend.
(Indeed, tcg/sparc does *not* support 32-bit cpus!)

powerpc does have a tcg backend, and it has both 32 and 64-bit users.

Both hppa and ia64 used to have tcg backends, and could
be resurrected if there was a will.

I have a tcg/alpha backend that I haven't bothered to
merge, because I assumed there was little interest.


r~
Stefan Weil April 9, 2019, 7:46 p.m. UTC | #14
On 05.04.19 11:16, Philippe Mathieu-Daudé wrote:
> On 4/5/19 11:02 AM, Daniel P. Berrangé wrote:
>> On Fri, Apr 05, 2019 at 10:47:54AM +0200, Philippe Mathieu-Daudé wrote:
>> Do the various crashes that you illustrate in that cover letter
>> still exist today? If so, 2 years of continued brokenness with no
>> fixes would reinforce the view that it is time to remove TCI
>> from the codebase.
> 
> Or find a maintainer and add tests...

Thank you for CC'ing me. I could not spend much of my free time on QEMU
last year and typically will miss important messages on the list unless
my address is explicitly given. Nevertheless I still feel responsible
for TCI, and I am also listed as maintainer in MAINTAINERS.

In the past there was only limited use of TCI (as far as I know), and
the current implementation worked for many applications, for example to
count the frequency of TCG opcodes.

The known problems with the current implementation are

* Misaligned memory accesses. This should be easy to fix. I already sent
a patch (currently discussed) which fixed user mode emulation on ARM for me.

* Calling conventions. The current implementation works on many hosts,
but for example not on Sparc. A fix would require simple calling
conventions for all helper functions (for example stack based argument
passing, can this be enforced?), or it needs to know the signature of
each helper function at runtime. I'm afraid that fixing this would
require much work. A runtime test whether calls of helper functions work
correctly could be implemented easily and could abort the program
execution when calls fail to pass the right arguments. Would such a
runtime test help a little bit?

* Performance. It can be improved a bit by implementing more TCG opcodes
for the interpreter, but of course TCI is slower than TCG (which is
slower than KVM).

Cheers
Stefan
Richard Henderson April 9, 2019, 8:39 p.m. UTC | #15
On 4/9/19 9:46 AM, Stefan Weil wrote:
> * Calling conventions. The current implementation works on many hosts,
> but for example not on Sparc. A fix would require simple calling
> conventions for all helper functions (for example stack based argument
> passing, can this be enforced?), or it needs to know the signature of
> each helper function at runtime. I'm afraid that fixing this would
> require much work. A runtime test whether calls of helper functions work
> correctly could be implemented easily and could abort the program
> execution when calls fail to pass the right arguments. Would such a
> runtime test help a little bit?

In the rewrite of tci that I proposed some years ago,
I used libffi for this.

Those patches could probably be recovered...

https://patchwork.ozlabs.org/patch/348528/
https://patchwork.ozlabs.org/patch/348527/


r~
Thomas Huth April 10, 2019, 6:07 a.m. UTC | #16
On 09/04/2019 21.46, Stefan Weil wrote:
> On 05.04.19 11:16, Philippe Mathieu-Daudé wrote:
>> On 4/5/19 11:02 AM, Daniel P. Berrangé wrote:
>>> On Fri, Apr 05, 2019 at 10:47:54AM +0200, Philippe Mathieu-Daudé wrote:
>>> Do the various crashes that you illustrate in that cover letter
>>> still exist today? If so, 2 years of continued brokenness with no
>>> fixes would reinforce the view that it is time to remove TCI
>>> from the codebase.
>>
>> Or find a maintainer and add tests...
> 
> Thank you for CC'ing me. I could not spend much of my free time on QEMU
> last year and typically will miss important messages on the list unless
> my address is explicitly given. Nevertheless I still feel responsible
> for TCI, and I am also listed as maintainer in MAINTAINERS.

That's great, good to know that you're still interested in TCI! ... but
I think one of the main problems is still that we completely lack test
coverage for TCI - the code is always in danger of bit-rotting if it is not
tested by default.

I could maybe try to add some tests to our .gitlab-ci.yml file ...
if you or somebody else could add one to .travis.yml, that would be
great. Something like:

 - ./configure --enable-tcg-interpreter --target-list=alpha-softmmu,ppc64-softmmu,i386-softmmu,sparc-softmmu,microblaze-softmmu,moxie-softmmu,arm-softmmu,hppa-softmmu
 - make
 - make tests/boot-serial-test
 - QTEST_QEMU_BINARY="alpha-softmmu/qemu-system-alpha"  ./tests/boot-serial-test
 - QTEST_QEMU_BINARY="ppc64-softmmu/qemu-system-ppc64"  ./tests/boot-serial-test
 - ...

 Thomas
Stefan Weil April 10, 2019, 6:17 a.m. UTC | #17
On 09.04.19 22:39, Richard Henderson wrote:
> On 4/9/19 9:46 AM, Stefan Weil wrote:
>> * Calling conventions. The current implementation works on many hosts,
>> but for example not on Sparc. A fix would require simple calling
>> conventions for all helper functions (for example stack based argument
>> passing, can this be enforced?), or it needs to know the signature of
>> each helper function at runtime. I'm afraid that fixing this would
>> require much work. A runtime test whether calls of helper functions work
>> correctly could be implemented easily and could abort the program
>> execution when calls fail to pass the right arguments. Would such a
>> runtime test help a little bit?
> 
> In the rewrite of tci that I proposed some years ago,
> I used libffi for this.
> 
> Those patches could probably be recovered...
> 
> https://patchwork.ozlabs.org/patch/348528/
> https://patchwork.ozlabs.org/patch/348527/


The first patch is the important one because it adds the signature to
each helper function. As soon as this information is passed to the
interpreter, the rest is simple. It should not be necessary to add a new
dependency on libffi, because there are no foreign calling conventions
here: all helpers use the C calling convention of the host architecture.

The TCG implementation for Sparc (and other hosts with special calling
conventions) must also have some minimum information on the signatures
of the helper functions. It should be possible to use the same
information for TCI. Maybe your approach is a general form which could
be used for any TCG implementation.

Stefan
Stefan Weil April 10, 2019, 6:24 a.m. UTC | #18
On 10.04.19 08:07, Thomas Huth wrote:
> That's great, good to know that you're still interested in TCI! ... but
> I think one of the main problems is still that we completely lack test
> coverage for TCI - the code is always in danger of bit-rotting if it is not
> tested by default.


Ideally it would be possible to build a QEMU executable with KVM (on
those hosts which support it), native TCG and TCI, so users could decide
at runtime which of the three alternatives they want. Then CI builds
(not necessarily production builds) would include all source files and
could also run tests with TCI if desired.

It was not possible to have native and interpreted TCG selectable at
runtime when I started the TCI implementation, but time went on and QEMU
and TCG evolved, so maybe it is now possible.

Stefan
Thomas Huth April 10, 2019, 7:48 a.m. UTC | #19
On 09/04/2019 21.46, Stefan Weil wrote:
[...]
> The known problems with the current implementation are
[...]
> * Calling conventions. The current implementation works on many hosts,
> but for example not on Sparc
Is there also a problem with the sparc *target* (i.e. not only sparc
hosts)? TCI does not seem to work here:

 https://gitlab.com/huth/qemu/-/jobs/193861498

 Thomas
Thomas Huth April 10, 2019, 8:22 a.m. UTC | #20
On 10/04/2019 08.07, Thomas Huth wrote:
> On 09/04/2019 21.46, Stefan Weil wrote:
>> On 05.04.19 11:16, Philippe Mathieu-Daudé wrote:
>>> On 4/5/19 11:02 AM, Daniel P. Berrangé wrote:
>>>> On Fri, Apr 05, 2019 at 10:47:54AM +0200, Philippe Mathieu-Daudé wrote:
>>>> Do the various crashes that you illustrate in that cover letter
>>>> still exist today? If so, 2 years of continued brokenness with no
>>>> fixes would reinforce the view that it is time to remove TCI
>>>> from the codebase.
>>>
>>> Or find a maintainer and add tests...
>>
>> Thank you for CC'ing me. I could not spend much of my free time on QEMU
>> last year and typically will miss important messages on the list unless
>> my address is explicitly given. Nevertheless I still feel responsible
>> for TCI, and I am also listed as maintainer in MAINTAINERS.
> 
> That's great, good to know that you're still interested in TCI! ... but
> I think one of the main problems is still that we completely lack test
> coverage for TCI - the code is always in danger of bit-rotting if it is not
> tested by default.
> 
> I could maybe try to add some tests to our .gitlab-ci.yml file ...
> if you or somebody else could add one to .travis.yml, that would be
> great. Something like:
> 
>  - ./configure --enable-tcg-interpreter --target-list=alpha-softmmu,ppc64-softmmu,i386-softmmu,sparc-softmmu,microblaze-softmmu,moxie-softmmu,arm-softmmu,hppa-softmmu
>  - make
>  - make tests/boot-serial-test
>  - QTEST_QEMU_BINARY="alpha-softmmu/qemu-system-alpha"  ./tests/boot-serial-test
>  - QTEST_QEMU_BINARY="ppc64-softmmu/qemu-system-ppc64"  ./tests/boot-serial-test
>  - ...

Additionally, I think it should be possible to compile with the
x86_64-linux-user target and then to run "make check-tcg" ... however,
that currently crashes with:

TODO qemu/tcg/tci.c:859: tcg_qemu_tb_exec()
qemu/tcg/tci.c:859: tcg fatal error
qemu: uncaught target signal 11 (Segmentation fault) - core dumped

 Thomas
Stefan Weil April 10, 2019, 7:22 p.m. UTC | #21
On 10.04.19 10:22, Thomas Huth wrote:
> Additionally, I think it should be possible to compile with the
> x86_64-linux-user target and then to run "make check-tcg" ... however,
> that currently crashes with:
> 
> TODO qemu/tcg/tci.c:859: tcg_qemu_tb_exec()
> qemu/tcg/tci.c:859: tcg fatal error
> qemu: uncaught target signal 11 (Segmentation fault) - core dumped

That's easy to fix. I implemented only those TCG opcodes which really
were executed during testing. This test triggers an assertion because it
needs INDEX_op_ld16u_i64 which was missing up to now.

I'll send a patch which adds support for INDEX_op_ld16u_i64.

Thank you for your tests.

Stefan
Richard Henderson April 11, 2019, 6:21 a.m. UTC | #22
On 4/9/19 8:17 PM, Stefan Weil wrote:
> On 09.04.19 22:39, Richard Henderson wrote:
>> On 4/9/19 9:46 AM, Stefan Weil wrote:
>>> * Calling conventions. The current implementation works on many hosts,
>>> but for example not on Sparc. A fix would require simple calling
>>> conventions for all helper functions (for example stack based argument
>>> passing, can this be enforced?), or it needs to know the signature of
>>> each helper function at runtime. I'm afraid that fixing this would
>>> require much work. A runtime test whether calls of helper functions work
>>> correctly could be implemented easily and could abort the program
>>> execution when calls fail to pass the right arguments. Would such a
>>> runtime test help a little bit?
>>
>> In the rewrite of tci that I proposed some years ago,
>> I used libffi for this.
>>
>> Those patches could probably be recovered...
>>
>> https://patchwork.ozlabs.org/patch/348528/
>> https://patchwork.ozlabs.org/patch/348527/
> 
> 
> The first patch is the important one because it adds the signature to
> each helper function. As soon as this information is passed to the
> interpreter, the rest is simple. It should not be necessary to add a new
> dependency on libffi, because there are no foreign calling conventions
> here: all helpers use the C calling convention of the host architecture.

Perhaps the "foreign" in libffi is confusing you.  Its goal *is* the C calling
convention of the host architecture.

> The TCG implementation for Sparc (and other hosts with special calling
> conventions) must also have some minimum information on the signatures
> of the helper functions. It should be possible to use the same
> information for TCI. Maybe your approach is a general form which could
> be used for any TCG implementation.

We currently capture only enough information to support the TCG native hosts.
If we were to attempt to support, e.g. m68k as a host, then we would have to
expand this.  What I capture for libffi is exact, so it's certainly possible.

But the number of possible function signatures is large.  I forget the simple
math that expresses this well, but doing this the long way suggests 4372 as the
number of combinations (up to 6 arguments of 3 types; 4 return types).
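
(Doing that math the long way: three argument types with zero to six
arguments give 3^0 + 3^1 + ... + 3^6 = 1093 possible argument lists, and
four possible return types make it 4 * 1093 = 4372.)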

This number is large enough that you cannot just open code all of the possible
calls.  You absolutely have to know something about the host compiler's calling
convention.

I see only 3 possibilities for this:

(1) Use per host ifdefs and/or assembly to DTRT within TCI.
    This admits that TCI requires *some* porting and does
    not automatically run everywhere.
(2) Use an external library that already has said assembly,
    and has already been ported to every interesting processor,
    since it is often done along with the compiler port.
(3) Write a proper TCG backend for each host.

If you see another possibility, please enlighten me.


r~

Patch

diff --git a/configure b/configure
index 1c563a7027..8fe4fc84d8 100755
--- a/configure
+++ b/configure
@@ -769,8 +769,10 @@ case "$cpu" in
     cross_cc_sparc=$host_cc
   ;;
   *)
-    # This will result in either an error or falling back to TCI later
+    # Fall back to TCI on unsupported CPUs
     ARCH=unknown
+    echo "Unsupported '$cpu' CPU found. Will use TCG with TCI (experimental)."
+    tcg_interpreter="yes"
   ;;
 esac
 if test -z "$ARCH"; then
@@ -1855,16 +1857,6 @@ if ! compile_prog ; then
     error_exit "\"$cc\" cannot build an executable (is your linker broken?)"
 fi

-# Now we have handled --enable-tcg-interpreter and know we're not just
-# printing the help message, bail out if the host CPU isn't supported.
-if test "$ARCH" = "unknown"; then
-    if test "$tcg_interpreter" = "yes" ; then
-        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
-    else
-        error_exit "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
-    fi
-fi
-
 # Consult white-list to determine whether to enable werror
 # by default.  Only enable by default for git builds
 if test -z "$werror" ; then