Message ID | E8B64B53-09F0-4B1E-9F11-48CC3957C4E1@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 03/12/2016 06:59, Programmingkid wrote: > The floating point code used in fpu_helper.c can be sped up by using > the IEEE 754 support added to the C99 standard. To test this code out > simply set and unset the I_NEED_SPEED macro. The program to test out > each version of the helper_fmadd() function is below the patch. It > needs to be ran in the guest. The emulator to use is qemu-system-ppc. > I used a Mac OS X guest, but the test program would compile on a > Linux guest. > > This patch does make the fused multiply-add instruction fmadd work > faster and still give a correct result. > > This documentation might be of help to those who want to learn more > about C99's IEEE 754 support: > http://grouper.ieee.org/groups/754/meeting-materials/2001-07-18-c99.pdf You're undoing what was done in 2007: commit 76a66253e5e48f1744f689041c1c21cedcaff630 Author: j_mayer <j_mayer@c046a42c-6fe2-441c-8c8c-71466251a162> Date: Wed Mar 7 08:32:30 2007 +0000 Great PowerPC emulation code resynchronisation and improvments: ... - Micro-operation fixes: ... * use softfloat routines for all floating-point operations Paolo
On Dec 3, 2016, at 3:44 AM, Paolo Bonzini wrote: > > > On 03/12/2016 06:59, Programmingkid wrote: >> The floating point code used in fpu_helper.c can be sped up by using >> the IEEE 754 support added to the C99 standard. To test this code out >> simply set and unset the I_NEED_SPEED macro. The program to test out >> each version of the helper_fmadd() function is below the patch. It >> needs to be ran in the guest. The emulator to use is qemu-system-ppc. >> I used a Mac OS X guest, but the test program would compile on a >> Linux guest. >> >> This patch does make the fused multiply-add instruction fmadd work >> faster and still give a correct result. >> >> This documentation might be of help to those who want to learn more >> about C99's IEEE 754 support: >> http://grouper.ieee.org/groups/754/meeting-materials/2001-07-18-c99.pdf > > You're undoing what was done in 2007: > > commit 76a66253e5e48f1744f689041c1c21cedcaff630 > Author: j_mayer <j_mayer@c046a42c-6fe2-441c-8c8c-71466251a162> > Date: Wed Mar 7 08:32:30 2007 +0000 > > Great PowerPC emulation code resynchronisation and improvments: > ... > - Micro-operation fixes: > ... > * use softfloat routines for all floating-point operations > > Paolo Yes it would be. The commit message never stated why he wanted to switch to floating point softfloat routines. These are my guesses: - Different versions of gcc might have produced different results? - Different hosts produced different results and instead wanted consistency? - Author did not know about the C99 numerics support? The C99 standard will keep results accurate on both different versions of gcc and different host architecture. It will also give us speed over the softfloat routines.
On 12/03/2016 09:07 AM, Programmingkid wrote: > Yes it would be. The commit message never stated why he wanted to switch > to floating point softfloat routines. These are my guesses: ... > - Different hosts produced different results and instead wanted consistency? This is the correct answer. There are quite a lot of differences between floating point across different cpus. Which means that it's actually rare that the host and guest floating point are exactly alike. r~
Hi, Your series seems to have some coding style problems. See output below for more information: Subject: [Qemu-devel] [RFC] target-ppc/fpu_helper.c: Use C99 code to speed up floating point unit Type: series Message-id: E8B64B53-09F0-4B1E-9F11-48CC3957C4E1@gmail.com === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 # Useful git options git config --local diff.renamelimit 0 git config --local diff.renames True commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/20161202194854.18737-1-eblake@redhat.com -> patchew/20161202194854.18737-1-eblake@redhat.com Switched to a new branch 'test' 6f17682 target-ppc/fpu_helper.c: Use C99 code to speed up floating point unit === OUTPUT BEGIN === Checking PATCH 1/1: target-ppc/fpu_helper.c: Use C99 code to speed up floating point unit... WARNING: line over 80 characters #145: FILE: target-ppc/fpu_helper.c:811: + else if (isinf(farg1.d * farg2.d) && isinf(farg3.d) && (signbit(farg3.d) != 0)) { ERROR: Missing Signed-off-by: line(s) total: 1 errors, 1 warnings, 172 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-devel@freelists.org
On 12/03/2016 07:32 PM, Richard Henderson wrote: > On 12/03/2016 09:07 AM, Programmingkid wrote: >> Yes it would be. The commit message never stated why he wanted to switch >> to floating point softfloat routines. These are my guesses: > ... >> - Different hosts produced different results and instead wanted >> consistency? > > This is the correct answer. There are quite a lot of differences > between floating point across different cpus. Which means that it's > actually rare that the host and guest floating point are exactly alike. Or to frame it in different words: qemu's goal is to be an architecturally-accurate emulator. fma() has specific C99 semantics (mapping to what IEEE floating point requires), but USUALLY those semantics are not fully implemented in hardware; rather, the hardware implements 90% and has specific flags or traps that let the C library recognize when it has to manually perform the 10% of special cases. But the special cases differ by hardware. If all hardware implemented C99 semantics 100% of the time, then using C99 to emulate hardware would be correct. But since that is not true, emulating C99 semantics instead of hardware semantics will be WRONG in the special cases, and cause programs to misbehave under qemu where they would behave correctly on bare metal. So the decision was that accurate emulation (guaranteed by using soft-float operations that are tailored to each hardware's specific quirks on the special cases) is more important that speed (where the behavior can be different from hardware on the special cases, even if that behavior is accurate to IEEE and C99 semantics). If you can completely describe ALL special cases that ANY particular platform has quirks for, and kick to softmmu for those cases while still sticking to C99 for the common cases, you may have success. But at this point, I doubt you have properly identified all the quirks where native fma instructions differ from C99 fma() requirements, let alone the differences in emulation that will be required to properly quirk all possible floating point hardware. And it's not even guaranteed that identifying all the inputs that need special casing for various guest hardware, before calling out to fma() in the remainder of cases, will not slow things down (since fma() will have its own set of additional quirking conditionals based on host hardware).
diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c index b0760f0..05eb26e 100644 --- a/target-ppc/fpu_helper.c +++ b/target-ppc/fpu_helper.c @@ -20,10 +20,22 @@ #include "cpu.h" #include "exec/helper-proto.h" #include "exec/exec-all.h" +#include <inttypes.h> +#include <math.h> +#include <fenv.h> #define float64_snan_to_qnan(x) ((x) | 0x0008000000000000ULL) #define float32_snan_to_qnan(x) ((x) | 0x00400000) +#define DEBUG_FPU 0 + +#define DPRINTF(fmt, ...) do { \ + if (DEBUG_FPU) { \ + printf("FPU: " fmt , ## __VA_ARGS__); \ + } \ +} while (0); + + /*****************************************************************************/ /* Floating point operations helpers */ uint64_t helper_float32_to_float64(CPUPPCState *env, uint32_t arg) @@ -281,29 +293,36 @@ static inline void float_inexact_excp(CPUPPCState *env) static inline void fpscr_set_rounding_mode(CPUPPCState *env) { - int rnd_type; + int rnd_type, result = 0; /* Set rounding mode */ switch (fpscr_rn) { case 0: /* Best approximation (round to nearest) */ rnd_type = float_round_nearest_even; + result = fesetround(FE_TONEAREST); break; case 1: /* Smaller magnitude (round toward zero) */ rnd_type = float_round_to_zero; + result = fesetround(FE_TOWARDZERO); break; case 2: /* Round toward +infinite */ rnd_type = float_round_up; + result = fesetround(FE_UPWARD); break; default: case 3: /* Round toward -infinite */ rnd_type = float_round_down; + result = fesetround(FE_DOWNWARD); break; } set_float_rounding_mode(rnd_type, &env->fp_status); + if (result != 0) { + printf("Error: rounding mode was not set\n"); + } } void helper_fpscr_clrbit(CPUPPCState *env, uint32_t bit) @@ -534,6 +553,7 @@ void helper_float_check_status(CPUPPCState *env) void helper_reset_fpstatus(CPUPPCState *env) { set_float_exception_flags(0, &env->fp_status); + feclearexcept(FE_ALL_EXCEPT); } /* fadd - fadd. */ @@ -737,16 +757,94 @@ uint64_t helper_frim(CPUPPCState *env, uint64_t arg) return do_fri(env, arg, float_round_down); } -/* fmadd - fmadd. */ +#define I_NEED_SPEED 1 +#ifdef I_NEED_SPEED + +union Converter { + uint64_t i; + double d; +}; + +typedef union Converter Converter; + +/* fmadd - fmadd. - fast */ uint64_t helper_fmadd(CPUPPCState *env, uint64_t arg1, uint64_t arg2, uint64_t arg3) { + DPRINTF("Fast helper_fmadd() called\n"); + Converter farg1, farg2, farg3, result; + + farg1.i = arg1; + farg2.i = arg2; + farg3.i = arg3; + + DPRINTF("farg1.d = %f\n", farg1.d); + DPRINTF("farg2.d = %f\n", farg2.d); + DPRINTF("farg3.d = %f\n", farg3.d); + + /* if signalling NaN operation */ + if (unlikely(float64_is_signaling_nan(farg1.d, &env->fp_status) || + float64_is_signaling_nan(farg2.d, &env->fp_status) || + float64_is_signaling_nan(farg3.d, &env->fp_status))) { + float_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 1); + } + + result.d = fma(farg1.d, farg2.d, farg3.d); /* fused multiply-add function */ + if (fetestexcept(FE_INEXACT)) { + DPRINTF("FE_INEXACT\n"); + float_inexact_excp(env); + } + if (fetestexcept(FE_INVALID)) { + DPRINTF("FE_INVALID\n"); + + /* 0 * infinity */ + if ((fpclassify(farg1.d) == FP_ZERO) && isinf(farg2.d)) { + result.i = float_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ, 1); + } + + /* infinity * 0 */ + else if (isinf(farg1.d) && (fpclassify(farg2.d) == FP_ZERO)) { + result.i = float_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ, 1); + } + + /* infinity - infinity */ + else if (isinf(farg1.d * farg2.d) && isinf(farg3.d) && (signbit(farg3.d) != 0)) { + result.i = float_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, 1); + } + } + if (fetestexcept(FE_OVERFLOW)) { + DPRINTF("FE_OVERFLOW\n"); + float_overflow_excp(env); + } + if (fetestexcept(FE_UNDERFLOW)) { + DPRINTF("FE_UNDERFLOW\n"); + float_underflow_excp(env); + } + + DPRINTF("result.d = %f\n", result.d); + DPRINTF("result.i = 0x%" PRIx64 "\n", result.i); + + return result.i; +} + +#else + +/* fmadd - fmadd. - original */ +uint64_t helper_fmadd(CPUPPCState *env, uint64_t arg1, uint64_t arg2, + uint64_t arg3) +{ + DPRINTF("old helper_fmadd() called\n"); + CPU_DoubleU farg1, farg2, farg3; farg1.ll = arg1; farg2.ll = arg2; farg3.ll = arg3; + DPRINTF("farg1.d = %f\n", farg1.d); + DPRINTF("farg2.d = %f\n", farg2.d); + DPRINTF("farg3.d = %f\n", farg3.d); + if (unlikely((float64_is_infinity(farg1.d) && float64_is_zero(farg2.d)) || (float64_is_zero(farg1.d) && float64_is_infinity(farg2.d)))) { /* Multiplication of zero by infinity */ @@ -775,9 +873,10 @@ uint64_t helper_fmadd(CPUPPCState *env, uint64_t arg1, uint64_t arg2, farg1.d = float128_to_float64(ft0_128, &env->fp_status); } } - + DPRINTF("farg1.ll = 0x%" PRIx64 "\n", farg1.ll); return farg1.ll; } +#endif /* fmsub - fmsub. */ uint64_t helper_fmsub(CPUPPCState *env, uint64_t arg1, uint64_t arg2,