From patchwork Sat Dec 3 05:59:52 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Programmingkid
X-Patchwork-Id: 9459547
From: Programmingkid
Date: Sat, 3 Dec 2016 00:59:52 -0500
To: Peter Maydell, David Gibson
Cc: "list@suse.de:PowerPC list:PowerPC", qemu-devel qemu-devel
Subject: [Qemu-devel] [RFC] target-ppc/fpu_helper.c: Use C99 code to speed up floating point unit

The floating point code in fpu_helper.c can be sped up by using the
IEEE 754 support that was added to the C99 standard. To compare the two
implementations, define or remove the I_NEED_SPEED macro. The program
that tests each version of the helper_fmadd() function is below the
patch. It needs to be run in the guest. The emulator to use is
qemu-system-ppc. I used a Mac OS X guest, but the test program should
also compile on a Linux guest.

This patch makes the fused multiply-add instruction fmadd run faster
while still giving a correct result.

This document may help those who want to learn more about C99's
IEEE 754 support:
http://grouper.ieee.org/groups/754/meeting-materials/2001-07-18-c99.pdf

---
 target-ppc/fpu_helper.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 102 insertions(+), 3 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index b0760f0..05eb26e 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -20,10 +20,22 @@
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
+#include
+#include
+#include
 
 #define float64_snan_to_qnan(x) ((x) | 0x0008000000000000ULL)
 #define float32_snan_to_qnan(x) ((x) | 0x00400000)
 
+#define DEBUG_FPU 0
+
+#define DPRINTF(fmt, ...) do { \
+    if (DEBUG_FPU) { \
+        printf("FPU: " fmt , ## __VA_ARGS__); \
+    } \
+} while (0)
+
+
 /*****************************************************************************/
 /* Floating point operations helpers */
 uint64_t helper_float32_to_float64(CPUPPCState *env, uint32_t arg)
@@ -281,29 +293,36 @@ static inline void float_inexact_excp(CPUPPCState *env)
 
 static inline void fpscr_set_rounding_mode(CPUPPCState *env)
 {
-    int rnd_type;
+    int rnd_type, result = 0;
 
     /* Set rounding mode */
     switch (fpscr_rn) {
     case 0:
         /* Best approximation (round to nearest) */
         rnd_type = float_round_nearest_even;
+        result = fesetround(FE_TONEAREST);
         break;
     case 1:
         /* Smaller magnitude (round toward zero) */
         rnd_type = float_round_to_zero;
+        result = fesetround(FE_TOWARDZERO);
         break;
     case 2:
         /* Round toward +infinite */
         rnd_type = float_round_up;
+        result = fesetround(FE_UPWARD);
         break;
     default:
     case 3:
         /* Round toward -infinite */
         rnd_type = float_round_down;
+        result = fesetround(FE_DOWNWARD);
         break;
     }
     set_float_rounding_mode(rnd_type, &env->fp_status);
+    if (result != 0) {
+        printf("Error: rounding mode was not set\n");
+    }
 }
 
 void helper_fpscr_clrbit(CPUPPCState *env, uint32_t bit)
@@ -534,6 +553,7 @@ void helper_float_check_status(CPUPPCState *env)
 void helper_reset_fpstatus(CPUPPCState *env)
 {
     set_float_exception_flags(0, &env->fp_status);
+    feclearexcept(FE_ALL_EXCEPT);
 }
 
 /* fadd - fadd. */
@@ -737,16 +757,94 @@ uint64_t helper_frim(CPUPPCState *env, uint64_t arg)
     return do_fri(env, arg, float_round_down);
 }
 
-/* fmadd - fmadd. */
+#define I_NEED_SPEED 1
+#ifdef I_NEED_SPEED
+
+union Converter {
+    uint64_t i;
+    double d;
+};
+
+typedef union Converter Converter;
+
+/* fmadd - fmadd. - fast */
 uint64_t helper_fmadd(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
                       uint64_t arg3)
 {
+    DPRINTF("Fast helper_fmadd() called\n");
+    Converter farg1, farg2, farg3, result;
+
+    farg1.i = arg1;
+    farg2.i = arg2;
+    farg3.i = arg3;
+
+    DPRINTF("farg1.d = %f\n", farg1.d);
+    DPRINTF("farg2.d = %f\n", farg2.d);
+    DPRINTF("farg3.d = %f\n", farg3.d);
+
+    /* if signalling NaN operation */
+    if (unlikely(float64_is_signaling_nan(farg1.i, &env->fp_status) ||
+                 float64_is_signaling_nan(farg2.i, &env->fp_status) ||
+                 float64_is_signaling_nan(farg3.i, &env->fp_status))) {
+        float_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 1);
+    }
+
+    result.d = fma(farg1.d, farg2.d, farg3.d); /* fused multiply-add function */
+    if (fetestexcept(FE_INEXACT)) {
+        DPRINTF("FE_INEXACT\n");
+        float_inexact_excp(env);
+    }
+    if (fetestexcept(FE_INVALID)) {
+        DPRINTF("FE_INVALID\n");
+
+        /* 0 * infinity */
+        if ((fpclassify(farg1.d) == FP_ZERO) && isinf(farg2.d)) {
+            result.i = float_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ, 1);
+        }
+
+        /* infinity * 0 */
+        else if (isinf(farg1.d) && (fpclassify(farg2.d) == FP_ZERO)) {
+            result.i = float_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ, 1);
+        }
+
+        /* infinity - infinity */
+        else if (isinf(farg1.d * farg2.d) && isinf(farg3.d) && (signbit(farg3.d) != 0)) {
+            result.i = float_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, 1);
+        }
+    }
+    if (fetestexcept(FE_OVERFLOW)) {
+        DPRINTF("FE_OVERFLOW\n");
+        float_overflow_excp(env);
+    }
+    if (fetestexcept(FE_UNDERFLOW)) {
+        DPRINTF("FE_UNDERFLOW\n");
+        float_underflow_excp(env);
+    }
+
+    DPRINTF("result.d = %f\n", result.d);
+    DPRINTF("result.i = 0x%" PRIx64 "\n", result.i);
+
+    return result.i;
+}
+
+#else
+
+/* fmadd - fmadd. - original */
+uint64_t helper_fmadd(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
+                      uint64_t arg3)
+{
+    DPRINTF("old helper_fmadd() called\n");
+
     CPU_DoubleU farg1, farg2, farg3;
 
     farg1.ll = arg1;
     farg2.ll = arg2;
     farg3.ll = arg3;
 
+    DPRINTF("farg1.d = %f\n", farg1.d);
+    DPRINTF("farg2.d = %f\n", farg2.d);
+    DPRINTF("farg3.d = %f\n", farg3.d);
+
     if (unlikely((float64_is_infinity(farg1.d) && float64_is_zero(farg2.d)) ||
                  (float64_is_zero(farg1.d) && float64_is_infinity(farg2.d)))) {
         /* Multiplication of zero by infinity */
@@ -775,9 +873,10 @@ uint64_t helper_fmadd(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
             farg1.d = float128_to_float64(ft0_128, &env->fp_status);
         }
     }
-
+    DPRINTF("farg1.ll = 0x%" PRIx64 "\n", farg1.ll);
     return farg1.ll;
 }
+#endif
 
 /* fmsub - fmsub. */
 uint64_t helper_fmsub(CPUPPCState *env, uint64_t arg1, uint64_t arg2,