[03/72] qemu/host-utils: Add wrappers for carry builtins

Message ID	20210508014802.892561-4-richard.henderson@linaro.org (mailing list archive)
State	New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BA5A5611CA
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins
Date: Fri,  7 May 2021 18:46:53 -0700
Message-Id: <20210508014802.892561-4-richard.henderson@linaro.org>
In-Reply-To: <20210508014802.892561-1-richard.henderson@linaro.org>
References: <20210508014802.892561-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::534;
 envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x534.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: alex.bennee@linaro.org, david@redhat.com
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>

Series

Convert floatx80 and float128 to FloatParts

Commit Message

Richard Henderson May 8, 2021, 1:46 a.m. UTC

These builtins came in clang 3.8, but are not present in gcc through
version 11.  Even in clang the optimization is not ideal except for
x86_64, but no worse than the hand-coding that we currently do.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/host-utils.h | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Comments

Alex Bennée May 10, 2021, 12:57 p.m. UTC | #1

Richard Henderson <richard.henderson@linaro.org> writes:

> These builtins came in clang 3.8, but are not present in gcc through
> version 11.  Even in clang the optimization is not ideal except for
> x86_64, but no worse than the hand-coding that we currently do.

Given this statement....

<snip>
> +/**
> + * uadd64_carry - addition with carry-in and carry-out
> + * @x, @y: addends
> + * @pcarry: in-out carry value
> + *
> + * Computes @x + @y + *@pcarry, placing the carry-out back
> + * into *@pcarry and returning the 64-bit sum.
> + */
> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
> +{
> +#if __has_builtin(__builtin_addcll)
> +    unsigned long long c = *pcarry;
> +    x = __builtin_addcll(x, y, c, &c);

What happens when unsigned long long isn't the same as uint64_t? Doesn't
C99 only specify a minimum width?

> +    *pcarry = c & 1;

Why do we need to clamp it here? Shouldn't the compiler automatically do
that due to the bool?

> +    return x;
> +#else
> +    bool c = *pcarry;
> +    /* This is clang's internal expansion of __builtin_addc. */
> +    c = uadd64_overflow(x, c, &x);
> +    c |= uadd64_overflow(x, y, &x);
> +    *pcarry = c;
> +    return x;
> +#endif

Either way, if you aren't super happy with the compiler's builtin and you
get equivalent code with the unambiguous hand-coded version, then what is
the point of having a builtin leg?

> +}
> +
> +/**
> + * usub64_borrow - subtraction with borrow-in and borrow-out
> + * @x, @y: minuend and subtrahend
> + * @pborrow: in-out borrow value
> + *
> + * Computes @x - @y - *@pborrow, placing the borrow-out back
> + * into *@pborrow and returning the 64-bit difference.
> + */
> +static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
> +{
> +#if __has_builtin(__builtin_subcll)
> +    unsigned long long b = *pborrow;
> +    x = __builtin_subcll(x, y, b, &b);
> +    *pborrow = b & 1;
> +    return x;
> +#else
> +    bool b = *pborrow;
> +    b = usub64_overflow(x, b, &x);
> +    b |= usub64_overflow(x, y, &x);
> +    *pborrow = b;
> +    return x;
> +#endif
> +}
> +
>  /* Host type specific sizes of these routines.  */
>  
>  #if ULONG_MAX == UINT32_MAX

Richard Henderson May 11, 2021, 8:10 p.m. UTC | #2

On 5/10/21 7:57 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> These builtins came in clang 3.8, but are not present in gcc through
>> version 11.  Even in clang the optimization is not ideal except for
>> x86_64, but no worse than the hand-coding that we currently do.
> 
> Given this statement....

I think you mis-read the "except for x86_64" part?

Anyway, these are simply bugs to be filed against clang, so that hopefully 
clang-12 will do a good job with the builtin.  And as I said, while the 
generated code is not ideal, it's no worse.

>> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
>> +{
>> +#if __has_builtin(__builtin_addcll)
>> +    unsigned long long c = *pcarry;
>> +    x = __builtin_addcll(x, y, c, &c);
> 
> what happens when unsigned long long isn't the same as uint64_t? Doesn't
> C99 only specify a minimum?

If you only look at C99, sure.  But looking at the set of supported hosts, 
unsigned long long is always a 64-bit type.
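
The width assumption discussed here could be made explicit at compile time. The following is only a sketch, not part of the patch: it uses plain C11 _Static_assert rather than any QEMU macro, and roundtrip_ull is a hypothetical helper added purely for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Refuse to build on a host where unsigned long long is not exactly 64 bits. */
_Static_assert(ULLONG_MAX == UINT64_MAX,
               "carry wrappers assume a 64-bit unsigned long long");

/* Hypothetical helper: a uint64_t survives a trip through unsigned long long. */
static inline uint64_t roundtrip_ull(uint64_t v)
{
    unsigned long long tmp = v;   /* lossless when the assertion above holds */
    assert(tmp == v);
    return (uint64_t)tmp;
}
```

With such a check in place, a host that violated the assumption would fail to compile instead of silently truncating.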

>> +    *pcarry = c & 1;
> 
> Why do we need to clamp it here? Shouldn't the compiler automatically do
> that due to the bool?

This produces a single AND insn, instead of CMP + SETcc.

r~
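
To illustrate the two conversions being compared above, here is a minimal sketch. The helper names are hypothetical and do not appear in the patch. Both helpers agree whenever the input is 0 or 1, which is all a carry-out can be; they differ only for other values, and the mask tells the compiler that just bit 0 matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Implicit conversion: any nonzero value becomes true (a test against zero). */
static inline bool carry_via_bool(unsigned long long c)
{
    return c;
}

/* Explicit mask: only bit 0 is kept, so the result is already 0 or 1. */
static inline bool carry_via_mask(unsigned long long c)
{
    return c & 1;
}

/* The two forms are interchangeable for the values a carry-out can take. */
static inline void carry_conversions_agree(void)
{
    assert(carry_via_bool(0) == carry_via_mask(0));
    assert(carry_via_bool(1) == carry_via_mask(1));
}
```

Which instructions a given compiler actually emits for each form is not shown here; the AND versus CMP + SETcc observation is taken from the reply above.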

Alex Bennée May 12, 2021, 11:17 a.m. UTC | #3

Richard Henderson <richard.henderson@linaro.org> writes:

> On 5/10/21 7:57 AM, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> These builtins came in clang 3.8, but are not present in gcc through
>>> version 11.  Even in clang the optimization is not ideal except for
>>> x86_64, but no worse than the hand-coding that we currently do.
>> Given this statement....
>
> I think you mis-read the "except for x86_64" part?
>
> Anyway, these are simply bugs to be filed against clang, so that
> hopefully clang-12 will do a good job with the builtin.  And as I
> said, while the generated code is not ideal, it's no worse.
>
>>> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
>>> +{
>>> +#if __has_builtin(__builtin_addcll)
>>> +    unsigned long long c = *pcarry;
>>> +    x = __builtin_addcll(x, y, c, &c);
>> what happens when unsigned long long isn't the same as uint64_t?
>> Doesn't
>> C99 only specify a minimum?
>
> If you only look at C99, sure.  But looking at the set of supported
> hosts, unsigned long long is always a 64-bit type.

I guess I'm worrying about a theoretical future - but we don't worry
about it for other ll builtins so no biggy.

>
>>> +    *pcarry = c & 1;
>> Why do we need to clamp it here? Shouldn't the compiler
>> automatically do
>> that due to the bool?
>
> This produces a single AND insn, instead of CMP + SETcc.

Might be worth mentioning that in the commit message. 

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index fd76f0cbd3..2ea8b3000b 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -26,6 +26,7 @@ 
 #ifndef HOST_UTILS_H
 #define HOST_UTILS_H
 
+#include "qemu/compiler.h"
 #include "qemu/bswap.h"
 
 #ifdef CONFIG_INT128
@@ -581,6 +582,55 @@  static inline bool umul64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
 #endif
 }
 
+/**
+ * uadd64_carry - addition with carry-in and carry-out
+ * @x, @y: addends
+ * @pcarry: in-out carry value
+ *
+ * Computes @x + @y + *@pcarry, placing the carry-out back
+ * into *@pcarry and returning the 64-bit sum.
+ */
+static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
+{
+#if __has_builtin(__builtin_addcll)
+    unsigned long long c = *pcarry;
+    x = __builtin_addcll(x, y, c, &c);
+    *pcarry = c & 1;
+    return x;
+#else
+    bool c = *pcarry;
+    /* This is clang's internal expansion of __builtin_addc. */
+    c = uadd64_overflow(x, c, &x);
+    c |= uadd64_overflow(x, y, &x);
+    *pcarry = c;
+    return x;
+#endif
+}
+
+/**
+ * usub64_borrow - subtraction with borrow-in and borrow-out
+ * @x, @y: minuend and subtrahend
+ * @pborrow: in-out borrow value
+ *
+ * Computes @x - @y - *@pborrow, placing the borrow-out back
+ * into *@pborrow and returning the 64-bit difference.
+ */
+static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
+{
+#if __has_builtin(__builtin_subcll)
+    unsigned long long b = *pborrow;
+    x = __builtin_subcll(x, y, b, &b);
+    *pborrow = b & 1;
+    return x;
+#else
+    bool b = *pborrow;
+    b = usub64_overflow(x, b, &x);
+    b |= usub64_overflow(x, y, &x);
+    *pborrow = b;
+    return x;
+#endif
+}
+
 /* Host type specific sizes of these routines.  */
 
 #if ULONG_MAX == UINT32_MAX
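
As a usage sketch, the two wrappers can be chained to build wider arithmetic by threading one carry or borrow flag from the low word to the high word. So that the example builds on its own, the wrappers below are portable stand-ins written with plain comparisons; they are assumed to match the semantics documented in the patch but are not the patch's code. The u128_pair type and the add128/sub128 helpers are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the patch's wrapper: x + y + *pcarry, carry-out in *pcarry. */
static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
{
    uint64_t t = x + *pcarry;   /* wraps only if x is all-ones and carry-in is set */
    bool c = t < x;
    uint64_t sum = t + y;
    c |= sum < t;               /* at most one of the two additions can wrap */
    *pcarry = c;
    return sum;
}

/* Stand-in for the patch's wrapper: x - y - *pborrow, borrow-out in *pborrow. */
static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
{
    uint64_t t = x - *pborrow;  /* wraps only if x is zero and borrow-in is set */
    bool b = t > x;
    uint64_t diff = t - y;
    b |= diff > t;
    *pborrow = b;
    return diff;
}

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Add low words first, then high words, threading the carry through. */
static inline u128_pair add128(u128_pair a, u128_pair b, bool *carry_out)
{
    bool c = false;
    u128_pair r;
    r.lo = uadd64_carry(a.lo, b.lo, &c);
    r.hi = uadd64_carry(a.hi, b.hi, &c);
    *carry_out = c;
    return r;
}

/* Subtract low words first, then high words, threading the borrow through. */
static inline u128_pair sub128(u128_pair a, u128_pair b, bool *borrow_out)
{
    bool bw = false;
    u128_pair r;
    r.lo = usub64_borrow(a.lo, b.lo, &bw);
    r.hi = usub64_borrow(a.hi, b.hi, &bw);
    *borrow_out = bw;
    return r;
}

/* Example: (2^64 - 1) + 1 carries into the high word; subtracting 1 undoes it. */
static inline void add128_sub128_example(void)
{
    u128_pair a = { UINT64_MAX, 0 }, one = { 1, 0 };
    bool flag;
    u128_pair s = add128(a, one, &flag);
    assert(s.lo == 0 && s.hi == 1 && !flag);
    u128_pair d = sub128(s, one, &flag);
    assert(d.lo == UINT64_MAX && d.hi == 0 && !flag);
}
```

The in-out flag is what makes the chaining compact: the caller never has to combine separate overflow results by hand.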
