Message ID | 1344706222-3018-1-git-send-email-svfuerst@gmail.com (mailing list archive) |
---|---|

State | New, archived |

Headers | show |

On Sam, 2012-08-11 at 10:30 -0700, Steven Fuerst wrote: > We use __fls() to find the most significant bit. Using that, the > loop can be avoided. A second trick is to use the behaviour of the > rotate instructions to expand the range of the unsigned int to float > conversion to the full 32 bits in a branchless way. > > The routine is now exact up to 2^24. Above that, we truncate which > is equivalent to rounding towards zero. > > Signed-off-by: Steven Fuerst <svfuerst@gmail.com> It might be better to reorder the series to use a shared int2float first and then optimize that. Either way though, although I haven't really looked into the floating point encoding aspects, the series is Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c index 3c031a4..326a8da 100644 --- a/drivers/gpu/drm/radeon/r600_blit.c +++ b/drivers/gpu/drm/radeon/r600_blit.c @@ -489,29 +489,35 @@ set_default_state(drm_radeon_private_t *dev_priv) ADVANCE_RING(); } -static uint32_t i2f(uint32_t input) +/* 23 bits of float fractional data */ +#define I2F_FRAC_BITS 23 +#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1) + +/* + * Converts unsigned integer into 32-bit IEEE floating point representation. + * Will be exact from 0 to 2^24. Above that, we round towards zero + * as the fractional bits will not fit in a float. (It would be better to + * round towards even as the fpu does, but that is slower.) + */ +static uint32_t i2f(uint32_t x) { - u32 result, i, exponent, fraction; - - if ((input & 0x3fff) == 0) - result = 0; /* 0 is a special case */ - else { - exponent = 140; /* exponent biased by 127; */ - fraction = (input & 0x3fff) << 10; /* cheat and only - handle numbers below 2^^15 */ - for (i = 0; i < 14; i++) { - if (fraction & 0x800000) - break; - else { - fraction = fraction << 1; /* keep - shifting left until top bit = 1 */ - exponent = exponent - 1; - } - } - result = exponent << 23 | (fraction & 0x7fffff); /* mask - off top bit; assumed 1 */ - } - return result; + uint32_t msb, exponent, fraction; + + /* Zero is special */ + if (!x) return 0; + + /* Get location of the most significant bit */ + msb = __fls(x); + + /* + * Use a rotate instead of a shift because that works both leftwards + * and rightwards due to the mod(32) behaviour. This means we don't + * need to check to see if we are above 2^24 or not. + */ + fraction = ror32(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK; + exponent = (127 + msb) << I2F_FRAC_BITS; + + return fraction + exponent; }

`We use __fls() to find the most significant bit. Using that, the loop can be avoided. A second trick is to use the behaviour of the rotate instructions to expand the range of the unsigned int to float conversion to the full 32 bits in a branchless way. The routine is now exact up to 2^24. Above that, we truncate which is equivalent to rounding towards zero. Signed-off-by: Steven Fuerst <svfuerst@gmail.com> --- drivers/gpu/drm/radeon/r600_blit.c | 50 ++++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 22 deletions(-)`