
[v2,1/4] Replace i2f() in r600_blit.c with an optimized version.

Message ID 1344706222-3018-1-git-send-email-svfuerst@gmail.com
State New, archived

Commit Message

Steven Fuerst Aug. 11, 2012, 5:30 p.m. UTC
We use __fls() to find the most significant bit.  Using that, the
loop can be avoided.  A second trick is to use the behaviour of the
rotate instructions to expand the range of the unsigned int to float
conversion to the full 32 bits in a branchless way.
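A minimal userspace sketch of the approach above, for experimenting outside the kernel. It is not the patch itself. The kernel's __fls() and ror32() are replaced by hypothetical stand-ins fls_sketch() and ror32_sketch(), and fls_sketch() assumes GCC or Clang for __builtin_clz(). float_bits() assumes float is IEEE 754 binary32, so the FPU's own conversion can be compared bit for bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define I2F_FRAC_BITS 23
#define I2F_MASK ((1u << I2F_FRAC_BITS) - 1)

/* Hypothetical stand-in for the kernel's __fls(): index of the most
 * significant set bit.  x must be non-zero (GCC/Clang builtin). */
static unsigned int fls_sketch(uint32_t x)
{
	return 31u - (unsigned int)__builtin_clz(x);
}

/* Hypothetical stand-in for the kernel's ror32(); both shift counts are
 * reduced mod 32, so a rotate by 0 is well defined. */
static uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31u)) | (word << ((32u - shift) & 31u));
}

/* Same arithmetic as the patched i2f(): no loop, one rotate. */
static uint32_t i2f_sketch(uint32_t x)
{
	uint32_t msb, fraction, exponent;

	if (!x)
		return 0;	/* zero has no leading 1 bit */

	msb = fls_sketch(x);
	/* Align the leading 1 at bit 23, then mask it off (it is implicit). */
	fraction = ror32_sketch(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;	/* exponent bias of 127 */
	return fraction + exponent;
}

/* Bit pattern of a float, assuming IEEE 754 binary32. */
static uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof(u));
	return u;
}

/* Compare against the FPU; only meaningful where (float)x is exact. */
static void check_against_fpu(uint32_t x)
{
	assert(i2f_sketch(x) == float_bits((float)x));
}
```

For example, i2f_sketch(0x80000000u) gives 0x4F000000 (2147483648.0f), a value the old loop could not produce because it masked its input with 0x3fff.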

The routine is now exact up to 2^24.  Above that, we truncate, which
is equivalent to rounding towards zero.
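Above 2^24, the truncating conversion and the FPU's default round-to-nearest-even can disagree by one unit in the last place. The following sketch contrasts the two behaviours. It assumes IEEE 754 binary32 floats, the default rounding mode, and GCC or Clang for __builtin_clz(). round_toward_zero_24() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round x toward zero to the 24 significant bits
 * a binary32 float can hold, which is what truncating the fraction does. */
static uint32_t round_toward_zero_24(uint32_t x)
{
	unsigned int msb;

	if (x <= (1u << 24))
		return x;	/* fits exactly, nothing to drop */

	msb = 31u - (unsigned int)__builtin_clz(x);	/* GCC/Clang builtin */
	/* Clear the (msb - 23) low bits that fall outside the fraction. */
	return x & ~((1u << (msb - 23)) - 1u);
}
```

For 2^24 + 3 = 16777219, truncation gives 16777218 while a (float) cast rounds the tie to even and gives 16777220. For 0xFFFFFFFF, the two give 4294967040 and 4294967296.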

Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
---
 drivers/gpu/drm/radeon/r600_blit.c |   50 ++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 22 deletions(-)

Comments

Michel Dänzer Aug. 14, 2012, 10:33 a.m. UTC | #1
On Sat, 2012-08-11 at 10:30 -0700, Steven Fuerst wrote:
> We use __fls() to find the most significant bit.  Using that, the
> loop can be avoided.  A second trick is to use the behaviour of the
> rotate instructions to expand the range of the unsigned int to float
> conversion to the full 32 bits in a branchless way.
> 
> The routine is now exact up to 2^24.  Above that, we truncate, which
> is equivalent to rounding towards zero.
> 
> Signed-off-by: Steven Fuerst <svfuerst@gmail.com>

It might be better to reorder the series to use a shared int2float first
and then optimize that. Either way though, although I haven't really
looked into the floating point encoding aspects, the series is

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

Patch

diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c
index 3c031a4..326a8da 100644
--- a/drivers/gpu/drm/radeon/r600_blit.c
+++ b/drivers/gpu/drm/radeon/r600_blit.c
@@ -489,29 +489,35 @@  set_default_state(drm_radeon_private_t *dev_priv)
 	ADVANCE_RING();
 }
 
-static uint32_t i2f(uint32_t input)
+/* 23 bits of float fractional data */
+#define I2F_FRAC_BITS	23
+#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
+
+/*
+ * Converts unsigned integer into 32-bit IEEE floating point representation.
+ * Will be exact from 0 to 2^24.  Above that, we round towards zero
+ * as the fractional bits will not fit in a float.  (It would be better to
+ * round towards even as the fpu does, but that is slower.)
+ */
+static uint32_t i2f(uint32_t x)
 {
-	u32 result, i, exponent, fraction;
-
-	if ((input & 0x3fff) == 0)
-		result = 0; /* 0 is a special case */
-	else {
-		exponent = 140; /* exponent biased by 127; */
-		fraction = (input & 0x3fff) << 10; /* cheat and only
-						      handle numbers below 2^^15 */
-		for (i = 0; i < 14; i++) {
-			if (fraction & 0x800000)
-				break;
-			else {
-				fraction = fraction << 1; /* keep
-							     shifting left until top bit = 1 */
-				exponent = exponent - 1;
-			}
-		}
-		result = exponent << 23 | (fraction & 0x7fffff); /* mask
-								    off top bit; assumed 1 */
-	}
-	return result;
+	uint32_t msb, exponent, fraction;
+
+	/* Zero is special */
+	if (!x) return 0;
+
+	/* Get location of the most significant bit */
+	msb = __fls(x);
+
+	/*
+	 * Use a rotate instead of a shift because that works both leftwards
+	 * and rightwards due to the mod(32) behaviour.  This means we don't
+	 * need to check to see if we are above 2^24 or not.
+	 */
+	fraction = ror32(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
+	exponent = (127 + msb) << I2F_FRAC_BITS;
+
+	return fraction + exponent;
 }
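The rotate trick described in the patch's comment can be checked against a reference that branches on the shift direction. This minimal userspace sketch assumes GCC or Clang for __builtin_clz(). rotr32(), msb_index(), fraction_rotate() and fraction_shift() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 23
#define FRAC_MASK ((1u << FRAC_BITS) - 1)

/* Hypothetical stand-in for the kernel's ror32(). */
static uint32_t rotr32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31u)) | (word << ((32u - shift) & 31u));
}

/* Index of the most significant set bit; x must be non-zero. */
static unsigned int msb_index(uint32_t x)
{
	return 31u - (unsigned int)__builtin_clz(x);
}

/* Branchless: one right rotate covers both directions, because
 * (msb - 23) wraps to 32 - (23 - msb) when msb < 23. */
static uint32_t fraction_rotate(uint32_t x)
{
	return rotr32(x, (msb_index(x) - FRAC_BITS) & 0x1f) & FRAC_MASK;
}

/* Reference: explicit branch on the shift direction. */
static uint32_t fraction_shift(uint32_t x)
{
	unsigned int msb = msb_index(x);
	uint32_t aligned = msb >= FRAC_BITS ? x >> (msb - FRAC_BITS)
					    : x << (FRAC_BITS - msb);

	return aligned & FRAC_MASK;
}
```

The two agree because, when msb > 23, the low bits that wrap around land at bit 24 or above, which the mask discards anyway. When msb < 23, the rotate acts as a left shift and the value is too narrow for anything to wrap.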