drm/i915: Workaround hang with BSD and forcewake on SandyBridge

Message ID 1342341758-1261-1-git-send-email-chris@chris-wilson.co.uk
State New, archived

Commit Message

Chris Wilson July 15, 2012, 8:42 a.m. UTC
For reasons that are not apparent to anybody, 990bbdadaba (drm/i915:
Group the GT routines together in both code and vtable) breaks the use
of the BitStream Decoder ring on SandyBridge. The active ingredient of
that patch is the conversion from a udelay(10) to a udelay(1) in the
busy-wait loop of waiting for the forcewake acknowledge. If we restore
that udelay(10) or insert another udelay(1) afterwards (or any wait
longer than 250ns) everything works again. An alternative is also to
remove any delay from the busy-wait loop.

Given that in the atomic sections we want to complete the wait as quickly
as possible to avoid blocking the CPU for too long, it makes sense to
remove the delay altogether and simply spin on the exit condition until
it completes. So we replace the udelay(1) with cpu_relax().

Papers over regression from

commit 990bbdadabaa51828e475eda86ee5720a4910cc3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 2 11:51:02 2012 -0300

    drm/i915: Group the GT routines together in both code and vtable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_drv.h |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Comments

Daniel Vetter July 15, 2012, 3:05 p.m. UTC | #1
On Sun, Jul 15, 2012 at 09:42:38AM +0100, Chris Wilson wrote:
> For reasons that are not apparent to anybody, 990bbdadaba (drm/i915:
> Group the GT routines together in both code and vtable) breaks the use
> of the BitStream Decoder ring on SandyBridge. The active ingredient of
> that patch is the conversion from a udelay(10) to a udelay(1) in the
> busy-wait loop of waiting for the forcewake acknowledge. If we restore
> that udelay(10) or insert another udelay(1) afterwards (or any wait
> longer than 250ns) everything works again. An alternative is also to
> remove any delay from the busy-wait loop.
> 
> Given that in the atomic sections we want to complete the wait as quickly
> as possible to avoid blocking the CPU for too long, it makes sense to
> remove the delay altogether and simply spin on the exit condition until
> it completes. So we replace the udelay(1) with cpu_relax().
> 
> Papers over regression from
> 
> commit 990bbdadabaa51828e475eda86ee5720a4910cc3
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Jul 2 11:51:02 2012 -0300
> 
>     drm/i915: Group the GT routines together in both code and vtable
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51738
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Dragon, iceberg or elephant, that's the question ...

Patch merged to dinq, thanks a lot for wrestling the strange things in
this dungeon.
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index d5a968c..c65134d 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -47,15 +47,16 @@ 
 })
 
 #define wait_for_atomic_us(COND, US) ({ \
-	int i, ret__ = -ETIMEDOUT;	\
-	for (i = 0; i < (US); i++) {	\
-		if ((COND)) {		\
-			ret__ = 0;	\
-			break;		\
-		}			\
-		udelay(1);		\
-	}				\
-	ret__;				\
+	unsigned long timeout__ = jiffies + usecs_to_jiffies(US);	\
+	int ret__ = 0;							\
+	while (!(COND)) {						\
+		if (time_after(jiffies, timeout__)) {			\
+			ret__ = -ETIMEDOUT;				\
+			break;						\
+		}							\
+		cpu_relax();						\
+	}								\
+	ret__;								\
 })
 
 #define wait_for(COND, MS) _wait_for(COND, MS, 1)
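
The shape of the new wait_for_atomic_us() loop (test the condition, check a deadline, then cpu_relax() with no fixed delay) can be tried outside the kernel. The following is only a userspace sketch under stated assumptions, not the driver code: spin_wait_us(), relax_cpu(), now_ns(), poll_countdown() and never_ready() are hypothetical names, CLOCK_MONOTONIC stands in for the jiffies-based timeout used in the patch, and relax_cpu() is a plain compiler barrier (GCC/Clang syntax) rather than the kernel's cpu_relax().

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for cpu_relax(): a compiler barrier only. */
static inline void relax_cpu(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* Monotonic time in nanoseconds (replaces jiffies in this sketch). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until cond(arg) returns true or timeout_us microseconds elapse.
 * Returns 0 on success and -ETIMEDOUT on timeout. As in the patch, the
 * condition is tested first, so a condition that is already true never
 * touches the clock, and no fixed delay is inserted between polls.
 */
static int spin_wait_us(bool (*cond)(void *), void *arg,
			unsigned int timeout_us)
{
	uint64_t deadline = now_ns() + (uint64_t)timeout_us * 1000ull;

	while (!cond(arg)) {
		if (now_ns() > deadline)
			return -ETIMEDOUT;
		relax_cpu();
	}
	return 0;
}

/* Example condition: becomes true once it has been polled *arg times. */
static bool poll_countdown(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}

/* Example condition that never becomes true, to exercise the timeout. */
static bool never_ready(void *arg)
{
	(void)arg;
	return false;
}
```

A caller would pass something like poll_countdown with a pointer to a counter and a generous timeout; never_ready with a short timeout exercises the -ETIMEDOUT path. Unlike the removed udelay(1) loop, the number of polls is not tied to the timeout value here, only the elapsed time is.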