[12/18] drm/i915/gtt: Fill scratch page

Message ID 1435246520-14745-13-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State New, archived

Commit Message

Mika Kuoppala June 25, 2015, 3:35 p.m. UTC
While reviewing the dynamic page tables series, I was able
to hit a lite restore bug with execlists. I assume that,
due to an incorrect pd, the batch ran out of legitimate address
space and into the scratch page area. The ACTHD kept increasing
because scratch was all zeroes (MI_NOOPs). And as the gen8
address space is quite large, hangcheck happily waited
for a very long time, keeping the process effectively stuck.

According to Chris Wilson, any modern GPU will grind to a halt
if it encounters commands of all ones. This seemed to do the
trick: a hang was declared promptly when the GPU wandered into
scratch land.

v2: Use 0xffff00ff pattern (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
 1 file changed, 3 insertions(+)
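
The 64-bit SCRATCH_PAGE_MAGIC in the diff is the v2 pattern 0xffff00ff repeated twice, so every 32-bit dword of a filled page reads back as the same value regardless of byte order. A minimal userspace sketch of that property follows. It is not the driver code: sketch_fill_page() is a hypothetical stand-in for fill_px(), sketch_read_dword() is a hypothetical helper, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE   4096u                  /* assumed 4 KiB page */
#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL  /* value from this patch */

/* Hypothetical stand-in for fill_px(): store val in every 64-bit slot. */
static void sketch_fill_page(void *page, uint64_t val)
{
	size_t i;

	assert(page != NULL);
	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(val); i++)
		memcpy((char *)page + i * sizeof(val), &val, sizeof(val));
}

/* Hypothetical helper: read the 32-bit dword at index idx of the page. */
static uint32_t sketch_read_dword(const void *page, size_t idx)
{
	uint32_t dw;

	assert(idx < SKETCH_PAGE_SIZE / sizeof(dw));
	memcpy(&dw, (const char *)page + idx * sizeof(dw), sizeof(dw));
	return dw;
}
```

After sketch_fill_page(page, SCRATCH_PAGE_MAGIC), sketch_read_dword() returns 0xffff00ff for any index from 0 to 1023.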

Comments

Chris Wilson June 25, 2015, 5:51 p.m. UTC | #1
On Thu, Jun 25, 2015 at 06:35:14PM +0300, Mika Kuoppala wrote:
> During review of dynamic page tables series, I was able
> to hit a lite restore bug with execlists. I assume that
> due to incorrect pd, the batch run out of legit address space
> and into the scratch page area. The ACTHD was increasing
> due to scratch being all zeroes (MI_NOOPs). And as gen8
> address space is quite large, the hangcheck happily waited
> for a long long time, keeping the process effectively stuck.
> 
> According to Chris Wilson any modern gpu will grind to halt
> if it encounters commands of all ones. This seemed to do the
> trick and hang was declared promptly when the gpu wandered into
> the scratch land.
> 
> v2: Use 0xffff00ff pattern (Chris)

Thinking about this, could we add a scratch page checker to hangcheck?
Just check the first/last u64 perhaps? Or random offset_in_page?
-Chris
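
One possible reading of the checker suggested above, as a userspace sketch only: sample the first qword, the last qword, and one caller-chosen qword, and compare each against the magic. The helper name sketch_scratch_intact(), the probe parameter (standing in for a random offset_in_page), and the 4 KiB page size are assumptions, not existing i915 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_QWORDS (4096u / sizeof(uint64_t))  /* assumed 4 KiB page */
#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL       /* value from the patch */

/*
 * Hypothetical helper: return true if the sampled qwords still hold the
 * magic. probe is a caller-chosen qword index (for example a random one);
 * it is wrapped into the page, so any value is accepted.
 */
static bool sketch_scratch_intact(const uint64_t *page, size_t probe)
{
	assert(page != NULL);
	probe %= SKETCH_PAGE_QWORDS;

	return page[0] == SCRATCH_PAGE_MAGIC &&
	       page[SKETCH_PAGE_QWORDS - 1] == SCRATCH_PAGE_MAGIC &&
	       page[probe] == SCRATCH_PAGE_MAGIC;
}
```

Sampling three qwords is cheap but can miss a stray write elsewhere in the page; varying probe between calls trades certainty on a single check for coverage over time.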

Dave Gordon June 26, 2015, 5:31 p.m. UTC | #2
On 25/06/15 18:51, Chris Wilson wrote:
> On Thu, Jun 25, 2015 at 06:35:14PM +0300, Mika Kuoppala wrote:
>> During review of dynamic page tables series, I was able
>> to hit a lite restore bug with execlists. I assume that
>> due to incorrect pd, the batch run out of legit address space
>> and into the scratch page area. The ACTHD was increasing
>> due to scratch being all zeroes (MI_NOOPs). And as gen8
>> address space is quite large, the hangcheck happily waited
>> for a long long time, keeping the process effectively stuck.
>>
>> According to Chris Wilson any modern gpu will grind to halt
>> if it encounters commands of all ones. This seemed to do the
>> trick and hang was declared promptly when the gpu wandered into
>> the scratch land.
>>
>> v2: Use 0xffff00ff pattern (Chris)
> 
> Thinking about this, could we add a scratch page checker to hangcheck?
> Just check the first/last u64 perhaps? Or random offset_in_page?
> -Chris

I've suggested to Tomas that when running in 32-bit PPGTT mode, if ACTHD
is >4G then it's definitely broken. Doesn't help much once PPGTT space
expands to 48b though.

.Dave.
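
The range check described above can be sketched as follows. This is a hedged userspace illustration, not hangcheck code: sketch_acthd_out_of_range() and its address_bits parameter are hypothetical names. It compares with >= because 4G itself is already the first address outside a 32-bit space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if acthd lies beyond an address space
 * of address_bits bits (32 for the 32-bit PPGTT case discussed here).
 */
static bool sketch_acthd_out_of_range(uint64_t acthd, unsigned int address_bits)
{
	assert(address_bits >= 1 && address_bits <= 64);

	if (address_bits == 64)
		return false;	/* every 64-bit value is in range */

	return acthd >= (UINT64_C(1) << address_bits);
}
```

With address_bits set to 48, the same check still compiles and runs, but a runaway ACTHD would have to cross 256 TiB before it trips, which matches the caveat that the test helps little once the PPGTT space grows.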

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0cc0cf4..be6521f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2136,6 +2136,8 @@  void i915_global_gtt_cleanup(struct drm_device *dev)
 	vm->cleanup(vm);
 }
 
+#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
+
 static int alloc_scratch_page(struct i915_address_space *vm)
 {
 	struct i915_page_scratch *sp;
@@ -2153,6 +2155,7 @@  static int alloc_scratch_page(struct i915_address_space *vm)
 		return ret;
 	}
 
+	fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
 	set_pages_uc(px_page(sp), 1);
 
 	vm->scratch_page = sp;