drm/i915: Mark pending batches correctly on reset

Message ID	1477483679-14440-1-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State	New, archived

Headers

From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Wed, 26 Oct 2016 15:07:59 +0300
Message-Id: <1477483679-14440-1-git-send-email-mika.kuoppala@intel.com>
Subject: [Intel-gfx] [PATCH] drm/i915: Mark pending batches correctly on
	reset
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Mika Kuoppala Oct. 26, 2016, 12:07 p.m. UTC

For contexts that get their requests NOPed after a reset,
correctly count them as pending.

Testcase: igt/tests/gem_reset_stats
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Chris Wilson Oct. 26, 2016, 12:15 p.m. UTC | #1

On Wed, Oct 26, 2016 at 03:07:59PM +0300, Mika Kuoppala wrote:
> For contexts that get their requests NOPed after a reset,
> correctly count them as pending.
> 
> Testcase: igt/tests/gem_reset_stats
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

We agreed that this was an incorrect interpretation of the robustness
API, one that neither handles TDR nor scales to multiple timelines.

Currently, we only mark as innocent the contexts/batch executing on the
hw on the good rings at the time of the reset.

>  drivers/gpu/drm/i915/i915_gem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 87018df9..e025542 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2615,6 +2615,7 @@ static void reset_request(struct drm_i915_gem_request *request)
>  		head = 0;
>  	}
>  	memset(vaddr + head, 0, request->postfix - head);
> +	i915_set_reset_status(request->ctx, false);

This does not handle all the multiple timelines we have that are not
even submitted yet.
-Chris

Mika Kuoppala Oct. 26, 2016, 12:32 p.m. UTC | #2

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Wed, Oct 26, 2016 at 03:07:59PM +0300, Mika Kuoppala wrote:
>> For contexts that get their requests NOPed after a reset,
>> correctly count them as pending.
>> 
>> Testcase: igt/tests/gem_reset_stats
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
> We agreed that this was an incorrect interpretation of the robustness
> API, one that neither handles TDR nor scales to multiple timelines.
>

I remember agreeing with the active one at least. Perhaps I was
ignorant of the multiple timelines case.

Is the reasoning here that there is no actual benefit in marking
batches pending, as it is superfluous in the replay case? In other
words, is the distinction between a batch being queued before
submission and after moot from the userspace point of view?

> Currently, we only mark as innocent the contexts/batch executing on the
> hw on the good rings at the time of the reset.
>

I am ok with this. The interpretation of 'pending' changes, but it
is more meaningful if one thinks of it as pending on hardware.

-Mika

>>  drivers/gpu/drm/i915/i915_gem.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 87018df9..e025542 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2615,6 +2615,7 @@ static void reset_request(struct drm_i915_gem_request *request)
>>  		head = 0;
>>  	}
>>  	memset(vaddr + head, 0, request->postfix - head);
>> +	i915_set_reset_status(request->ctx, false);
>
> This does not handle all the multiple timelines we have that are not
> even submitted yet.
> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

Chris Wilson Oct. 26, 2016, 12:57 p.m. UTC | #3

On Wed, Oct 26, 2016 at 03:32:20PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Wed, Oct 26, 2016 at 03:07:59PM +0300, Mika Kuoppala wrote:
> >> For contexts that get their requests NOPed after a reset,
> >> correctly count them as pending.
> >> 
> >> Testcase: igt/tests/gem_reset_stats
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >
> > We agreed that this was an incorrect interpretation of the robustness
> > API, one that neither handles TDR nor scales to multiple timelines.
> >
> 
> I remember agreeing with the active one at least. Perhaps I was
> ignorant of the multiple timelines case.
> 
> Is the reasoning here that there is no actual benefit in marking
> batches pending, as it is superfluous in the replay case? In other
> words, is the distinction between a batch being queued before
> submission and after moot from the userspace point of view?

Yes. And it gives them information that they are not otherwise privy to.
 
> > Currently, we only mark as innocent the contexts/batch executing on the
> > hw on the good rings at the time of the reset.
> >
> 
> I am ok with this. The interpretation of 'pending' changes, but it
> is more meaningful if one thinks of it as pending on hardware.

That's my understanding as well. We only mark the affected batches:
each is either guilty and scrapped, or innocent and rerun. But we
may still see corruption in the innocent batch (as it may change state
internally but the initial state is not restored upon reset). Everyone
else should not be affected (there are always some dependencies, as
corruption may propagate, but we do identify the root).
-Chris
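
To make the two interpretations of 'pending' concrete, here is a minimal
sketch in plain C. It is a toy model, not the i915 code: struct ctx_stats,
struct request, account_reset() and the count_queued switch are all
hypothetical names, and the field names batch_active and batch_pending are
assumed only for illustration. With count_queued false, only batches that
were on the hardware at reset time are marked, as described above; with it
true, queued batches are counted as pending too, which is roughly what the
patch proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the i915 structures. */
struct ctx_stats {
	unsigned int batch_active;  /* batches blamed for a hang */
	unsigned int batch_pending; /* innocent batches caught by a reset */
};

struct request {
	struct ctx_stats *ctx;
	bool on_hw;       /* executing on hardware at reset time */
	bool caused_hang; /* identified as the root of the hang */
};

/*
 * Account for one reset. With count_queued == false, only requests that
 * were on the hardware are marked. With count_queued == true, requests
 * that were merely queued are also counted as pending.
 */
static void account_reset(struct request *reqs, size_t n, bool count_queued)
{
	for (size_t i = 0; i < n; i++) {
		struct request *rq = &reqs[i];

		assert(rq->ctx != NULL);
		if (rq->on_hw && rq->caused_hang)
			rq->ctx->batch_active++;   /* guilty: scrapped */
		else if (rq->on_hw || count_queued)
			rq->ctx->batch_pending++;  /* innocent: rerun */
	}
}
```

In this model a context whose request was only queued sees no change in its
counters under the first policy, so userspace learns nothing about how far
its work had progressed through submission.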

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 87018df9..e025542 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2615,6 +2615,7 @@  static void reset_request(struct drm_i915_gem_request *request)
 		head = 0;
 	}
 	memset(vaddr + head, 0, request->postfix - head);
+	i915_set_reset_status(request->ctx, false);
 }
 
 static void i915_gem_reset_engine(struct intel_engine_cs *engine)
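
For readers unfamiliar with the memset() in the hunk above, the following
is a standalone sketch of clearing a request's span of a ring buffer,
including the case where the span wraps past the end of the ring. The
wrap-around branch is inferred from the "head = 0;" context line and is an
assumption, as is the idea that a zero-filled span is read as no-ops.
nop_ring_range() and its parameters are hypothetical names, not driver
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero the bytes of a ring buffer from head up to (not including) tail,
 * handling the case where the range wraps past the end of the buffer.
 */
static void nop_ring_range(uint8_t *vaddr, size_t size, size_t head, size_t tail)
{
	assert(head < size && tail <= size);

	if (tail < head) {
		/* Range wraps: clear to the end of the ring, then restart at 0. */
		memset(vaddr + head, 0, size - head);
		head = 0;
	}
	memset(vaddr + head, 0, tail - head);
}
```

With an 8-byte ring, head 6 and tail 2, this clears bytes 6, 7, 0 and 1 and
leaves bytes 2 to 5 untouched.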
