diff mbox series

[06/22] drm/i915/execlists: Do not propagate errors to dependent fences

Message ID 20210816135139.10060-7-matthew.brost@intel.com (mailing list archive)
State New, archived
Headers show
Series Clean up GuC CI failures, simplify locking, and kernel DOC | expand

Commit Message

Matthew Brost Aug. 16, 2021, 1:51 p.m. UTC
Progagating errors to dependent fences is wrong, don't do it. Selftest
in following patch exposes this bug.

Fixes: 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status to children on unhold")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org>
---
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Daniel Vetter Aug. 17, 2021, 9:21 a.m. UTC | #1
On Mon, Aug 16, 2021 at 06:51:23AM -0700, Matthew Brost wrote:
> Progagating errors to dependent fences is wrong, don't do it. Selftest
> in following patch exposes this bug.

Please explain what "this bug" is, it's hard to read minds, especially at
a distance in spacetime :-)

> Fixes: 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status to children on unhold")

I think it would be better to outright revert this, instead of just
disabling it like this.

Also please cite the dma_fence error propagation revert from Jason:

commit 93a2711cddd5760e2f0f901817d71c93183c3b87
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Wed Jul 14 14:34:16 2021 -0500

    Revert "drm/i915: Propagate errors on awaiting already signaled fences"

Maybe in full, if you need the justification.

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: <stable@vger.kernel.org>

Unless "this bug" is some real world impact thing I wouldn't put cc:
stable on this.
-Daniel
> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de5f9c86b9a4..cafb0608ffb4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
>  			if (p->flags & I915_DEPENDENCY_WEAK)
>  				continue;
>  
> -			/* Propagate any change in error status */
> -			if (rq->fence.error)
> -				i915_request_set_error_once(w, rq->fence.error);
> -
>  			if (w->engine != rq->engine)
>  				continue;
>  
> -- 
> 2.32.0
>
Matthew Brost Aug. 17, 2021, 3:08 p.m. UTC | #2
On Tue, Aug 17, 2021 at 11:21:27AM +0200, Daniel Vetter wrote:
> On Mon, Aug 16, 2021 at 06:51:23AM -0700, Matthew Brost wrote:
> > Progagating errors to dependent fences is wrong, don't do it. Selftest
> > in following patch exposes this bug.
> 
> Please explain what "this bug" is, it's hard to read minds, especially at
> a distance in spacetime :-)
> 

Not a very good explaination.

> > Fixes: 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status to children on unhold")
> 
> I think it would be better to outright revert this, instead of just
> disabling it like this.
>

I tried revert and git did some really odd things that I couldn't
resolve, hence the new patch.
 
> Also please cite the dma_fence error propagation revert from Jason:
> 
> commit 93a2711cddd5760e2f0f901817d71c93183c3b87
> Author: Jason Ekstrand <jason@jlekstrand.net>
> Date:   Wed Jul 14 14:34:16 2021 -0500
> 
>     Revert "drm/i915: Propagate errors on awaiting already signaled fences"
> 
> Maybe in full, if you need the justification.
>

Will site.

> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: <stable@vger.kernel.org>
> 
> Unless "this bug" is some real world impact thing I wouldn't put cc:
> stable on this.

Got it.

Matt

> -Daniel
> > ---
> >  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index de5f9c86b9a4..cafb0608ffb4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
> >  			if (p->flags & I915_DEPENDENCY_WEAK)
> >  				continue;
> >  
> > -			/* Propagate any change in error status */
> > -			if (rq->fence.error)
> > -				i915_request_set_error_once(w, rq->fence.error);
> > -
> >  			if (w->engine != rq->engine)
> >  				continue;
> >  
> > -- 
> > 2.32.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter Aug. 17, 2021, 3:49 p.m. UTC | #3
On Tue, Aug 17, 2021 at 5:13 PM Matthew Brost <matthew.brost@intel.com> wrote:
> On Tue, Aug 17, 2021 at 11:21:27AM +0200, Daniel Vetter wrote:
> > On Mon, Aug 16, 2021 at 06:51:23AM -0700, Matthew Brost wrote:
> > > Progagating errors to dependent fences is wrong, don't do it. Selftest
> > > in following patch exposes this bug.
> >
> > Please explain what "this bug" is, it's hard to read minds, especially at
> > a distance in spacetime :-)
> >
>
> Not a very good explaination.
>
> > > Fixes: 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status to children on unhold")
> >
> > I think it would be better to outright revert this, instead of just
> > disabling it like this.
> >
>
> I tried revert and git did some really odd things that I couldn't
> resolve, hence the new patch.

If there's any conflict git just gives you your current code, and what
was there with the revert applied, with the block markers. Then it's
your job to manually apply that change.

Occasionally (when there's been ridiculous amounts of code movement)
it gets completely lost and puts these into very non-intuitive places.
In that case just delete it, keep the current code, and check what
change you're missing that needs to be manually reverted still. Also
sometimes there's a follow-up patch that you should revert first,
which makes the revert clean. In that case it's generally the right
thing to revert the follow-up first, and then apply your revert. Often
there's subtle functional dependencies hiding.
-Daniel

>
> > Also please cite the dma_fence error propagation revert from Jason:
> >
> > commit 93a2711cddd5760e2f0f901817d71c93183c3b87
> > Author: Jason Ekstrand <jason@jlekstrand.net>
> > Date:   Wed Jul 14 14:34:16 2021 -0500
> >
> >     Revert "drm/i915: Propagate errors on awaiting already signaled fences"
> >
> > Maybe in full, if you need the justification.
> >
>
> Will site.
>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Cc: <stable@vger.kernel.org>
> >
> > Unless "this bug" is some real world impact thing I wouldn't put cc:
> > stable on this.
>
> Got it.
>
> Matt
>
> > -Daniel
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ----
> > >  1 file changed, 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > index de5f9c86b9a4..cafb0608ffb4 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
> > >                     if (p->flags & I915_DEPENDENCY_WEAK)
> > >                             continue;
> > >
> > > -                   /* Propagate any change in error status */
> > > -                   if (rq->fence.error)
> > > -                           i915_request_set_error_once(w, rq->fence.error);
> > > -
> > >                     if (w->engine != rq->engine)
> > >                             continue;
> > >
> > > --
> > > 2.32.0
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de5f9c86b9a4..cafb0608ffb4 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2140,10 +2140,6 @@  static void __execlists_unhold(struct i915_request *rq)
 			if (p->flags & I915_DEPENDENCY_WEAK)
 				continue;
 
-			/* Propagate any change in error status */
-			if (rq->fence.error)
-				i915_request_set_error_once(w, rq->fence.error);
-
 			if (w->engine != rq->engine)
 				continue;