diff mbox

[1/2] drm/i915: kill set_need_resched

Message ID 1379001449-17380-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Vetter Sept. 12, 2013, 3:57 p.m. UTC
This is just a remnant from the old days when our reset handling was
horribly racy, suffered from terribly locking issues and often happily
live-locked. Those days are now gone so we can drop the hacks and just
rip the reschedule-point out.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Comments

Peter Zijlstra Sept. 12, 2013, 3:59 p.m. UTC | #1
On Thu, Sep 12, 2013 at 05:57:28PM +0200, Daniel Vetter wrote:
> This is just a remnant from the old days when our reset handling was
> horribly racy, suffered from terribly locking issues and often happily
> live-locked. Those days are now gone so we can drop the hacks and just
> rip the reschedule-point out.
> 
> Reported-by: Peter Zijlstra <peterz@infradead.org>

Thanks!

Acked-by: Peter Zijlstra <peterz@infradead.org>
Daniel Vetter Sept. 12, 2013, 8:41 p.m. UTC | #2
On Thu, Sep 12, 2013 at 05:59:51PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 12, 2013 at 05:57:28PM +0200, Daniel Vetter wrote:
> > This is just a remnant from the old days when our reset handling was
> > horribly racy, suffered from terribly locking issues and often happily
> > live-locked. Those days are now gone so we can drop the hacks and just
> > rip the reschedule-point out.
> > 
> > Reported-by: Peter Zijlstra <peterz@infradead.org>
> 
> Thanks!
> 
> Acked-by: Peter Zijlstra <peterz@infradead.org>

Merged to drm-intel-fixes with Chris' irc-review that we indeed don't need
this any more to duct-tape over bugs.
-Daniel
Rob Clark Sept. 13, 2013, 12:59 a.m. UTC | #3
hmm, looks like I cargo-cult'd the same into msm.

I guess in i915 (and ttm) case, the issue arises due to need for CPU
access to buffer via GTT?  In which case I should be safe to drop the
set_need_resched() as well? (Since CPU always has direct access to the
pages.)  Or am I missing something about the original issue that
necessitated set_need_resched()?

BR,
-R


On Thu, Sep 12, 2013 at 11:57 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This is just a remnant from the old days when our reset handling was
> horribly racy, suffered from terribly locking issues and often happily
> live-locked. Those days are now gone so we can drop the hacks and just
> rip the reschedule-point out.
>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d80f33d..3871060 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1389,14 +1389,11 @@ out:
>                 if (i915_terminally_wedged(&dev_priv->gpu_error))
>                         return VM_FAULT_SIGBUS;
>         case -EAGAIN:
> -               /* Give the error handler a chance to run and move the
> -                * objects off the GPU active list. Next time we service the
> -                * fault, we should be able to transition the page into the
> -                * GTT without touching the GPU (and so avoid further
> -                * EIO/EGAIN). If the GPU is wedged, then there is no issue
> -                * with coherency, just lost writes.
> +               /*
> +                * EAGAIN means the gpu is hung and we'll wait for the error
> +                * handler to reset everything when re-faulting in
> +                * i915_mutex_lock_interruptible.
>                  */
> -               set_need_resched();
>         case 0:
>         case -ERESTARTSYS:
>         case -EINTR:
> --
> 1.8.4.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Sept. 13, 2013, 8:46 a.m. UTC | #4
On Fri, Sep 13, 2013 at 2:59 AM, Rob Clark <robdclark@gmail.com> wrote:
> I guess in i915 (and ttm) case, the issue arises due to need for CPU
> access to buffer via GTT?  In which case I should be safe to drop the
> set_need_resched() as well? (Since CPU always has direct access to the
> pages.)  Or am I missing something about the original issue that
> necessitated set_need_resched()?

For drm/i915 the _only_ reason we've had  it was to avoid life-locking
with our gpu reset work when the gpu hung. We've fixed that properly
now by using a wait-queue to stall when a gpu reset is pending and
proper locking in the gpu reset handler (plus tons of evil tests to
make sure it doesn't break, there's rather fragile lock-dropping and
tricky ordering involved). So if you don't have i915's broken gpu
reset handling from yonder you don't need our cargo-cult.

ttm's usage with a trylock+yield is a different form of duct-tape to
paper over locking inversions between copy_*_user callsites and the
pagefault handler.

In any case there's no way it actually works properly ;-)

Cheers, Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d80f33d..3871060 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1389,14 +1389,11 @@  out:
 		if (i915_terminally_wedged(&dev_priv->gpu_error))
 			return VM_FAULT_SIGBUS;
 	case -EAGAIN:
-		/* Give the error handler a chance to run and move the
-		 * objects off the GPU active list. Next time we service the
-		 * fault, we should be able to transition the page into the
-		 * GTT without touching the GPU (and so avoid further
-		 * EIO/EGAIN). If the GPU is wedged, then there is no issue
-		 * with coherency, just lost writes.
+		/*
+		 * EAGAIN means the gpu is hung and we'll wait for the error
+		 * handler to reset everything when re-faulting in
+		 * i915_mutex_lock_interruptible.
 		 */
-		set_need_resched();
 	case 0:
 	case -ERESTARTSYS:
 	case -EINTR: