Message ID | 1379001449-17380-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Sep 12, 2013 at 05:57:28PM +0200, Daniel Vetter wrote: > This is just a remnant from the old days when our reset handling was > horribly racy, suffered from terribly locking issues and often happily > live-locked. Those days are now gone so we can drop the hacks and just > rip the reschedule-point out. > > Reported-by: Peter Zijlstra <peterz@infradead.org> Thanks! Acked-by: Peter Zijlstra <peterz@infradead.org>
On Thu, Sep 12, 2013 at 05:59:51PM +0200, Peter Zijlstra wrote: > On Thu, Sep 12, 2013 at 05:57:28PM +0200, Daniel Vetter wrote: > > This is just a remnant from the old days when our reset handling was > > horribly racy, suffered from terribly locking issues and often happily > > live-locked. Those days are now gone so we can drop the hacks and just > > rip the reschedule-point out. > > > > Reported-by: Peter Zijlstra <peterz@infradead.org> > > Thanks! > > Acked-by: Peter Zijlstra <peterz@infradead.org> Merged to drm-intel-fixes with Chris' irc-review that we indeed don't need this any more to duct-tape over bugs. -Daniel
hmm, looks like I cargo-cult'd the same into msm. I guess in i915 (and ttm) case, the issue arises due to need for CPU access to buffer via GTT? In which case I should be safe to drop the set_need_resched() as well? (Since CPU always has direct access to the pages.) Or am I missing something about the original issue that necessitated set_need_resched()? BR, -R On Thu, Sep 12, 2013 at 11:57 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > This is just a remnant from the old days when our reset handling was > horribly racy, suffered from terribly locking issues and often happily > live-locked. Those days are now gone so we can drop the hacks and just > rip the reschedule-point out. > > Reported-by: Peter Zijlstra <peterz@infradead.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > drivers/gpu/drm/i915/i915_gem.c | 11 ++++------- > 1 file changed, 4 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index d80f33d..3871060 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -1389,14 +1389,11 @@ out: > if (i915_terminally_wedged(&dev_priv->gpu_error)) > return VM_FAULT_SIGBUS; > case -EAGAIN: > - /* Give the error handler a chance to run and move the > - * objects off the GPU active list. Next time we service the > - * fault, we should be able to transition the page into the > - * GTT without touching the GPU (and so avoid further > - * EIO/EGAIN). If the GPU is wedged, then there is no issue > - * with coherency, just lost writes. > + /* > + * EAGAIN means the gpu is hung and we'll wait for the error > + * handler to reset everything when re-faulting in > + * i915_mutex_lock_interruptible. > */ > - set_need_resched(); > case 0: > case -ERESTARTSYS: > case -EINTR: > -- > 1.8.4.rc3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Sep 13, 2013 at 2:59 AM, Rob Clark <robdclark@gmail.com> wrote: > I guess in i915 (and ttm) case, the issue arises due to need for CPU > access to buffer via GTT? In which case I should be safe to drop the > set_need_resched() as well? (Since CPU always has direct access to the > pages.) Or am I missing something about the original issue that > necessitated set_need_resched()? For drm/i915 the _only_ reason we've had it was to avoid life-locking with our gpu reset work when the gpu hung. We've fixed that properly now by using a wait-queue to stall when a gpu reset is pending and proper locking in the gpu reset handler (plus tons of evil tests to make sure it doesn't break, there's rather fragile lock-dropping and tricky ordering involved). So if you don't have i915's broken gpu reset handling from yonder you don't need our cargo-cult. ttm's usage with a trylock+yield is a different form of duct-tape to paper over locking inversions between copy_*_user callsites and the pagefault handler. In any case there's no way it actually works properly ;-) Cheers, Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d80f33d..3871060 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1389,14 +1389,11 @@ out: if (i915_terminally_wedged(&dev_priv->gpu_error)) return VM_FAULT_SIGBUS; case -EAGAIN: - /* Give the error handler a chance to run and move the - * objects off the GPU active list. Next time we service the - * fault, we should be able to transition the page into the - * GTT without touching the GPU (and so avoid further - * EIO/EGAIN). If the GPU is wedged, then there is no issue - * with coherency, just lost writes. + /* + * EAGAIN means the gpu is hung and we'll wait for the error + * handler to reset everything when re-faulting in + * i915_mutex_lock_interruptible. */ - set_need_resched(); case 0: case -ERESTARTSYS: case -EINTR:
This is just a remnant from the old days when our reset handling was horribly racy, suffered from terribly locking issues and often happily live-locked. Those days are now gone so we can drop the hacks and just rip the reschedule-point out. Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> --- drivers/gpu/drm/i915/i915_gem.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)