Message ID | 20190212102354.22284-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Recursive i915_reset_trylock() verboten | expand |
Chris Wilson <chris@chris-wilson.co.uk> writes: > We cannot nest i915_reset_trylock() as the inner may wait for the > I915_RESET_BACKOFF which in turn is waiting upon sync_srcu who is > waiting for our outermost lock. As we take the reset srcu around the > fence update, we have to defer taking it in i915_gem_fault() until after > we acquire the pin on the fence to avoid nesting. This is a little ugly, > but still works. If a reset occurs between i915_vma_pin_fence() and the > second reset lock, the reset will restore the fence register back to the > pinned value before the reset lock allows us to proceed (our mmap won't > be revoked as we haven't yet marked it as being a userfault as that > requires us to hold the reset lock), so the pagefault is still > serialised with the revocation in reset. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605 > Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index c8c355bec091..ae1467a74a08 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -1923,16 +1923,16 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) > if (ret) > goto err_unpin; > > + ret = i915_vma_pin_fence(vma); > + if (ret) > + goto err_unpin; > + As this is obviusness slipped past us, would it be worthwhile, in retrospect, to build a debug in i915_reset_trylock to be vocal about it failing to make progress? Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > srcu = i915_reset_trylock(dev_priv); > if (srcu < 0) { > ret = srcu; > - goto err_unpin; > + goto err_fence; > } > > - ret = i915_vma_pin_fence(vma); > - if (ret) > - goto err_reset; > - > /* Finally, remap it using the new GTT offset */ > ret = remap_io_mapping(area, > area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT), > @@ -1940,7 +1940,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) > min_t(u64, vma->size, area->vm_end - area->vm_start), > &ggtt->iomap); > if (ret) > - goto err_fence; > + goto err_reset; > > /* Mark as being mmapped into userspace for later revocation */ > assert_rpm_wakelock_held(dev_priv); > @@ -1950,10 +1950,10 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) > > i915_vma_set_ggtt_write(vma); > > -err_fence: > - i915_vma_unpin_fence(vma); > err_reset: > i915_reset_unlock(dev_priv, srcu); > +err_fence: > + i915_vma_unpin_fence(vma); > err_unpin: > __i915_vma_unpin(vma); > err_unlock: > -- > 2.20.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Mika Kuoppala (2019-02-12 11:12:05) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > We cannot nest i915_reset_trylock() as the inner may wait for the > > I915_RESET_BACKOFF which in turn is waiting upon sync_srcu who is > > waiting for our outermost lock. As we take the reset srcu around the > > fence update, we have to defer taking it in i915_gem_fault() until after > > we acquire the pin on the fence to avoid nesting. This is a little ugly, > > but still works. If a reset occurs between i915_vma_pin_fence() and the > > second reset lock, the reset will restore the fence register back to the > > pinned value before the reset lock allows us to proceed (our mmap won't > > be revoked as we haven't yet marked it as being a userfault as that > > requires us to hold the reset lock), so the pagefault is still > > serialised with the revocation in reset. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605 > > Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset") > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++-------- > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index c8c355bec091..ae1467a74a08 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -1923,16 +1923,16 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) > > if (ret) > > goto err_unpin; > > > > + ret = i915_vma_pin_fence(vma); > > + if (ret) > > + goto err_unpin; > > + > > As this is obviusness slipped past us, would it > be worthwhile, in retrospect, to build a debug in > i915_reset_trylock to be vocal about it failing > to make progress? If we stick a timeout in there, we just send that back to userspace. Deadlock resolved just with a sporadic delay. It is interruptible so it's not a complete loss, and more obvious if it stalls? That's my thinking for not sending along the quick conversion to wait_event_interruptible_timeout(). What I think we can do is stick a might_lock() so we get the lockdep splat before the wait? -Chris
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c8c355bec091..ae1467a74a08 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1923,16 +1923,16 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) if (ret) goto err_unpin; + ret = i915_vma_pin_fence(vma); + if (ret) + goto err_unpin; + srcu = i915_reset_trylock(dev_priv); if (srcu < 0) { ret = srcu; - goto err_unpin; + goto err_fence; } - ret = i915_vma_pin_fence(vma); - if (ret) - goto err_reset; - /* Finally, remap it using the new GTT offset */ ret = remap_io_mapping(area, area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT), @@ -1940,7 +1940,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) min_t(u64, vma->size, area->vm_end - area->vm_start), &ggtt->iomap); if (ret) - goto err_fence; + goto err_reset; /* Mark as being mmapped into userspace for later revocation */ assert_rpm_wakelock_held(dev_priv); @@ -1950,10 +1950,10 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) i915_vma_set_ggtt_write(vma); -err_fence: - i915_vma_unpin_fence(vma); err_reset: i915_reset_unlock(dev_priv, srcu); +err_fence: + i915_vma_unpin_fence(vma); err_unpin: __i915_vma_unpin(vma); err_unlock:
We cannot nest i915_reset_trylock() as the inner may wait for the I915_RESET_BACKOFF which in turn is waiting upon sync_srcu who is waiting for our outermost lock. As we take the reset srcu around the fence update, we have to defer taking it in i915_gem_fault() until after we acquire the pin on the fence to avoid nesting. This is a little ugly, but still works. If a reset occurs between i915_vma_pin_fence() and the second reset lock, the reset will restore the fence register back to the pinned value before the reset lock allows us to proceed (our mmap won't be revoked as we haven't yet marked it as being a userfault as that requires us to hold the reset lock), so the pagefault is still serialised with the revocation in reset. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605 Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)