drm/i915: Use uninterruptible mutex_lock for userptr bo creation

Message ID 1431686541-21292-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 15, 2015, 10:42 a.m. UTC
Mika encountered one pathological scenario under X where acquiring all
the mm locks (required to insert a mmu notifier) was very slow, so slow
that by the time we tried to lock the struct_mutex with the usual call
to i915_mutex_lock_interruptible(), X's signal timer had fired causing
us to restart the ioctl (and so looped indefinitely).

While I suspect this is the result of another bug (something leaking mm
perhaps?) we can forgo the error checking and interruptible nature of the
lock here so we only have to pay the expense once and get on with it.
This does expose the userptr creation routine to a driver livelock
though by not being interruptible.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
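
The restart loop described in the commit message can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions, not driver code: it assumes the interruptible lock fails with -EINTR whenever a signal is pending, and that a signal is pending whenever the setup takes at least one timer period. The names struct model, attempt() and attempts_needed() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one userptr creation call; times are in ms. */
struct model {
	int setup_ms;	/* cost of the expensive setup (taking all the mm locks) */
	int timer_ms;	/* period of the caller's signal timer (e.g. X's) */
};

/* One attempt at the ioctl. Assumption: a signal is pending by the time
 * we reach the lock if the setup took at least one timer period, and an
 * interruptible lock then gives up with -EINTR.
 */
static int attempt(const struct model *m, bool interruptible)
{
	bool signal_pending = m->setup_ms >= m->timer_ms;

	if (interruptible && signal_pending)
		return -EINTR;	/* all the setup work is thrown away */

	return 0;		/* lock taken, creation completes */
}

/* Restart the call on -EINTR as the caller would, but give up after
 * max_tries so the model terminates. Returns the number of attempts
 * needed, or -1 if the call never completed.
 */
static int attempts_needed(const struct model *m, bool interruptible,
			   int max_tries)
{
	for (int i = 1; i <= max_tries; i++)
		if (attempt(m, interruptible) == 0)
			return i;

	return -1;
}
```

With setup_ms = 150 and timer_ms = 100 the interruptible variant never completes within any number of tries, while the uninterruptible variant completes on the first attempt and so pays for the setup only once.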

Comments

Tvrtko Ursulin May 15, 2015, 11:09 a.m. UTC | #1
Hi,

On 05/15/2015 11:42 AM, Chris Wilson wrote:
> Mika encountered one pathological scenario under X where acquiring all
> the mm locks (required to insert a mmu notifier) was very slow, so slow
> that by the time we tried to lock the struct_mutex with the usual call
> to i915_mutex_lock_interruptible(), X's signal timer had fired causing
> us to restart the ioctl (and so looped indefinitely).

Indefinite loop? Are you saying userptr creation endlessly fails to
finish in 10ms (or is it even 100ms, forgot what timer Xorg
sets up)? The __mmu_notifier_register call?

> While I suspect this is the result of another bug (something leaking mm
> perhaps?) we can forgo the error checking and interruptible nature of the
> lock here so we only have to pay the expense once and get on with it.
> This does expose the userptr creation routine to a driver livelock
> though by not being interruptible.

How is this acceptable then if it can live-lock? How does that happen?

Regards,

Tvrtko
Daniel Vetter May 18, 2015, 8:30 a.m. UTC | #2
On Fri, May 15, 2015 at 12:09:00PM +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 05/15/2015 11:42 AM, Chris Wilson wrote:
> >Mika encountered one pathological scenario under X where acquiring all
> >the mm locks (required to insert a mmu notifier) was very slow, so slow
> >that by the time we tried to lock the struct_mutex with the usual call
> >to i915_mutex_lock_interruptible(), X's signal timer had fired causing
> >us to restart the ioctl (and so looped indefinitely).
> 
> Indefinite loop? Are you saying userptr creation endlessly fails
> to finish in 10ms (or is it even 100ms, forgot what timer Xorg sets up)?
> The __mmu_notifier_register call?
> 
> >While I suspect this is the result of another bug (something leaking mm
> >perhaps?) we can forgo the error checking and interruptible nature of the
> >lock here so we only have to pay the expense once and get on with it.
> >This does expose the userptr creation routine to a driver livelock
> >though by not being interruptible.
> 
> How is this acceptable then if it can live-lock? How does that happen?

If the i915 driver somehow dies it's a lot nicer for the user to be able
to hit ^C and get out of trouble again and debug further than make
anything touching i915 be stuck forever.

But I think this is a justified exception. Queued for -next, thanks for
the patch.
-Daniel
Chris Wilson May 18, 2015, 1:18 p.m. UTC | #3
On Mon, May 18, 2015 at 10:30:06AM +0200, Daniel Vetter wrote:
> On Fri, May 15, 2015 at 12:09:00PM +0100, Tvrtko Ursulin wrote:
> > 
> > Hi,
> > 
> > On 05/15/2015 11:42 AM, Chris Wilson wrote:
> > >Mika encountered one pathological scenario under X where acquiring all
> > >the mm locks (required to insert a mmu notifier) was very slow, so slow
> > >that by the time we tried to lock the struct_mutex with the usual call
> > >to i915_mutex_lock_interruptible(), X's signal timer had fired causing
> > >us to restart the ioctl (and so looped indefinitely).
> > 
> > Indefinite loop? Are you saying userptr creation endlessly fails
> > to finish in 10ms (or is it even 100ms, forgot what timer Xorg sets up)?
> > The __mmu_notifier_register call?

Yes. In this scenario it is taking longer than 100ms to take all the mm locks.
I presume it is simply due to there being a vast number of mm on Mika's machine.

> > >While I suspect this is the result of another bug (something leaking mm
> > >perhaps?) we can forgo the error checking and interruptible nature of the
> > >lock here so we only have to pay the expense once and get on with it.
> > >This does expose the userptr creation routine to a driver livelock
> > >though by not being interruptible.
> > 
> > How is this acceptable then if it can live-lock? How does that happen?
> 
> If the i915 driver somehow dies it's a lot nicer for the user to be able
> to hit ^C and get out of trouble again and debug further than make
> anything touching i915 be stuck forever.
> 
> But I think this is a justified exception. Queued for -next, thanks for
> the patch.

Oops, gcc didn't warn about ret being used uninitialized. I guess I am
used to using it as a crutch.
-Chris
Lespiau, Damien May 18, 2015, 5:10 p.m. UTC | #4
On Mon, May 18, 2015 at 10:30:06AM +0200, Daniel Vetter wrote:
> On Fri, May 15, 2015 at 12:09:00PM +0100, Tvrtko Ursulin wrote:
> > 
> > Hi,
> > 
> > On 05/15/2015 11:42 AM, Chris Wilson wrote:
> > >Mika encountered one pathological scenario under X where acquiring all
> > >the mm locks (required to insert a mmu notifier) was very slow, so slow
> > >that by the time we tried to lock the struct_mutex with the usual call
> > >to i915_mutex_lock_interruptible(), X's signal timer had fired causing
> > >us to restart the ioctl (and so looped indefinitely).
> > 
> > Indefinite loop? Are you saying userptr creation endlessly fails
> > to finish in 10ms (or is it even 100ms, forgot what timer Xorg sets up)?
> > The __mmu_notifier_register call?
> > 
> > >While I suspect this is the result of another bug (something leaking mm
> > >perhaps?) we can forgo the error checking and interruptible nature of the
> > >lock here so we only have to pay the expense once and get on with it.
> > >This does expose the userptr creation routine to a driver livelock
> > >though by not being interruptible.
> > 
> > How is this acceptable then if it can live-lock? How does that happen?
> 
> If the i915 driver somehow dies it's a lot nicer for the user to be able
> to hit ^C and get out of trouble again and debug further than make
> anything touching i915 be stuck forever.
> 
> But I think this is a justified exception. Queued for -next, thanks for
> the patch.

This breaks my SKL. Reverting makes it work again.

I can still boot in init 3 to see what happens. gem_render_copy (for
instance) will then cause an infinite loop when initializing libdrm: we
do call GEM_USERPTR in the bufmgr init that cycles in drmIoctl, the
ioctl() always returning -EAGAIN.
Chris Wilson May 18, 2015, 8:21 p.m. UTC | #5
On Mon, May 18, 2015 at 06:10:07PM +0100, Damien Lespiau wrote:
> On Mon, May 18, 2015 at 10:30:06AM +0200, Daniel Vetter wrote:
> > On Fri, May 15, 2015 at 12:09:00PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > Hi,
> > > 
> > > On 05/15/2015 11:42 AM, Chris Wilson wrote:
> > > >Mika encountered one pathological scenario under X where acquiring all
> > > >the mm locks (required to insert a mmu notifier) was very slow, so slow
> > > >that by the time we tried to lock the struct_mutex with the usual call
> > > >to i915_mutex_lock_interruptible(), X's signal timer had fired causing
> > > >us to restart the ioctl (and so looped indefinitely).
> > > 
> > > Indefinite loop? Are you saying userptr creation endlessly fails
> > > to finish in 10ms (or is it even 100ms, forgot what timer Xorg sets up)?
> > > The __mmu_notifier_register call?
> > > 
> > > >While I suspect this is the result of another bug (something leaking mm
> > > >perhaps?) we can forgo the error checking and interruptible nature of the
> > > >lock here so we only have to pay the expense once and get on with it.
> > > >This does expose the userptr creation routine to a driver livelock
> > > >though by not being interruptible.
> > > 
> > > How is this acceptable then if it can live-lock? How does that happen?
> > 
> > If the i915 driver somehow dies it's a lot nicer for the user to be able
> > to hit ^C and get out of trouble again and debug further than make
> > anything touching i915 be stuck forever.
> > 
> > But I think this is a justified exception. Queued for -next, thanks for
> > the patch.
> 
> This breaks my SKL. Reverting makes it work again.
> 
> I can still boot in init 3 to see what happens. gem_render_copy (for
> instance) will then cause an infinite loop when initializing libdrm: we
> do call GEM_USERPTR in the bufmgr init that cycles in drmIoctl, the
> ioctl() always returning -EAGAIN.

Too late, Daniel's already pushed the fixed version (the patch was missing
int ret = 0) and rewrote history to hide the embarrassment. Note also that
recent libdrm doesn't call has_userptr in its init (that also applies to
Mika! ;-).
-Chris
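
For readers following the thread, the infinite loop reported in #4 is consistent with libdrm's drmIoctl() wrapper, which retries the ioctl for as long as it fails with EINTR or EAGAIN. With ret left uninitialized in the posted patch, the kernel function apparently returned a stale value that reached userspace as EAGAIN on every call. Below is a minimal sketch of that retry pattern, assuming a stand-in callback instead of a real ioctl() and a retry cap so that it terminates; retry_ioctl(), always_eagain() and succeeds() are hypothetical names.

```c
#include <errno.h>

/* Stand-in for ioctl(fd, request, arg): returns 0 on success, or -1
 * with errno set on failure.
 */
typedef int (*ioctl_fn)(void *arg);

/* Retry in the style of drmIoctl(): call again while the failure is
 * EINTR or EAGAIN. The max_tries cap is an addition for this sketch;
 * without it a callee that always fails with EAGAIN is retried forever.
 */
static int retry_ioctl(ioctl_fn fn, void *arg, int max_tries, int *tries)
{
	int ret;

	*tries = 0;
	do {
		ret = fn(arg);
		(*tries)++;
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN) &&
		 *tries < max_tries);

	return ret;
}

/* Models the broken kernel path: every call fails with EAGAIN. */
static int always_eagain(void *arg)
{
	(void)arg;
	errno = EAGAIN;
	return -1;
}

/* Models the fixed path (ret initialized to 0): the call succeeds. */
static int succeeds(void *arg)
{
	(void)arg;
	return 0;
}
```

Calling retry_ioctl(always_eagain, NULL, 5, &tries) uses up all five tries and still fails, whereas succeeds() returns on the first call.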
Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index bc4d30deb6c3..00f068cf7303 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -221,9 +221,12 @@  i915_mmu_notifier_add(struct drm_device *dev,
 	struct interval_tree_node *it;
 	int ret;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
+	/* By this point we have already done a lot of expensive setup that
+	 * we do not want to repeat just because the caller (e.g. X) has a
+	 * signal pending (and partly because of that expensive setup, X
+	 * using an interrupt timer is likely to get stuck in an EINTR loop).
+	 */
+	mutex_lock(&dev->struct_mutex);
 
 	/* Make sure we drop the final active reference (and thereby
 	 * remove the objects from the interval tree) before we do