From patchwork Wed Jul 4 20:54:13 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 1157331 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork1.kernel.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by patchwork1.kernel.org (Postfix) with ESMTP id 728593FC36 for ; Wed, 4 Jul 2012 20:54:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 66C969E765 for ; Wed, 4 Jul 2012 13:54:35 -0700 (PDT) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-we0-f177.google.com (mail-we0-f177.google.com [74.125.82.177]) by gabe.freedesktop.org (Postfix) with ESMTP id 4CD989E765 for ; Wed, 4 Jul 2012 13:54:22 -0700 (PDT) Received: by werb13 with SMTP id b13so2882455wer.36 for ; Wed, 04 Jul 2012 13:54:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references; bh=2X5A23dz+dBHYPQk3TM/z1RY5YHysLZK98RaQj3quQ0=; b=JhJr9L9EA1eBb6OYQiI/c250oFTcnuE94Nk2Wb8sk3DGbrDgNSMb8nyhVSBRl4MXW7 v2R67vXN3HreuJrFidNIZnb1PXCSY4pqIEEvGqVz0iCIeEfYnWqqOyRg6yWcl1xVDQTs 5W65kC+SxLsWoa3vE40PX65EHrc5+YzmKdrRQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references :x-gm-message-state; bh=2X5A23dz+dBHYPQk3TM/z1RY5YHysLZK98RaQj3quQ0=; b=R8zePCUBLZNAu4sV6N3lTPUWfghAFYqm1Iwh5MqpIvyALWUlJprNJxxezMzlHQMPCd Jv1SoZHstmkK6FW6kJudND18j56BA4IiEvyWn2euwG2azN0juAUOOKYPTpo5B5khkJZT t/lRPZM+/I7HAw9XTWWKnwSyd6u003fgdTYtyeXT3L8N/quyGQIN6RSqPZarDMikwyxE TQfGDae8hyjaWtFwJMtO1IK3v8W4cXy12TavWyodkUBwf8jAeSEyLrk1c+HmWkA3D8bZ GlW3TKm9htSoFfyg0iyropNYnRwU8+ooslNq2AIRieOAve+kD3Sit5D9AfenmERmLVjQ WSuQ== Received: by 10.216.228.202 with SMTP id f52mr7126033weq.223.1341435261225; Wed, 04 Jul 2012 13:54:21 -0700 (PDT) Received: from aaron.ffwll.local (178-83-130-250.dynamic.hispeed.ch. [178.83.130.250]) by mx.google.com with ESMTPS id fu8sm38868457wib.5.2012.07.04.13.54.19 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 04 Jul 2012 13:54:20 -0700 (PDT) From: Daniel Vetter To: Intel Graphics Development Date: Wed, 4 Jul 2012 22:54:13 +0200 Message-Id: <1341435253-30529-1-git-send-email-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1341433123-23055-3-git-send-email-daniel.vetter@ffwll.ch> References: <1341433123-23055-3-git-send-email-daniel.vetter@ffwll.ch> X-Gm-Message-State: ALoCoQkDJdYtjH689caVZplwhVlukzD/eYoViF56DNnG77TK/rUiYM2FAVF4xqcxDjwF19j7pNI1 Cc: Daniel Vetter Subject: [Intel-gfx] [PATCH] drm/i915: non-interruptible sleeps can't handle -EAGAIN X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org So don't return -EAGAIN, even in the case of a gpu hang. Remap it to -EIO instead. Note that this isn't really an issue with interruptability, but more that we have quite a few codepaths (mostly around kms stuff) that simply can't handle any errors and hence not even -EAGAIN. Instead of adding proper failure paths so that we could restart these ioctls we've opted for the cheap way out of sleeping non-interruptibly. Which works everywhere but when the gpu dies, which this patch fixes. So essentially interruptible == false means 'wait for the gpu or die trying'.' This patch is a bit ugly because intel_ring_begin is all non-interruptible and hence only returns -EIO. But as the comment in there says, auditing all the callsites would be a pain. To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno and intel_wait_ring_buffer. Also use the opportunity to clarify the different cases in i915_gem_check_wedge a bit with comments. v2: Don't access dev_priv->mm.interruptible from check_wedge - we might not hold dev->struct_mutex, making this racy. Instead pass interruptible in as a parameter. I've noticed this because I've hit a BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been added in commit b4aca0106c466b5a0329318203f65bac2d91b682 Author: Ben Widawsky Date: Wed Apr 25 20:50:12 2012 -0700 drm/i915: extract some common olr+wedge code although that commit is missing any justification for this. I guess it's just copy&paste, because the same commit add the same BUG_ON check to check_olr, where it indeed makes sense. But in check_wedge everything we access is protected by other means, so this is superflous. And because it now gets in the way (we add a new caller in __wait_seqno, which can be called without dev->struct_mutext) let's just remove it. v3: Group all the i915_gem_check_wedge refactoring into this patch, so that this patch here is all about not returning -EAGAIN to callsites that can't handle syscall restarting. v4: Add clarification what interuptible == fales means in our code, requested by Ben Widawsky. v5: Fix EAGAIN mispell noticed by Chris Wilson. Reviewed-by: Ben Widawsky Signed-Off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_gem.c | 26 ++++++++++++++++++-------- drivers/gpu/drm/i915/intel_ringbuffer.c | 6 ++++-- 3 files changed, 24 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d839e4c..6c3a0bb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1336,6 +1336,8 @@ i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj) void i915_gem_retire_requests(struct drm_device *dev); void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring); +int __must_check i915_gem_check_wedge(struct drm_i915_private *dev_priv, + bool interruptible); void i915_gem_reset(struct drm_device *dev); void i915_gem_clflush_object(struct drm_i915_gem_object *obj); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 6a98c06..af6a510 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1863,11 +1863,10 @@ i915_gem_retire_work_handler(struct work_struct *work) mutex_unlock(&dev->struct_mutex); } -static int -i915_gem_check_wedge(struct drm_i915_private *dev_priv) +int +i915_gem_check_wedge(struct drm_i915_private *dev_priv, + bool interruptible) { - BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex)); - if (atomic_read(&dev_priv->mm.wedged)) { struct completion *x = &dev_priv->error_completion; bool recovery_complete; @@ -1878,7 +1877,16 @@ i915_gem_check_wedge(struct drm_i915_private *dev_priv) recovery_complete = x->done > 0; spin_unlock_irqrestore(&x->wait.lock, flags); - return recovery_complete ? -EIO : -EAGAIN; + /* Non-interruptible callers can't handle -EAGAIN, hence return + * -EIO unconditionally for these. */ + if (!interruptible) + return -EIO; + + /* Recovery complete, but still wedged means reset failure. */ + if (recovery_complete) + return -EIO; + + return -EAGAIN; } return 0; @@ -1932,6 +1940,7 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno, unsigned long timeout_jiffies; long end; bool wait_forever = true; + int ret; if (i915_seqno_passed(ring->get_seqno(ring), seqno)) return 0; @@ -1963,8 +1972,9 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno, end = wait_event_timeout(ring->irq_queue, EXIT_COND, timeout_jiffies); - if (atomic_read(&dev_priv->mm.wedged)) - end = -EAGAIN; + ret = i915_gem_check_wedge(dev_priv, interruptible); + if (ret) + end = ret; } while (end == 0 && wait_forever); getrawmonotonic(&now); @@ -2004,7 +2014,7 @@ i915_wait_seqno(struct intel_ring_buffer *ring, uint32_t seqno) BUG_ON(seqno == 0); - ret = i915_gem_check_wedge(dev_priv); + ret = i915_gem_check_wedge(dev_priv, dev_priv->mm.interruptible); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index dce4d1a..cd35ad4 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1228,8 +1228,10 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) } msleep(1); - if (atomic_read(&dev_priv->mm.wedged)) - return -EAGAIN; + + ret = i915_gem_check_wedge(dev_priv, dev_priv->mm.interruptible); + if (ret) + return ret; } while (!time_after(jiffies, end)); trace_i915_ring_wait_end(ring); return -EBUSY;