From patchwork Tue Sep 4 21:57:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10588047 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24BA2920 for ; Tue, 4 Sep 2018 21:58:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1834729CBA for ; Tue, 4 Sep 2018 21:58:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0C9BA29D78; Tue, 4 Sep 2018 21:58:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A0BD429D75 for ; Tue, 4 Sep 2018 21:58:52 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6EF366E289; Tue, 4 Sep 2018 21:58:50 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id E74656E282 for ; Tue, 4 Sep 2018 21:58:47 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 13555791-1500050 for multiple; Tue, 04 Sep 2018 22:57:54 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 4 Sep 2018 22:57:42 +0100 Message-Id: <20180904215742.20276-23-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.19.0.rc1 In-Reply-To: <20180904215742.20276-1-chris@chris-wilson.co.uk> References: <20180904215742.20276-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 23/23] drm/i915: Serialise concurrent calls to i915_gem_set_wedged() X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Make i915_gem_set_wedged() and i915_gem_unset_wedged() behaviour more consistently if called concurrently. Signed-off-by: Chris Wilson Cc: Mika Kuoppala --- drivers/gpu/drm/i915/i915_gem.c | 32 ++++++++++++++----- drivers/gpu/drm/i915/i915_gpu_error.h | 4 ++- .../gpu/drm/i915/selftests/mock_gem_device.c | 1 + 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 05a4cd15a81e..e1e8fae123e8 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3323,10 +3323,15 @@ static void nop_complete_submit_request(struct i915_request *request) void i915_gem_set_wedged(struct drm_i915_private *i915) { + struct i915_gpu_error *error = &i915->gpu_error; struct intel_engine_cs *engine; enum intel_engine_id id; - GEM_TRACE("start\n"); + mutex_lock(&error->wedge_mutex); + if (test_bit(I915_WEDGED, &error->flags)) { + mutex_unlock(&error->wedge_mutex); + return; + } if (GEM_SHOW_DEBUG()) { struct drm_printer p = drm_debug_printer(__func__); @@ -3335,8 +3340,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915) intel_engine_dump(engine, &p, "%s\n", engine->name); } - if (test_and_set_bit(I915_WEDGED, &i915->gpu_error.flags)) - goto out; + GEM_TRACE("start\n"); /* * First, stop submission to hw, but do not yet complete requests by @@ -3396,20 +3400,28 @@ void i915_gem_set_wedged(struct drm_i915_private *i915) i915_gem_reset_finish_engine(engine); } -out: + smp_mb__before_atomic(); + set_bit(I915_WEDGED, &error->flags); + GEM_TRACE("end\n"); + mutex_unlock(&error->wedge_mutex); - wake_up_all(&i915->gpu_error.reset_queue); + wake_up_all(&error->reset_queue); } bool i915_gem_unset_wedged(struct drm_i915_private *i915) { + struct i915_gpu_error *error = &i915->gpu_error; struct i915_timeline *tl; + bool ret = false; lockdep_assert_held(&i915->drm.struct_mutex); - if (!test_bit(I915_WEDGED, &i915->gpu_error.flags)) + + if (!test_bit(I915_WEDGED, &error->flags)) return true; + mutex_lock(&error->wedge_mutex); + GEM_TRACE("start\n"); /* @@ -3443,7 +3455,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) */ if (dma_fence_default_wait(&rq->fence, true, MAX_SCHEDULE_TIMEOUT) < 0) - return false; + goto unlock; } i915_retire_requests(i915); GEM_BUG_ON(i915->gt.active_requests); @@ -3464,8 +3476,11 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) smp_mb__before_atomic(); /* complete takeover before enabling execbuf */ clear_bit(I915_WEDGED, &i915->gpu_error.flags); + ret = true; +unlock: + mutex_unlock(&i915->gpu_error.wedge_mutex); - return true; + return ret; } static void @@ -5792,6 +5807,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv) i915_gem_idle_work_handler); init_waitqueue_head(&dev_priv->gpu_error.wait_queue); init_waitqueue_head(&dev_priv->gpu_error.reset_queue); + mutex_init(&dev_priv->gpu_error.wedge_mutex); atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0); diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h index 1c1bc0c23468..4f5c5be56002 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.h +++ b/drivers/gpu/drm/i915/i915_gpu_error.h @@ -269,8 +269,8 @@ struct i915_gpu_error { #define I915_RESET_BACKOFF 0 #define I915_RESET_HANDOFF 1 #define I915_RESET_MODESET 2 +#define I915_RESET_ENGINE 3 #define I915_WEDGED (BITS_PER_LONG - 1) -#define I915_RESET_ENGINE (I915_WEDGED - I915_NUM_ENGINES) /** Number of times an engine has been reset */ u32 reset_engine_count[I915_NUM_ENGINES]; @@ -281,6 +281,8 @@ struct i915_gpu_error { /** Reason for the current *global* reset */ const char *reason; + struct mutex wedge_mutex; /* serialises wedging/unwedging */ + /** * Waitqueue to signal when a hang is detected. Used to for waiters * to release the struct_mutex for the reset to procede. diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index 0eb283e7fc96..a4cd459c320d 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -188,6 +188,7 @@ struct drm_i915_private *mock_gem_device(void) init_waitqueue_head(&i915->gpu_error.wait_queue); init_waitqueue_head(&i915->gpu_error.reset_queue); + mutex_init(&i915->gpu_error.wedge_mutex); i915->wq = alloc_ordered_workqueue("mock", 0); if (!i915->wq)