From patchwork Thu Jan 30 17:04:43 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mika Kuoppala X-Patchwork-Id: 3558231 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 76649C02DC for ; Thu, 30 Jan 2014 17:06:21 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 4CBB820114 for ; Thu, 30 Jan 2014 17:06:20 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 0FA60201BC for ; Thu, 30 Jan 2014 17:06:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0B1A211F6BF for ; Thu, 30 Jan 2014 09:06:18 -0800 (PST) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTP id 88A0611F6B0 for ; Thu, 30 Jan 2014 09:05:30 -0800 (PST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP; 30 Jan 2014 09:05:29 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.95,751,1384329600"; d="scan'208";a="473375213" Received: from rosetta.fi.intel.com (HELO rosetta) ([10.237.72.60]) by fmsmga002.fm.intel.com with ESMTP; 30 Jan 2014 09:04:48 -0800 Received: by rosetta (Postfix, from userid 1000) id C7B108009C; Thu, 30 Jan 2014 19:04:47 +0200 (EET) From: Mika Kuoppala To: intel-gfx@lists.freedesktop.org Date: Thu, 30 Jan 2014 19:04:43 +0200 Message-Id: <1391101484-5478-1-git-send-email-mika.kuoppala@intel.com> X-Mailer: git-send-email 1.7.9.5 Subject: [Intel-gfx] [PATCH 1/2] drm/i915: Use hangcheck score to find guilty context X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With full ppgtt using acthd is not enough to find guilty batch buffer. We get multiple false positives as acthd is per vm. Instead of scanning which vm was running on a ring, to find corressponding context, use a different, simpler, strategy of finding batches that caused gpu hang: If hangcheck has declared ring to be hung, find first non complete request on that ring and claim it was guilty. v2: Rebase Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652 Suggested-by: Chris Wilson Signed-off-by: Mika Kuoppala Reviewed-by: Ben Widawsky (v1) --- drivers/gpu/drm/i915/i915_gem.c | 42 +++++++++++++++++++++---------- drivers/gpu/drm/i915/i915_irq.c | 3 +-- drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++ 3 files changed, 32 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 08331e1..37c2ea4 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2332,9 +2332,10 @@ static bool i915_context_is_banned(struct drm_device *dev, static void i915_set_reset_status(struct intel_ring_buffer *ring, struct drm_i915_gem_request *request, - u32 acthd) + const bool guilty) { - bool inside, guilty; + const u32 acthd = intel_ring_get_active_head(ring); + bool inside; unsigned long offset = 0; struct i915_hw_context *ctx = request->ctx; struct i915_ctx_hang_stats *hs; @@ -2342,14 +2343,11 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring, if (WARN_ON(!ctx)) return; - /* Innocent until proven guilty */ - guilty = false; - if (request->batch_obj) offset = i915_gem_obj_offset(request->batch_obj, request_to_vm(request)); - if (ring->hangcheck.action != HANGCHECK_WAIT && + if (guilty && i915_request_guilty(request, acthd, &inside)) { DRM_DEBUG("%s hung %s bo (0x%lx ctx %d) at 0x%x\n", ring->name, @@ -2357,8 +2355,6 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring, offset, ctx->id, acthd); - - guilty = true; } WARN_ON(!ctx->last_ring); @@ -2385,19 +2381,39 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request) kfree(request); } -static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv, - struct intel_ring_buffer *ring) +static struct drm_i915_gem_request * +i915_gem_find_first_non_complete(struct intel_ring_buffer *ring) { - u32 completed_seqno = ring->get_seqno(ring, false); - u32 acthd = intel_ring_get_active_head(ring); struct drm_i915_gem_request *request; + const u32 completed_seqno = ring->get_seqno(ring, false); list_for_each_entry(request, &ring->request_list, list) { if (i915_seqno_passed(completed_seqno, request->seqno)) continue; - i915_set_reset_status(ring, request, acthd); + return request; } + + return NULL; +} + +static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv, + struct intel_ring_buffer *ring) +{ + struct drm_i915_gem_request *request; + bool ring_hung; + + request = i915_gem_find_first_non_complete(ring); + + if (request == NULL) + return; + + ring_hung = ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG; + + i915_set_reset_status(ring, request, ring_hung); + + list_for_each_entry_continue(request, &ring->request_list, list) + i915_set_reset_status(ring, request, false); } static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index b226ae6..ec9eec4 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2532,7 +2532,6 @@ static void i915_hangcheck_elapsed(unsigned long data) #define BUSY 1 #define KICK 5 #define HUNG 20 -#define FIRE 30 if (!i915.enable_hangcheck) return; @@ -2616,7 +2615,7 @@ static void i915_hangcheck_elapsed(unsigned long data) } for_each_ring(ring, dev_priv, i) { - if (ring->hangcheck.score > FIRE) { + if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) { DRM_INFO("%s on %s\n", stuck[i] ? "stuck" : "no progress", ring->name); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 71a73f4..38c757e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -41,6 +41,8 @@ enum intel_ring_hangcheck_action { HANGCHECK_HUNG, }; +#define HANGCHECK_SCORE_RING_HUNG 31 + struct intel_ring_hangcheck { bool deadlock; u32 seqno;