From patchwork Tue Oct 21 10:45:49 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dheeraj Jamwal X-Patchwork-Id: 5114491 Return-Path: X-Original-To: patchwork-ltsi-dev@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 3DBA9C11AC for ; Tue, 21 Oct 2014 11:09:34 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 44FF120123 for ; Tue, 21 Oct 2014 11:09:33 +0000 (UTC) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 38C1C2011E for ; Tue, 21 Oct 2014 11:09:32 +0000 (UTC) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 53657B85; Tue, 21 Oct 2014 11:00:14 +0000 (UTC) X-Original-To: ltsi-dev@lists.linuxfoundation.org Delivered-To: ltsi-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 5F72CB85 for ; Tue, 21 Oct 2014 11:00:13 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by smtp1.linuxfoundation.org (Postfix) with ESMTP id B2B061FA97 for ; Tue, 21 Oct 2014 11:00:12 +0000 (UTC) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP; 21 Oct 2014 04:00:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,761,1406617200"; d="scan'208";a="617792408" Received: from ubuntu-desktop.png.intel.com ([10.221.122.25]) by fmsmga002.fm.intel.com with ESMTP; 21 Oct 2014 04:00:08 -0700 From: Dheeraj Jamwal To: ltsi-dev@lists.linuxfoundation.org Date: Tue, 21 Oct 2014 18:45:49 +0800 Message-Id: <1413889294-31328-150-git-send-email-dheerajx.s.jamwal@intel.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1413889294-31328-1-git-send-email-dheerajx.s.jamwal@intel.com> References: <1413889294-31328-1-git-send-email-dheerajx.s.jamwal@intel.com> X-Spam-Status: No, score=-5.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org Subject: [LTSI-dev] [PATCH 0149/1094] drm/i915: Use hangcheck score to find guilty context X-BeenThere: ltsi-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: "A list to discuss patches, development, and other things related to the LTSI project" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ltsi-dev-bounces@lists.linuxfoundation.org Errors-To: ltsi-dev-bounces@lists.linuxfoundation.org X-Virus-Scanned: ClamAV using ClamSMTP From: Mika Kuoppala With full ppgtt using acthd is not enough to find guilty batch buffer. We get multiple false positives as acthd is per vm. Instead of scanning which vm was running on a ring, to find corressponding context, use a different, simpler, strategy of finding batches that caused gpu hang: If hangcheck has declared ring to be hung, find first non complete request on that ring and claim it was guilty. v2: Rebase Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652 Suggested-by: Chris Wilson Signed-off-by: Mika Kuoppala Reviewed-by: Ben Widawsky (v1) Signed-off-by: Daniel Vetter (cherry picked from commit b6b0fac04de9ae9b1559eddf8e9490f3c9a01885) Signed-off-by: Dheeraj Jamwal --- drivers/gpu/drm/i915/i915_gem.c | 42 +++++++++++++++++++++---------- drivers/gpu/drm/i915/i915_irq.c | 3 +-- drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++ 3 files changed, 32 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 08331e1..37c2ea4 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2332,9 +2332,10 @@ static bool i915_context_is_banned(struct drm_device *dev, static void i915_set_reset_status(struct intel_ring_buffer *ring, struct drm_i915_gem_request *request, - u32 acthd) + const bool guilty) { - bool inside, guilty; + const u32 acthd = intel_ring_get_active_head(ring); + bool inside; unsigned long offset = 0; struct i915_hw_context *ctx = request->ctx; struct i915_ctx_hang_stats *hs; @@ -2342,14 +2343,11 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring, if (WARN_ON(!ctx)) return; - /* Innocent until proven guilty */ - guilty = false; - if (request->batch_obj) offset = i915_gem_obj_offset(request->batch_obj, request_to_vm(request)); - if (ring->hangcheck.action != HANGCHECK_WAIT && + if (guilty && i915_request_guilty(request, acthd, &inside)) { DRM_DEBUG("%s hung %s bo (0x%lx ctx %d) at 0x%x\n", ring->name, @@ -2357,8 +2355,6 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring, offset, ctx->id, acthd); - - guilty = true; } WARN_ON(!ctx->last_ring); @@ -2385,19 +2381,39 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request) kfree(request); } -static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv, - struct intel_ring_buffer *ring) +static struct drm_i915_gem_request * +i915_gem_find_first_non_complete(struct intel_ring_buffer *ring) { - u32 completed_seqno = ring->get_seqno(ring, false); - u32 acthd = intel_ring_get_active_head(ring); struct drm_i915_gem_request *request; + const u32 completed_seqno = ring->get_seqno(ring, false); list_for_each_entry(request, &ring->request_list, list) { if (i915_seqno_passed(completed_seqno, request->seqno)) continue; - i915_set_reset_status(ring, request, acthd); + return request; } + + return NULL; +} + +static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv, + struct intel_ring_buffer *ring) +{ + struct drm_i915_gem_request *request; + bool ring_hung; + + request = i915_gem_find_first_non_complete(ring); + + if (request == NULL) + return; + + ring_hung = ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG; + + i915_set_reset_status(ring, request, ring_hung); + + list_for_each_entry_continue(request, &ring->request_list, list) + i915_set_reset_status(ring, request, false); } static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2ac5840..2a17963 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2544,7 +2544,6 @@ static void i915_hangcheck_elapsed(unsigned long data) #define BUSY 1 #define KICK 5 #define HUNG 20 -#define FIRE 30 if (!i915.enable_hangcheck) return; @@ -2628,7 +2627,7 @@ static void i915_hangcheck_elapsed(unsigned long data) } for_each_ring(ring, dev_priv, i) { - if (ring->hangcheck.score > FIRE) { + if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) { DRM_INFO("%s on %s\n", stuck[i] ? "stuck" : "no progress", ring->name); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 0b243ce..08b91c6 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -41,6 +41,8 @@ enum intel_ring_hangcheck_action { HANGCHECK_HUNG, }; +#define HANGCHECK_SCORE_RING_HUNG 31 + struct intel_ring_hangcheck { bool deadlock; u32 seqno;