From patchwork Wed Nov 16 15:20:31 2016
X-Patchwork-Submitter: Mika Kuoppala
X-Patchwork-Id: 9431811
From: Mika Kuoppala
To: intel-gfx@lists.freedesktop.org
Date: Wed, 16 Nov 2016 17:20:31 +0200
Message-Id: <1479309634-28574-3-git-send-email-mika.kuoppala@intel.com>
In-Reply-To: <1479309634-28574-1-git-send-email-mika.kuoppala@intel.com>
References: <1479309634-28574-1-git-send-email-mika.kuoppala@intel.com>
Subject: [Intel-gfx] [PATCH 3/6] drm/i915: Use request retirement as context progress

When the hangcheck score was removed, the active decay of the score
was removed with it. This took away hangcheck's ability to detect a
GPU client that was accidentally or maliciously causing intermittent
hangs.

Reinstate the scoring as a per-context property, so that a context
that starts to act unfavourably gets banned.

v2: ban_period_secs as a gate to score check (Chris)
v3: decay in proper spot.
    scores as tunables (Chris)

Cc: Chris Wilson
Signed-off-by: Mika Kuoppala
Reviewed-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_drv.h         |  5 ++++
 drivers/gpu/drm/i915/i915_gem.c         | 44 ++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_request.c |  4 +++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4562a39..9f24957 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -914,6 +914,11 @@ struct i915_ctx_hang_stats {
 
 	/* This context is banned to submit more work */
 	bool banned;
+
+#define CONTEXT_SCORE_GUILTY		10
+#define CONTEXT_SCORE_BAN_THRESHOLD	40
+	/* Accumulated score of hangs caused by this context */
+	int ban_score;
 };
 
 /* This must match up with the value previously used for execbuf2.rsvd1. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ae2a219..5948f09 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2620,33 +2620,45 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 
 static bool i915_context_is_banned(const struct i915_gem_context *ctx)
 {
+	const struct i915_ctx_hang_stats *hs = &ctx->hang_stats;
 	unsigned long elapsed;
 
-	if (ctx->hang_stats.banned)
+	if (hs->banned)
 		return true;
 
-	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
-	if (ctx->hang_stats.ban_period_seconds &&
-	    elapsed <= ctx->hang_stats.ban_period_seconds) {
+	if (!hs->ban_period_seconds)
+		return false;
+
+	elapsed = get_seconds() - hs->guilty_ts;
+	if (elapsed <= hs->ban_period_seconds) {
 		DRM_DEBUG("context hanging too fast, banning!\n");
 		return true;
 	}
 
+	if (hs->ban_score >= CONTEXT_SCORE_BAN_THRESHOLD) {
+		DRM_DEBUG("context hanging too often, banning!\n");
+		return true;
+	}
+
 	return false;
 }
 
-static void i915_set_reset_status(struct i915_gem_context *ctx,
-				  const bool guilty)
+static void i915_gem_context_mark_guilty(struct i915_gem_context *ctx)
 {
 	struct i915_ctx_hang_stats *hs = &ctx->hang_stats;
 
-	if (guilty) {
-		hs->banned = i915_context_is_banned(ctx);
-		hs->batch_active++;
-		hs->guilty_ts = get_seconds();
-	} else {
-		hs->batch_pending++;
-	}
+	hs->ban_score += CONTEXT_SCORE_GUILTY;
+
+	hs->banned = i915_context_is_banned(ctx);
+	hs->batch_active++;
+	hs->guilty_ts = get_seconds();
+}
+
+static void i915_gem_context_mark_innocent(struct i915_gem_context *ctx)
+{
+	struct i915_ctx_hang_stats *hs = &ctx->hang_stats;
+
+	hs->batch_pending++;
 }
 
 struct drm_i915_gem_request *
@@ -2714,7 +2726,11 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
 		ring_hung = false;
 	}
 
-	i915_set_reset_status(request->ctx, ring_hung);
+	if (ring_hung)
+		i915_gem_context_mark_guilty(request->ctx);
+	else
+		i915_gem_context_mark_innocent(request->ctx);
+
 	if (!ring_hung)
 		return;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b9b5253..b31d18e 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -255,6 +255,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 						       request->engine);
 	}
 
+	/* Retirement decays the ban score as it is a sign of ctx progress */
+	if (request->ctx->hang_stats.ban_score > 0)
+		request->ctx->hang_stats.ban_score--;
+
 	i915_gem_context_put(request->ctx);
 
 	dma_fence_signal(&request->fence);
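
Illustration only, not part of the patch: a minimal userspace sketch of how
the score and its decay interact, under stated assumptions. struct
hang_stats, context_is_banned(), mark_guilty(), retire_request() and
hangs_until_ban() are hypothetical stand-ins for the driver code above. Time
is passed in explicitly instead of coming from get_seconds(), and the start
time, hang spacing and ban period are made-up inputs.

```c
#include <stdbool.h>

#define CONTEXT_SCORE_GUILTY		10
#define CONTEXT_SCORE_BAN_THRESHOLD	40

/* Hypothetical stand-in for the fields of struct i915_ctx_hang_stats. */
struct hang_stats {
	unsigned long guilty_ts;
	unsigned long ban_period_seconds;
	bool banned;
	int ban_score;
};

/* Same checks, in the same order, as i915_context_is_banned() in the patch. */
static bool context_is_banned(const struct hang_stats *hs, unsigned long now)
{
	if (hs->banned)
		return true;

	/* A zero ban period gates off both the time and the score check. */
	if (!hs->ban_period_seconds)
		return false;

	if (now - hs->guilty_ts <= hs->ban_period_seconds)
		return true;	/* hanging too fast */

	return hs->ban_score >= CONTEXT_SCORE_BAN_THRESHOLD; /* too often */
}

/* Models the score-related part of i915_gem_context_mark_guilty(). */
static void mark_guilty(struct hang_stats *hs, unsigned long now)
{
	hs->ban_score += CONTEXT_SCORE_GUILTY;
	hs->banned = context_is_banned(hs, now);
	hs->guilty_ts = now;
}

/* Models the decay added to i915_gem_request_retire(). */
static void retire_request(struct hang_stats *hs)
{
	if (hs->ban_score > 0)
		hs->ban_score--;
}

/*
 * Hypothetical driver loop: the context hangs every `gap` seconds and
 * retires `retired_between` requests after each hang. Returns the number
 * of the hang that gets it banned, or -1 if it survives `max_hangs` hangs.
 */
int hangs_until_ban(int retired_between, unsigned long gap,
		    unsigned long ban_period, int max_hangs)
{
	struct hang_stats hs = { .ban_period_seconds = ban_period };
	unsigned long now = 1000; /* arbitrary start, well past guilty_ts == 0 */
	int i, r;

	for (i = 1; i <= max_hangs; i++) {
		now += gap;
		mark_guilty(&hs, now);
		if (hs.banned)
			return i;
		for (r = 0; r < retired_between; r++)
			retire_request(&hs);
	}

	return -1;
}
```

Under these assumptions, with a 12 second ban period and hangs spaced 60
seconds apart, a context that retires nothing between hangs is banned on its
fourth hang (4 * CONTEXT_SCORE_GUILTY reaches CONTEXT_SCORE_BAN_THRESHOLD).
A context that retires ten requests between hangs decays its score as fast
as it accumulates and is not banned by the score check. With a ban period of
zero the score is never consulted, matching the v2 note.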