From patchwork Wed Aug 1 13:56:11 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Bartminski, Jakub"
X-Patchwork-Id: 10552351
From: Jakub Bartmiński
To: intel-gfx@lists.freedesktop.org
Date: Wed, 1 Aug 2018 15:56:11 +0200
Message-Id: <20180801135611.20666-1-jakub.bartminski@intel.com>
X-Mailer: git-send-email 2.17.1
Subject: [Intel-gfx] [RFC] drm/i915: Don't reset on preemptible workloads
List-Id: Intel graphics driver community testing & development
Sender: "Intel-gfx"

Currently, if hangcheck detects that a request is not making any forward
progress, the driver attempts an engine reset; if that fails, it falls
back to a full device reset.

This patch changes that behaviour: when hangcheck encounters a stalled
low-priority workload, it attempts to preempt the workload before
declaring a hang. If the preemption succeeds, the workload is allowed to
continue "in the background" (until the next hangcheck run, which will
again attempt to preempt it). If the workload's context has been closed,
its execution is simply skipped.

This new behaviour lets users submit intentionally large or passive
workloads, which would normally be killed by hangcheck, without having
to split them into smaller pieces of work.
Suggested-by: Michał Winiarski
Signed-off-by: Jakub Bartmiński
Cc: Chris Wilson
Cc: Joonas Lahtinen
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/intel_hangcheck.c  | 29 +++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 37 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 2fc7a0dd0df9..5ebd3ca74855 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -398,6 +398,28 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
 	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
 }
 
+static bool hangcheck_preempt_workload(struct intel_engine_cs *engine)
+{
+	struct i915_request *active_request;
+	int workload_priority;
+
+	/* We have already tried preempting, but the hardware did not react */
+	if (engine->hangcheck.try_preempt)
+		return false;
+
+	active_request = i915_gem_find_active_request(engine);
+	workload_priority = active_request->gem_context->sched.priority;
+
+	if (workload_priority == I915_CONTEXT_MIN_USER_PRIORITY) {
+		engine->hangcheck.try_preempt = true;
+		engine->hangcheck.active_request = active_request;
+		intel_lr_inject_preempt_context(engine);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * This is called when the chip hasn't reported back with completed
  * batchbuffers in a long time. We keep track per ring seqno progress and
@@ -440,6 +462,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		hangcheck_store_sample(engine, &hc);
 
 		if (engine->hangcheck.stalled) {
+			/*
+			 * Try preempting the current workload before
+			 * declaring the engine hung.
+			 */
+			if (hangcheck_preempt_workload(engine))
+				continue;
+
 			hung |= intel_engine_flag(engine);
 			if (hc.action != ENGINE_DEAD)
 				stuck |= intel_engine_flag(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fad689efb67a..3ec8dcf64000 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -326,15 +326,39 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn;
 	struct i915_priolist *uninitialized_var(p);
+	struct i915_gem_context *active_context = NULL;
+	bool skip_seqno = false;
+	u32 new_seqno = 0;
 	int last_prio = I915_PRIORITY_INVALID;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
+	if (engine->hangcheck.try_preempt) {
+		rq = engine->hangcheck.active_request;
+		GEM_BUG_ON(!rq);
+
+		active_context = rq->gem_context;
+		GEM_BUG_ON(!active_context);
+
+		/*
+		 * If the workload is preemptible but its context was closed
+		 * we force the engine to skip its execution instead.
+		 */
+		if (i915_gem_context_is_closed(active_context))
+			skip_seqno = true;
+	}
+
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->timeline.requests,
 					 link) {
 		if (i915_request_completed(rq))
-			return;
+			break;
+
+		if (skip_seqno && rq->gem_context == active_context) {
+			new_seqno = max(new_seqno,
+					i915_request_global_seqno(rq));
+			continue;
+		}
 
 		__i915_request_unsubmit(rq);
 		unwind_wa_tail(rq);
@@ -348,6 +372,11 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 		GEM_BUG_ON(p->priority != rq_prio(rq));
 		list_add(&rq->sched.link, &p->requests);
 	}
+
+	if (skip_seqno) {
+		intel_write_status_page(engine, I915_GEM_HWS_INDEX, new_seqno);
+		engine->timeline.seqno = new_seqno;
+	}
 }
 
 void
@@ -532,7 +561,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
 }
 
-static void inject_preempt_context(struct intel_engine_cs *engine)
+void intel_lr_inject_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
 	struct intel_context *ce =
@@ -632,7 +661,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		return;
 
 	if (need_preempt(engine, last, execlists->queue_priority)) {
-		inject_preempt_context(engine);
+		intel_lr_inject_preempt_context(engine);
 		return;
 	}
 
@@ -981,6 +1010,8 @@ static void process_csb(struct intel_engine_cs *engine)
 		    buf[2*head + 1] == execlists->preempt_complete_status) {
 			GEM_TRACE("%s preempt-idle\n", engine->name);
 			complete_preempt_context(execlists);
+			/* We tried and succeeded in preempting the engine */
+			engine->hangcheck.try_preempt = false;
 			continue;
 		}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..164cd9e7ce05 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -101,6 +101,7 @@ struct drm_i915_private;
 struct i915_gem_context;
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv);
+void intel_lr_inject_preempt_context(struct intel_engine_cs *engine);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 57f3787ed6ec..eb38c1bec96b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -124,6 +124,7 @@ struct intel_engine_hangcheck {
 	struct i915_request *active_request;
 	bool stalled:1;
 	bool wedged:1;
+	bool try_preempt:1;
 };
 
 struct intel_ring {