From patchwork Thu Jan 19 06:49:55 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13107479
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: Matthew Brost, Tvrtko Ursulin, Andy Shevchenko, Michael Cheng,
 Aravind Iddamsetty, Alan Previn, Umesh Nerlige Ramappa,
 intel-gfx@lists.freedesktop.org, Lucas De Marchi, Bruce Chang,
 Daniele Ceraolo Spurio, DRI-Devel@Lists.FreeDesktop.Org, Andrzej Hajda,
 Rodrigo Vivi, Tejas Upadhyay, John Harrison, Matthew Auld
Subject: [PATCH v3 1/6] drm/i915: Fix request locking during error capture & debugfs dump
Date: Wed, 18 Jan 2023 22:49:55 -0800
Message-Id: <20230119065000.1661857-2-John.C.Harrison@Intel.com>
In-Reply-To: <20230119065000.1661857-1-John.C.Harrison@Intel.com>
References: <20230119065000.1661857-1-John.C.Harrison@Intel.com>

From: John Harrison

When GuC support was added to error capture, the locking around the request
object was broken. Fix it up.

The context based search manages the spinlocking around the search internally.
So it needs to grab the reference count internally as well.

The execlist only request based search relies on external locking, so it needs
an external reference count, but within the spinlock, not outside it.

The only other caller of the context based search is the code for dumping
engine state to debugfs. That code wasn't previously getting an explicit
reference at all, as it does everything while holding the execlist specific
spinlock. So that needs updating as well, because that spinlock doesn't help
when using GuC submission. Rather than trying to conditionally get/put
depending on submission model, just change it to always do the get/put.

In addition, intel_guc_find_hung_context() was not acquiring the correct
spinlock before searching the request list. So fix that up too. While at it,
add some extra whitespace padding for readability.
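To make the resulting contract concrete, a simplified caller-side sketch
(illustrative only, not part of the patch below; declarations and error
handling trimmed):

	/* GuC: the context based search takes its own lock and returns a counted reference */
	hung_rq = intel_context_find_active_request(ce);

	/* Execlists: locking is external, so the get must also happen inside that lock */
	spin_lock_irqsave(&engine->sched_engine->lock, flags);
	hung_rq = intel_engine_execlist_find_hung_request(engine);
	if (hung_rq)
		hung_rq = i915_request_get_rcu(hung_rq);
	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);

	/* Either way the caller now owns a reference and must drop it when done */
	if (hung_rq)
		i915_request_put(hung_rq);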
v2: Explicitly document adding an extra blank line in some dense code (Andy
Shevchenko). Fix multiple potential null pointer derefs in case of no request
found (some spotted by Tvrtko, but there were more!). Also fix a leaked
request in case of !started and another in __guc_reset_context now that
intel_context_find_active_request is actually reference counting the returned
request.

Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset")
Cc: Matthew Brost
Cc: John Harrison
Cc: Jani Nikula
Cc: Joonas Lahtinen
Cc: Rodrigo Vivi
Cc: Tvrtko Ursulin
Cc: Daniele Ceraolo Spurio
Cc: Andrzej Hajda
Cc: Matthew Auld
Cc: Matt Roper
Cc: Umesh Nerlige Ramappa
Cc: Michael Cheng
Cc: Lucas De Marchi
Cc: Tejas Upadhyay
Cc: Andy Shevchenko
Cc: Aravind Iddamsetty
Cc: Alan Previn
Cc: Bruce Chang
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
---
 drivers/gpu/drm/i915/gt/intel_context.c           |  2 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c         |  8 +++++++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c             | 12 ++++++------
 4 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index e94365b08f1ef..e7c5509c48ef1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -552,6 +552,8 @@ struct i915_request *intel_context_find_active_request(struct intel_context *ce)
 		active = rq;
 	}
 
+	if (active)
+		active = i915_request_get_rcu(active);
 	spin_unlock_irqrestore(&parent->guc_state.lock, flags);
 
 	return active;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 922f1bb22dc68..6a082658d0082 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -2236,10 +2236,14 @@ static void engine_dump_active_requests(struct intel_engine_cs *engine, struct d
 	guc = intel_uc_uses_guc_submission(&engine->gt->uc);
 	if (guc) {
 		ce = intel_engine_get_hung_context(engine);
-		if (ce)
+		if (ce) {
+			/* This will reference count the request (if found) */
 			hung_rq = intel_context_find_active_request(ce);
+		}
 	} else {
 		hung_rq = intel_engine_execlist_find_hung_request(engine);
+		if (hung_rq)
+			hung_rq = i915_request_get_rcu(hung_rq);
 	}
 
 	if (hung_rq)
@@ -2250,6 +2254,8 @@ static void engine_dump_active_requests(struct intel_engine_cs *engine, struct d
 	else
 		intel_engine_dump_active_requests(&engine->sched_engine->requests,
 						  hung_rq, m);
+	if (hung_rq)
+		i915_request_put(hung_rq);
 }
 
 void intel_engine_dump(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b436dd7f12e42..d123cbd90a919 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1702,6 +1702,7 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
 			goto next_context;
 
 		guilty = false;
+		/* NB: This gets a reference to the request */
 		rq = intel_context_find_active_request(ce);
 		if (!rq) {
 			head = ce->ring->tail;
@@ -1715,6 +1716,7 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
 		head = intel_ring_wrap(ce->ring, rq->head);
 
 		__i915_request_reset(rq, guilty);
+		i915_request_put(rq);
 out_replay:
 		guc_reset_state(ce, head, guilty);
 next_context:
@@ -4820,6 +4822,8 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine)
 
 	xa_lock_irqsave(&guc->context_lookup, flags);
 	xa_for_each(&guc->context_lookup, index, ce) {
+		bool found;
+
 		if (!kref_get_unless_zero(&ce->ref))
 			continue;
 
@@ -4836,10 +4840,18 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine)
 				goto next;
 		}
 
+		found = false;
+		spin_lock(&ce->guc_state.lock);
 		list_for_each_entry(rq, &ce->guc_state.requests, sched.link) {
 			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
 				continue;
 
+			found = true;
+			break;
+		}
+		spin_unlock(&ce->guc_state.lock);
+
+		if (found) {
 			intel_engine_set_hung_context(engine, ce);
 
 			/* Can only cope with one hang at a time... */
@@ -4847,6 +4859,7 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine)
 			xa_lock(&guc->context_lookup);
 			goto done;
 		}
+
 next:
 		intel_context_put(ce);
 		xa_lock(&guc->context_lookup);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 9d5d5a397b64e..7ea36478ee52d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1607,6 +1607,7 @@ capture_engine(struct intel_engine_cs *engine,
 	ce = intel_engine_get_hung_context(engine);
 	if (ce) {
 		intel_engine_clear_hung_context(engine);
+		/* This will reference count the request (if found) */
 		rq = intel_context_find_active_request(ce);
 		if (!rq || !i915_request_started(rq))
 			goto no_request_capture;
@@ -1618,21 +1619,18 @@ capture_engine(struct intel_engine_cs *engine,
 		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
 			spin_lock_irqsave(&engine->sched_engine->lock, flags);
 			rq = intel_engine_execlist_find_hung_request(engine);
+			if (rq)
+				rq = i915_request_get_rcu(rq);
 			spin_unlock_irqrestore(&engine->sched_engine->lock,
 					       flags);
 		}
 	}
-	if (rq)
-		rq = i915_request_get_rcu(rq);
-
 	if (!rq)
 		goto no_request_capture;
 
 	capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
-	if (!capture) {
-		i915_request_put(rq);
+	if (!capture)
 		goto no_request_capture;
-	}
 	if (dump_flags & CORE_DUMP_FLAG_IS_GUC_CAPTURE)
 		intel_guc_capture_get_matching_node(engine->gt, ee, ce);
 
@@ -1642,6 +1640,8 @@ capture_engine(struct intel_engine_cs *engine,
 	return ee;
 
 no_request_capture:
+	if (rq)
+		i915_request_put(rq);
 	kfree(ee);
 	return NULL;
 }

From patchwork Thu Jan 19 06:49:56 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13107478
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org
Subject: [PATCH v3 2/6] drm/i915: Fix up locking around dumping requests lists
Date: Wed, 18 Jan 2023 22:49:56 -0800
Message-Id: <20230119065000.1661857-3-John.C.Harrison@Intel.com>
In-Reply-To: <20230119065000.1661857-1-John.C.Harrison@Intel.com>
References: <20230119065000.1661857-1-John.C.Harrison@Intel.com>

From: John Harrison

The debugfs dump of requests was confused about what state requires the
execlist lock versus the GuC lock. There was also a bunch of duplicated messy
code between it and the error capture code.

So refactor the hung request search into a re-usable function. And reduce the
span of the execlist state lock to only the execlist specific code paths.
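To illustrate the new flow, here is a simplified sketch of how a caller such
as the debugfs dump uses the helper added below (illustrative only; locals and
error handling trimmed):

	struct intel_context *hung_ce = NULL;
	struct i915_request *hung_rq = NULL;

	/* Picks the right backend (GuC vs execlists), takes the appropriate
	 * lock internally and returns a reference counted request if found. */
	intel_get_hung_entity(engine, &hung_ce, &hung_rq);

	if (hung_rq)
		engine_dump_request(hung_rq, m, "\t\thung");

	/* The caller owns the reference and must release it */
	if (hung_rq)
		i915_request_put(hung_rq);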
Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
---
 drivers/gpu/drm/i915/gt/intel_context.c   | 29 +++++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h   |  3 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 51 +++++++++++------------
 drivers/gpu/drm/i915/i915_gpu_error.c     | 27 ++----------
 4 files changed, 60 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index e7c5509c48ef1..a61f052092ed9 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -559,6 +559,35 @@ struct i915_request *intel_context_find_active_request(struct intel_context *ce)
 	return active;
 }
 
+void intel_get_hung_entity(struct intel_engine_cs *engine,
+			   struct intel_context **ce, struct i915_request **rq)
+{
+	unsigned long flags;
+
+	*ce = intel_engine_get_hung_context(engine);
+	if (*ce) {
+		intel_engine_clear_hung_context(engine);
+
+		/* This will reference count the request (if found) */
+		*rq = intel_context_find_active_request(*ce);
+
+		return;
+	}
+
+	/*
+	 * Getting here with GuC enabled means it is a forced error capture
+	 * with no actual hang. So, no need to attempt the execlist search.
+	 */
+	if (intel_uc_uses_guc_submission(&engine->gt->uc))
+		return;
+
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	*rq = intel_engine_execlist_find_hung_request(engine);
+	if (*rq)
+		*rq = i915_request_get_rcu(*rq);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+}
+
 void intel_context_bind_parent_child(struct intel_context *parent,
 				     struct intel_context *child)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index fb62b7b8cbcda..ca50f3312a941 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -271,6 +271,9 @@ struct i915_request *intel_context_create_request(struct intel_context *ce);
 struct i915_request *
 intel_context_find_active_request(struct intel_context *ce);
 
+void intel_get_hung_entity(struct intel_engine_cs *engine,
+			   struct intel_context **ce, struct i915_request **rq);
+
 static inline bool intel_context_is_barrier(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 6a082658d0082..5e173dfc8849e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -2216,11 +2216,27 @@ void intel_engine_dump_active_requests(struct list_head *requests,
 	}
 }
 
-static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m)
+static void execlist_dump_active_requests(struct intel_engine_cs *engine,
+					  struct i915_request *hung_rq,
+					  struct drm_printer *m)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+
+	intel_engine_dump_active_requests(&engine->sched_engine->requests, hung_rq, m);
+
+	drm_printf(m, "\tOn hold?: %lu\n",
+		   list_count(&engine->sched_engine->hold));
+
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+}
+
+static void engine_dump_active_requests(struct intel_engine_cs *engine,
+					struct drm_printer *m)
+{
+	struct intel_context *hung_ce = NULL;
 	struct i915_request *hung_rq = NULL;
-	struct intel_context *ce;
-	bool guc;
 
 	/*
 	 * No need for an engine->irq_seqno_barrier() before the seqno reads.
@@ -2229,31 +2245,20 @@ static void engine_dump_active_requests(struct intel_engine_cs *engine, struct d
 	 * But the intention here is just to report an instantaneous snapshot
 	 * so that's fine.
 	 */
-	lockdep_assert_held(&engine->sched_engine->lock);
+	intel_get_hung_entity(engine, &hung_ce, &hung_rq);
 
 	drm_printf(m, "\tRequests:\n");
 
-	guc = intel_uc_uses_guc_submission(&engine->gt->uc);
-	if (guc) {
-		ce = intel_engine_get_hung_context(engine);
-		if (ce) {
-			/* This will reference count the request (if found) */
-			hung_rq = intel_context_find_active_request(ce);
-		}
-	} else {
-		hung_rq = intel_engine_execlist_find_hung_request(engine);
-		if (hung_rq)
-			hung_rq = i915_request_get_rcu(hung_rq);
-	}
-
 	if (hung_rq)
 		engine_dump_request(hung_rq, m, "\t\thung");
+	else if (hung_ce)
+		drm_printf(m, "\t\tGot hung ce but no hung rq!\n");
 
-	if (guc)
+	if (intel_uc_uses_guc_submission(&engine->gt->uc))
 		intel_guc_dump_active_requests(engine, hung_rq, m);
 	else
-		intel_engine_dump_active_requests(&engine->sched_engine->requests,
-						  hung_rq, m);
+		execlist_dump_active_requests(engine, hung_rq, m);
+
 	if (hung_rq)
 		i915_request_put(hung_rq);
 }
@@ -2265,7 +2270,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	struct i915_gpu_error * const error = &engine->i915->gpu_error;
 	struct i915_request *rq;
 	intel_wakeref_t wakeref;
-	unsigned long flags;
 	ktime_t dummy;
 
 	if (header) {
@@ -2302,13 +2306,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		   i915_reset_count(error));
 	print_properties(engine, m);
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 	engine_dump_active_requests(engine, m);
 
-	drm_printf(m, "\tOn hold?: %lu\n",
-		   list_count(&engine->sched_engine->hold));
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
-
 	drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base);
 	wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
 	if (wakeref) {
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7ea36478ee52d..78cf95e4dd230 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1596,36 +1596,15 @@ capture_engine(struct intel_engine_cs *engine,
 {
 	struct intel_engine_capture_vma *capture = NULL;
 	struct intel_engine_coredump *ee;
-	struct intel_context *ce;
+	struct intel_context *ce = NULL;
 	struct i915_request *rq = NULL;
-	unsigned long flags;
 
 	ee = intel_engine_coredump_alloc(engine, ALLOW_FAIL, dump_flags);
 	if (!ee)
 		return NULL;
 
-	ce = intel_engine_get_hung_context(engine);
-	if (ce) {
-		intel_engine_clear_hung_context(engine);
-		/* This will reference count the request (if found) */
-		rq = intel_context_find_active_request(ce);
-		if (!rq || !i915_request_started(rq))
-			goto no_request_capture;
-	} else {
-		/*
-		 * Getting here with GuC enabled means it is a forced error capture
-		 * with no actual hang. So, no need to attempt the execlist search.
-		 */
-		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
-			spin_lock_irqsave(&engine->sched_engine->lock, flags);
-			rq = intel_engine_execlist_find_hung_request(engine);
-			if (rq)
-				rq = i915_request_get_rcu(rq);
-			spin_unlock_irqrestore(&engine->sched_engine->lock,
-					       flags);
-		}
-	}
-	if (!rq)
+	intel_get_hung_entity(engine, &ce, &rq);
+	if (!rq || !i915_request_started(rq))
 		goto no_request_capture;
 
 	capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
From patchwork Thu Jan 19 06:49:57 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13107482
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: Umesh Nerlige Ramappa, John Harrison, DRI-Devel@Lists.FreeDesktop.Org,
 Tvrtko Ursulin
Subject: [PATCH v3 3/6] drm/i915: Allow error capture without a request
Date: Wed, 18 Jan 2023 22:49:57 -0800
Message-Id: <20230119065000.1661857-4-John.C.Harrison@Intel.com>
In-Reply-To: <20230119065000.1661857-1-John.C.Harrison@Intel.com>
References: <20230119065000.1661857-1-John.C.Harrison@Intel.com>

From: John Harrison

There was a report of error captures occurring without any hung context being
indicated despite the capture being initiated by a 'hung context notification'
from GuC. The problem was not reproducible. However, it is possible to happen
if the context in question has no active requests. For example, if the hang
was in the context switch itself then the breadcrumb write would have occurred
and the KMD would see an idle context.

In the interests of attempting to provide as much information as possible
about a hang, it seems wise to include the engine info regardless of whether a
request was found or not, as opposed to just pretending there was no hang at
all.

So update the error capture code to always record engine information if a
context is given. Which means updating record_context() to take a context
instead of a request (which it only ever used to find the context anyway). And
split the request agnostic parts of intel_engine_coredump_add_request() out
into a separate function.

v2: Remove a duplicate 'if' statement (Umesh) and fix a put of a null pointer.
v3: Tidy up request locking code flow (Tvrtko)
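The resulting decision flow in capture_engine() looks roughly like this (a
simplified sketch of the change below; allocation and error handling trimmed):

	intel_get_hung_entity(engine, &ce, &rq);

	if (rq) {
		/* Full capture: context state plus the request's buffers */
		capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
		i915_request_put(rq);
	} else if (ce) {
		/* No request found - still record the context and engine state */
		capture = engine_coredump_add_context(ee, ce, ATOMIC_MAYFAIL);
	}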
Signed-off-by: John Harrison
Reviewed-by: Umesh Nerlige Ramappa
Acked-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 70 ++++++++++++++++++---------
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 78cf95e4dd230..743614fff5472 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1370,14 +1370,14 @@ static void engine_record_execlists(struct intel_engine_coredump *ee)
 }
 
 static bool record_context(struct i915_gem_context_coredump *e,
-			   const struct i915_request *rq)
+			   struct intel_context *ce)
 {
 	struct i915_gem_context *ctx;
 	struct task_struct *task;
 	bool simulated;
 
 	rcu_read_lock();
-	ctx = rcu_dereference(rq->context->gem_context);
+	ctx = rcu_dereference(ce->gem_context);
 	if (ctx && !kref_get_unless_zero(&ctx->ref))
 		ctx = NULL;
 	rcu_read_unlock();
@@ -1396,8 +1396,8 @@ static bool record_context(struct i915_gem_context_coredump *e,
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
 
-	e->total_runtime = intel_context_get_total_runtime_ns(rq->context);
-	e->avg_runtime = intel_context_get_avg_runtime_ns(rq->context);
+	e->total_runtime = intel_context_get_total_runtime_ns(ce);
+	e->avg_runtime = intel_context_get_avg_runtime_ns(ce);
 
 	simulated = i915_gem_context_no_error_capture(ctx);
@@ -1532,15 +1532,37 @@ intel_engine_coredump_alloc(struct intel_engine_cs *engine, gfp_t gfp, u32 dump_
 	return ee;
 }
 
+static struct intel_engine_capture_vma *
+engine_coredump_add_context(struct intel_engine_coredump *ee,
+			    struct intel_context *ce,
+			    gfp_t gfp)
+{
+	struct intel_engine_capture_vma *vma = NULL;
+
+	ee->simulated |= record_context(&ee->context, ce);
+	if (ee->simulated)
+		return NULL;
+
+	/*
+	 * We need to copy these to an anonymous buffer
+	 * as the simplest method to avoid being overwritten
+	 * by userspace.
+	 */
+	vma = capture_vma(vma, ce->ring->vma, "ring", gfp);
+	vma = capture_vma(vma, ce->state, "HW context", gfp);
+
+	return vma;
+}
+
 struct intel_engine_capture_vma *
 intel_engine_coredump_add_request(struct intel_engine_coredump *ee,
 				  struct i915_request *rq,
 				  gfp_t gfp)
 {
-	struct intel_engine_capture_vma *vma = NULL;
+	struct intel_engine_capture_vma *vma;
 
-	ee->simulated |= record_context(&ee->context, rq);
-	if (ee->simulated)
+	vma = engine_coredump_add_context(ee, rq->context, gfp);
+	if (!vma)
 		return NULL;
 
 	/*
@@ -1550,8 +1572,6 @@ intel_engine_coredump_add_request(struct intel_engine_coredump *ee,
 	 */
 	vma = capture_vma_snapshot(vma, rq->batch_res, gfp, "batch");
 	vma = capture_user(vma, rq, gfp);
-	vma = capture_vma(vma, rq->ring->vma, "ring", gfp);
-	vma = capture_vma(vma, rq->context->state, "HW context", gfp);
 
 	ee->rq_head = rq->head;
 	ee->rq_post = rq->postfix;
@@ -1604,25 +1624,31 @@ capture_engine(struct intel_engine_cs *engine,
 		return NULL;
 
 	intel_get_hung_entity(engine, &ce, &rq);
-	if (!rq || !i915_request_started(rq))
-		goto no_request_capture;
+	if (rq && !i915_request_started(rq)) {
+		drm_info(&engine->gt->i915->drm, "Got hung context on %s with no active request!\n",
+			 engine->name);
+		i915_request_put(rq);
+		rq = NULL;
+	}
+
+	if (rq) {
+		capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
+		i915_request_put(rq);
+	} else if (ce) {
+		capture = engine_coredump_add_context(ee, ce, ATOMIC_MAYFAIL);
+	}
 
-	capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
-	if (!capture)
-		goto no_request_capture;
 	if (dump_flags & CORE_DUMP_FLAG_IS_GUC_CAPTURE)
 		intel_guc_capture_get_matching_node(engine->gt, ee, ce);
 
-	intel_engine_coredump_add_vma(ee, capture, compress);
-	i915_request_put(rq);
+	if (capture) {
+		intel_engine_coredump_add_vma(ee, capture, compress);
+	} else {
+		kfree(ee);
+		ee = NULL;
+	}
 
 	return ee;
-
-no_request_capture:
-	if (rq)
-		i915_request_put(rq);
-	kfree(ee);
-	return NULL;
 }
 
 static void

From patchwork Thu Jan 19 06:49:58 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13107481
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org, Tvrtko Ursulin
Subject: [PATCH v3 4/6] drm/i915: Allow error capture of a pending request
Date: Wed, 18 Jan 2023 22:49:58 -0800
Message-Id: <20230119065000.1661857-5-John.C.Harrison@Intel.com>
In-Reply-To: <20230119065000.1661857-1-John.C.Harrison@Intel.com>
References: <20230119065000.1661857-1-John.C.Harrison@Intel.com>

From: John Harrison

A hang situation has been observed where the only requests on the context were
either completed or not yet started according to the breadcrumbs. However, the
register state claimed a batch was (maybe) in progress. So, allow capture of
the pending request on the grounds that this might be better than nothing.

v2: Reword 'not started' warning message (Tvrtko)

Signed-off-by: John Harrison
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 743614fff5472..0d1e68830b75e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1624,12 +1624,9 @@ capture_engine(struct intel_engine_cs *engine,
 		return NULL;
 
 	intel_get_hung_entity(engine, &ce, &rq);
-	if (rq && !i915_request_started(rq)) {
-		drm_info(&engine->gt->i915->drm, "Got hung context on %s with no active request!\n",
-			 engine->name);
-		i915_request_put(rq);
-		rq = NULL;
-	}
+	if (rq && !i915_request_started(rq))
+		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
+			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
 
 	if (rq) {
 		capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);

From patchwork Thu Jan 19 06:49:59 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13107480
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org, Tvrtko Ursulin
Subject: [PATCH v3 5/6] drm/i915/guc: Look for a guilty context when an engine reset fails
Date: Wed, 18 Jan 2023 22:49:59 -0800
Message-Id: <20230119065000.1661857-6-John.C.Harrison@Intel.com>
In-Reply-To: <20230119065000.1661857-1-John.C.Harrison@Intel.com>
References: <20230119065000.1661857-1-John.C.Harrison@Intel.com>

From: John Harrison

Engine resets are supposed to never fail. But in the case when one does (due
to unknown reasons that normally come down to a missing w/a), it is useful to
get as much information out of the system as possible. Given that the GuC
intentionally dies in such a situation, it is not possible to get a guilty
context notification back. So do a manual search instead.

Given that GuC is dead, this is safe because GuC won't be changing the engine
state asynchronously.
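The shape of the change in the reset-failure worker is roughly as follows (a
condensed preview of the hunk below):

	if (likely(reset_fail_mask)) {
		struct intel_engine_cs *engine;
		enum intel_engine_id id;

		/* GuC dead loops after reporting the failure, so walk the failed
		 * engines ourselves to find a guilty context before capturing. */
		for_each_engine_masked(engine, gt, reset_fail_mask, id)
			intel_guc_find_hung_context(engine);

		intel_gt_handle_error(gt, reset_fail_mask, I915_ERROR_CAPTURE,
				      "GuC failed to reset engine mask=0x%x",
				      reset_fail_mask);
	}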
v2: Change comment to be less alarming (Tvrtko)

Signed-off-by: John Harrison
Acked-by: Tvrtko Ursulin
Reviewed-by: Daniele Ceraolo Spurio
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 17 +++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d123cbd90a919..7c5ea66218443 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4756,11 +4756,24 @@ static void reset_fail_worker_func(struct work_struct *w)
 	guc->submission_state.reset_fail_mask = 0;
 	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
 
-	if (likely(reset_fail_mask))
+	if (likely(reset_fail_mask)) {
+		struct intel_engine_cs *engine;
+		enum intel_engine_id id;
+
+		/*
+		 * GuC is toast at this point - it dead loops after sending the failed
+		 * reset notification. So need to manually determine the guilty context.
+		 * Note that it should be reliable to do this here because the GuC is
+		 * toast and will not be scheduling behind the KMD's back.
+		 */
+		for_each_engine_masked(engine, gt, reset_fail_mask, id)
+			intel_guc_find_hung_context(engine);
+
 		intel_gt_handle_error(gt, reset_fail_mask,
 				      I915_ERROR_CAPTURE,
-				      "GuC failed to reset engine mask=0x%x\n",
+				      "GuC failed to reset engine mask=0x%x",
 				      reset_fail_mask);
+	}
 }
 
 int intel_guc_engine_failure_process_msg(struct intel_guc *guc,

From patchwork Thu Jan 19 06:50:00 2023
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13107477
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org, Tvrtko Ursulin
Subject: [PATCH v3 6/6] drm/i915/guc: Add a debug print on GuC triggered reset
Date: Wed, 18 Jan 2023 22:50:00 -0800
Message-Id: <20230119065000.1661857-7-John.C.Harrison@Intel.com>
In-Reply-To: <20230119065000.1661857-1-John.C.Harrison@Intel.com>
References: <20230119065000.1661857-1-John.C.Harrison@Intel.com>

From: John Harrison

For understanding bug reports, it can be useful to have an explicit dmesg
print when a reset notification is received from GuC, as opposed to simply
inferring that this happened from other messages.

Signed-off-by: John Harrison
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 7c5ea66218443..30f79d333ae9b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4667,6 +4667,10 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 {
 	trace_intel_context_reset(ce);
 
+	drm_dbg(&guc_to_gt(guc)->i915->drm, "Got GuC reset of 0x%04X, exiting = %d, banned = %d\n",
+		ce->guc_id.id, test_bit(CONTEXT_EXITING, &ce->flags),
+		test_bit(CONTEXT_BANNED, &ce->flags));
+
 	if (likely(intel_context_is_schedulable(ce))) {
 		capture_error_state(guc, ce);
 		guc_context_replay(ce);