From patchwork Thu Jul 28 02:42:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12930976 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7C328C19F28 for ; Thu, 28 Jul 2022 02:43:31 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 85B2610E625; Thu, 28 Jul 2022 02:43:14 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4BBFC10E070; Thu, 28 Jul 2022 02:42:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1658976147; x=1690512147; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xHVpB6zkDipSCn0X1Zi2jWukcxVUmAzUsjYeZHrSmk0=; b=ZD8U2Ue6c+KZcECXX2sMQ8vnGTBWo+5ji47nvxp1Z03abF0lFDmOni9i CYpIBr0Xr1KsdYaOU4UMdsMLp2wOQgSQW7ysP0s69YtxAbbMW7no/7s7C 2mxdmSJkxb1nRPLP/tBEx8M3DefEbyfZgab5y9iUTpNTqZq3xD542knOX qmm5dfHzd+ubqT71OX3va26IJP8QdihvIs9ObuuH6D7KTlQQXM7VyYx4F 8qBS7WOdu9Xj/s7ZNzhDBQoz9Id3Kxq444drU/yXbP83/8KlKH9OdfrI/ 5PV4Zq3SwjX5MRGq6FNSmguTE6/FpalfgXMS7LCI9mnuJlwoKDcmTlqt4 A==; X-IronPort-AV: E=McAfee;i="6400,9594,10421"; a="271443539" X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="271443539" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2022 19:42:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="690096039" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by FMSMGA003.fm.intel.com with ESMTP; 27 Jul 2022 19:42:25 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Date: Wed, 27 Jul 2022 19:42:20 -0700 Message-Id: <20220728024225.2363663-2-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728024225.2363663-1-John.C.Harrison@Intel.com> References: <20220728024225.2363663-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH 1/6] drm/i915/guc: Route semaphores to GuC for Gen12+ X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Micha=C5=82_Winiarski?= , DRI-Devel@Lists.FreeDesktop.Org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Michał Winiarski In GuC submission mode, there is an option to use auto-switch out semaphores and have GuC auto-switch in a waiting context. This requires routing the semaphore interrupt to GuC. Signed-off-by: Michał Winiarski Signed-off-by: John Harrison Reviewed-by: Matthew Brost Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h | 4 ++++ drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h index 8dc063f087eb1..a7092f711e9cd 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h @@ -102,6 +102,10 @@ #define GUC_SEND_TRIGGER (1<<0) #define GEN11_GUC_HOST_INTERRUPT _MMIO(0x1901f0) +#define GEN12_GUC_SEM_INTR_ENABLES _MMIO(0xc71c) +#define GUC_SEM_INTR_ROUTE_TO_GUC BIT(31) +#define GUC_SEM_INTR_ENABLE_ALL (0xff) + #define GUC_NUM_DOORBELLS 256 /* format of the HW-monitored doorbell cacheline */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 76916aed897ad..0b8c6450fa344 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -4191,13 +4191,27 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) void intel_guc_submission_enable(struct intel_guc *guc) { + struct intel_gt *gt = guc_to_gt(guc); + + /* Enable and route to GuC */ + if (GRAPHICS_VER(gt->i915) >= 12) + intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, + GUC_SEM_INTR_ROUTE_TO_GUC | + GUC_SEM_INTR_ENABLE_ALL); + guc_init_lrc_mapping(guc); guc_init_engine_stats(guc); } void intel_guc_submission_disable(struct intel_guc *guc) { + struct intel_gt *gt = guc_to_gt(guc); + /* Note: By the time we're here, GuC may have already been reset */ + + /* Disable and route to host */ + if (GRAPHICS_VER(gt->i915) >= 12) + intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 0x0); } static bool __guc_submission_supported(struct intel_guc *guc) From patchwork Thu Jul 28 02:42:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12930977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 80B45C04A68 for ; Thu, 28 Jul 2022 02:43:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 89CEB10E271; Thu, 28 Jul 2022 02:43:15 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id A78D610EA51; Thu, 28 Jul 2022 02:42:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1658976147; x=1690512147; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=E8B9qejhPS7QqWDfrnsVMTo6dJ/YDEkwJYeHMmIQxh0=; b=iVZGZ758/Po0SNuH+JVdET+BJN+xjk/fWa/5K/4xw67JSsW/8IFXxDgT 34rWMOJW1ORVRrln+mkQE1Be0V6apca5TOXhUe1BiPfEohqwmZLNg0NHT 7fUHhBtAfTA3RHE5tV1iaebUeMegq25OpkShbmCTOXyvZ7QnQ1mV3a44V ZT2uIHje3l6xB33C7cubeTEs8A7GD1v6/D8ZyRBezQ1OwBMhKgaGy6bJc T/eKMvQS9aWwe3ftsuGeDZqbrlsIm+Q97JkaGH+q5tYm7tqC/ocREibQB V+yC5e6TWgdYEVJic0MUT42RWU3zEZNR90e2KbkASXDGP2gktliJet8qz w==; X-IronPort-AV: E=McAfee;i="6400,9594,10421"; a="271443540" X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="271443540" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2022 19:42:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="690096043" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by FMSMGA003.fm.intel.com with ESMTP; 27 Jul 2022 19:42:25 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Date: Wed, 27 Jul 2022 19:42:21 -0700 Message-Id: <20220728024225.2363663-3-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728024225.2363663-1-John.C.Harrison@Intel.com> References: <20220728024225.2363663-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH 2/6] drm/i915/guc: Fix issues with live_preempt_cancel X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: DRI-Devel@Lists.FreeDesktop.Org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Matthew Brost Having semaphores results in different behavior when a dependent request is cancelled. In the case of semaphores the request could be on the HW and complete successfully while without the request is held in the driver and the error from the dependent request is propagated. Fix live_preempt_cancel to take this behavior into account. Also update live_preempt_cancel to use new function intel_context_ban rather than intel_context_set_banned. Signed-off-by: Matthew Brost Signed-off-by: John Harrison Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/selftest_execlists.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 02fc97a0ab502..015f8cd3463e2 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -2087,7 +2087,7 @@ static int __cancel_active0(struct live_preempt_cancel *arg) goto out; } - intel_context_set_banned(rq->context); + intel_context_ban(rq->context, rq); err = intel_engine_pulse(arg->engine); if (err) goto out; @@ -2146,7 +2146,7 @@ static int __cancel_active1(struct live_preempt_cancel *arg) if (err) goto out; - intel_context_set_banned(rq[1]->context); + intel_context_ban(rq[1]->context, rq[1]); err = intel_engine_pulse(arg->engine); if (err) goto out; @@ -2229,7 +2229,7 @@ static int __cancel_queued(struct live_preempt_cancel *arg) if (err) goto out; - intel_context_set_banned(rq[2]->context); + intel_context_ban(rq[2]->context, rq[2]); err = intel_engine_pulse(arg->engine); if (err) goto out; @@ -2244,7 +2244,13 @@ static int __cancel_queued(struct live_preempt_cancel *arg) goto out; } - if (rq[1]->fence.error != 0) { + /* + * The behavior between having semaphores and not is different. With + * semaphores the subsequent request is on the hardware and not cancelled + * while without the request is held in the driver and cancelled. + */ + if (intel_engine_has_semaphores(rq[1]->engine) && + rq[1]->fence.error != 0) { pr_err("Normal inflight1 request did not complete\n"); err = -EINVAL; goto out; @@ -2292,7 +2298,7 @@ static int __cancel_hostile(struct live_preempt_cancel *arg) goto out; } - intel_context_set_banned(rq->context); + intel_context_ban(rq->context, rq); err = intel_engine_pulse(arg->engine); /* force reset */ if (err) goto out; From patchwork Thu Jul 28 02:42:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12930973 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 14553C04A68 for ; Thu, 28 Jul 2022 02:43:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 15C5310FA13; Thu, 28 Jul 2022 02:42:45 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id DF78E10EB2B; Thu, 28 Jul 2022 02:42:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1658976148; x=1690512148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sUmrtSVmqja3EyXPTvjmzBtKrj3P0erg0cENYVHyKbA=; b=Snmgot6b0DBRGh/f1tFsJtic5z02uYinYEjfM7QU74WXvLjNPauNPVSj bSUJbKBXJ3F0L6o4nTEik2eS8JeyFk+o1i0LopdznshPTut8agSI6rg8q SXaZdMXRAGWpyj7j+OXQg1/nxaDM5xSV2oF0OBd7CXFk359kZxbI7rTzc iU+FV9V217qS8FHvdzpCl1Lauvuk+vYWGcJk4ZlrPIkk/aWjoqVB4qg/z 0vGWtJk51rex+fsJtA3ge82o5lxFJnYuFyhXzehNCVhqSbINx0DQzZhba cSlgOpNOyjl2VWH6kKX0ojK5fUYPl6j4prsMeJrAiM2supk82elWPH9MP g==; X-IronPort-AV: E=McAfee;i="6400,9594,10421"; a="271443541" X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="271443541" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2022 19:42:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="690096047" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by FMSMGA003.fm.intel.com with ESMTP; 27 Jul 2022 19:42:26 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Date: Wed, 27 Jul 2022 19:42:22 -0700 Message-Id: <20220728024225.2363663-4-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728024225.2363663-1-John.C.Harrison@Intel.com> References: <20220728024225.2363663-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH 3/6] drm/i915/guc: Add selftest for a hung GuC X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: DRI-Devel@Lists.FreeDesktop.Org, Rahul Kumar Singh Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Rahul Kumar Singh Add a test to check that the hangcheck will recover from a submission hang in the GuC. Signed-off-by: Rahul Kumar Singh Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 + .../drm/i915/gt/uc/selftest_guc_hangcheck.c | 159 ++++++++++++++++++ .../drm/i915/selftests/i915_live_selftests.h | 1 + 3 files changed, 161 insertions(+) create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0b8c6450fa344..ff205c4125857 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -5177,4 +5177,5 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve) #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_guc.c" #include "selftest_guc_multi_lrc.c" +#include "selftest_guc_hangcheck.c" #endif diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c new file mode 100644 index 0000000000000..af913c4b09d37 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c @@ -0,0 +1,159 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright �� 2019 Intel Corporation + */ + +#include "selftests/igt_spinner.h" +#include "selftests/igt_reset.h" +#include "selftests/intel_scheduler_helpers.h" +#include "gt/intel_engine_heartbeat.h" +#include "gem/selftests/mock_context.h" + +#define BEAT_INTERVAL 100 + +static struct i915_request *nop_request(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = intel_engine_create_kernel_request(engine); + if (IS_ERR(rq)) + return rq; + + i915_request_get(rq); + i915_request_add(rq); + + return rq; +} + +static int intel_hang_guc(void *arg) +{ + struct intel_gt *gt = arg; + int ret = 0; + struct i915_gem_context *ctx; + struct intel_context *ce; + struct igt_spinner spin; + struct i915_request *rq; + intel_wakeref_t wakeref; + struct i915_gpu_error *global = >->i915->gpu_error; + struct intel_engine_cs *engine; + unsigned int reset_count; + u32 guc_status; + u32 old_beat; + + ctx = kernel_context(gt->i915, NULL); + if (IS_ERR(ctx)) { + pr_err("Failed get kernel context: %ld\n", PTR_ERR(ctx)); + return PTR_ERR(ctx); + } + + wakeref = intel_runtime_pm_get(gt->uncore->rpm); + + ce = intel_context_create(gt->engine[BCS0]); + if (IS_ERR(ce)) { + ret = PTR_ERR(ce); + pr_err("Failed to create spinner request: %d\n", ret); + goto err; + } + + engine = ce->engine; + reset_count = i915_reset_count(global); + + old_beat = engine->props.heartbeat_interval_ms; + ret = intel_engine_set_heartbeat(engine, BEAT_INTERVAL); + if (ret) { + pr_err("Failed to boost heatbeat interval: %d\n", ret); + goto err; + } + + ret = igt_spinner_init(&spin, engine->gt); + if (ret) { + pr_err("Failed to create spinner: %d\n", ret); + goto err; + } + + rq = igt_spinner_create_request(&spin, ce, MI_NOOP); + intel_context_put(ce); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + pr_err("Failed to create spinner request: %d\n", ret); + goto err_spin; + } + + ret = request_add_spin(rq, &spin); + if (ret) { + i915_request_put(rq); + pr_err("Failed to add Spinner request: %d\n", ret); + goto err_spin; + } + + ret = intel_reset_guc(gt); + if (ret) { + i915_request_put(rq); + pr_err("Failed to reset GuC, ret = %d\n", ret); + goto err_spin; + } + + guc_status = intel_uncore_read(gt->uncore, GUC_STATUS); + if (!(guc_status & GS_MIA_IN_RESET)) { + i915_request_put(rq); + pr_err("GuC failed to reset: status = 0x%08X\n", guc_status); + ret = -EIO; + goto err_spin; + } + + /* Wait for the heartbeat to cause a reset */ + ret = intel_selftest_wait_for_rq(rq); + i915_request_put(rq); + if (ret) { + pr_err("Request failed to complete: %d\n", ret); + goto err_spin; + } + + if (i915_reset_count(global) == reset_count) { + pr_err("Failed to record a GPU reset\n"); + ret = -EINVAL; + goto err_spin; + } + +err_spin: + igt_spinner_end(&spin); + igt_spinner_fini(&spin); + intel_engine_set_heartbeat(engine, old_beat); + + if (ret == 0) { + rq = nop_request(engine); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + goto err; + } + + ret = intel_selftest_wait_for_rq(rq); + i915_request_put(rq); + if (ret) { + pr_err("No-op failed to complete: %d\n", ret); + goto err; + } + } + +err: + intel_runtime_pm_put(gt->uncore->rpm, wakeref); + kernel_context_close(ctx); + + return ret; +} + +int intel_guc_hang_check(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(intel_hang_guc), + }; + struct intel_gt *gt = to_gt(i915); + + if (intel_gt_is_wedged(gt)) + return 0; + + if (!intel_uc_uses_guc_submission(>->uc)) + return 0; + + return intel_gt_live_subtests(tests, gt); +} diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h index bdd290f2bf3cd..aaf8a380e5c78 100644 --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h @@ -49,5 +49,6 @@ selftest(perf, i915_perf_live_selftests) selftest(slpc, intel_slpc_live_selftests) selftest(guc, intel_guc_live_selftests) selftest(guc_multi_lrc, intel_guc_multi_lrc_live_selftests) +selftest(guc_hang, intel_guc_hang_check) /* Here be dragons: keep last to run last! */ selftest(late_gt_pm, intel_gt_pm_late_selftests) From patchwork Thu Jul 28 02:42:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12930975 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 56AD9C04A68 for ; Thu, 28 Jul 2022 02:43:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6D38210FE22; Thu, 28 Jul 2022 02:42:50 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id 084F910EB80; Thu, 28 Jul 2022 02:42:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1658976148; x=1690512148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=b8Ga4xLUDAoz8bO6/jyAZfRFGJ0ZnpxJbrvYzSyP1ZM=; b=O/RqvEMUnYPijPWn6oExgHMVhoyUdWs6/AMKf2L43rP80umgrwqV9RbV V4QIiXINkwKrgueoq/0PvJgsoKCnxdg1Lvn1OrvcXNApTjws999XJCcsy oR4WP4oiannsu1nNRSlJOsd/wi6l/u/tFSPnL4sxU29aztuj4xQYJu/EB xdP0vsV5oHrlR7EQV3R/1KQ98F8FxtBBXl0tZX6k1JYmhBl8XRkjshqHS dlS/jXgIXUETldmi7GdemgrDA+ni1xtTtRpORQ8M+eBSGPNjtVWtuJtCD W9NcQPSVnB7C1whVYCWakkYs1IvFgjcXH1aXrjZoKHb+60hL0WQGHbOSW w==; X-IronPort-AV: E=McAfee;i="6400,9594,10421"; a="271443542" X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="271443542" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2022 19:42:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="690096050" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by FMSMGA003.fm.intel.com with ESMTP; 27 Jul 2022 19:42:26 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Date: Wed, 27 Jul 2022 19:42:23 -0700 Message-Id: <20220728024225.2363663-5-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728024225.2363663-1-John.C.Harrison@Intel.com> References: <20220728024225.2363663-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH 4/6] drm/i915/selftest: Cope with not having an RCS engine X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: DRI-Devel@Lists.FreeDesktop.Org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: John Harrison It is no longer guaranteed that there will always be an RCS engine. So, use the helper function for finding the first available engine that can be used for general purpose selftets. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 6493265d5f642..7f3bb1d34dfbf 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1302,13 +1302,15 @@ static int igt_reset_wait(void *arg) { struct intel_gt *gt = arg; struct i915_gpu_error *global = >->i915->gpu_error; - struct intel_engine_cs *engine = gt->engine[RCS0]; + struct intel_engine_cs *engine; struct i915_request *rq; unsigned int reset_count; struct hang h; long timeout; int err; + engine = intel_selftest_find_any_engine(gt); + if (!engine || !intel_engine_can_store_dword(engine)) return 0; @@ -1432,7 +1434,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, int (*fn)(void *), unsigned int flags) { - struct intel_engine_cs *engine = gt->engine[RCS0]; + struct intel_engine_cs *engine; struct drm_i915_gem_object *obj; struct task_struct *tsk = NULL; struct i915_request *rq; @@ -1444,6 +1446,8 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, if (!gt->ggtt->num_fences && flags & EXEC_OBJECT_NEEDS_FENCE) return 0; + engine = intel_selftest_find_any_engine(gt); + if (!engine || !intel_engine_can_store_dword(engine)) return 0; @@ -1819,12 +1823,14 @@ static int igt_handle_error(void *arg) { struct intel_gt *gt = arg; struct i915_gpu_error *global = >->i915->gpu_error; - struct intel_engine_cs *engine = gt->engine[RCS0]; + struct intel_engine_cs *engine; struct hang h; struct i915_request *rq; struct i915_gpu_coredump *error; int err; + engine = intel_selftest_find_any_engine(gt); + /* Check that we can issue a global GPU and engine reset */ if (!intel_has_reset_engine(gt)) From patchwork Thu Jul 28 02:42:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12930978 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6D305C19F28 for ; Thu, 28 Jul 2022 02:43:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3B03610EA51; Thu, 28 Jul 2022 02:43:27 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id 31A4C10F0A9; Thu, 28 Jul 2022 02:42:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1658976148; x=1690512148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=E3zw/8Ewz/98dvMQHULYaunVqKOfKMRyM4xLoBNF/uk=; b=UqoSERA9e2zVL9al30NtcEG3atV6RLtHXxPcZ2fjxUz++NIDuFmFSrSI gbtWab/V9S7HaC3d9nBQp8F0XlE/rEF8IkoRpNvsrJs2p0fUPUWiGkw/F vD//90jFCGnueFpNRHR4Tstl3VJzomohbUE/cNyl3E1UTFiIu19Dkgtnd nUXABlWKqLK53QKZgPDUlYZOV/JqaOu9N8lS0VkytQvY8MistHgwNZhXi chsfMlS2mVYC1fraYvNwXoq5xpeMzftjfsv686XEuX9ryRDaSGB7oWNVm C0PbtCHEMuf+JoOfMR+hMJtBQgGYdvDccMrkxoHPAPSke9uG0CkLvILmC g==; X-IronPort-AV: E=McAfee;i="6400,9594,10421"; a="271443543" X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="271443543" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2022 19:42:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="690096054" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by FMSMGA003.fm.intel.com with ESMTP; 27 Jul 2022 19:42:26 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Date: Wed, 27 Jul 2022 19:42:24 -0700 Message-Id: <20220728024225.2363663-6-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728024225.2363663-1-John.C.Harrison@Intel.com> References: <20220728024225.2363663-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH 5/6] drm/i915/guc: Support larger contexts on newer hardware X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: DRI-Devel@Lists.FreeDesktop.Org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Matthew Brost The GuC needs a copy of a golden context for implementing watchdog resets (aka media resets). This context is larger on newer platforms. So adjust the size being allocated/copied accordingly. Signed-off-by: Matthew Brost Signed-off-by: John Harrison Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index ba7541f3ca610..74cbe8eaf5318 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -464,7 +464,11 @@ static void fill_engine_enable_masks(struct intel_gt *gt, } #define LR_HW_CONTEXT_SIZE (80 * sizeof(u32)) -#define LRC_SKIP_SIZE (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE) +#define XEHP_LR_HW_CONTEXT_SIZE (96 * sizeof(u32)) +#define LR_HW_CONTEXT_SZ(i915) (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50) ? \ + XEHP_LR_HW_CONTEXT_SIZE : \ + LR_HW_CONTEXT_SIZE) +#define LRC_SKIP_SIZE(i915) (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SZ(i915)) static int guc_prep_golden_context(struct intel_guc *guc) { struct intel_gt *gt = guc_to_gt(guc); @@ -525,7 +529,7 @@ static int guc_prep_golden_context(struct intel_guc *guc) * on all engines). */ ads_blob_write(guc, ads.eng_state_size[guc_class], - real_size - LRC_SKIP_SIZE); + real_size - LRC_SKIP_SIZE(gt->i915)); ads_blob_write(guc, ads.golden_context_lrca[guc_class], addr_ggtt); @@ -599,7 +603,7 @@ static void guc_init_golden_context(struct intel_guc *guc) } GEM_BUG_ON(ads_blob_read(guc, ads.eng_state_size[guc_class]) != - real_size - LRC_SKIP_SIZE); + real_size - LRC_SKIP_SIZE(gt->i915)); GEM_BUG_ON(ads_blob_read(guc, ads.golden_context_lrca[guc_class]) != addr_ggtt); addr_ggtt += alloc_size; From patchwork Thu Jul 28 02:42:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12930974 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3B4B7C19F28 for ; Thu, 28 Jul 2022 02:43:29 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6205810FC09; Thu, 28 Jul 2022 02:42:50 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4A27510F27F; Thu, 28 Jul 2022 02:42:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1658976148; x=1690512148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=R3DFCS0YA+EucRtMbS6/AzTUE+QIYg+vGi/QxLZYT5A=; b=jGH8oSBRCISWc9m35jnS+2xt9e77XONbnqK+o4H8K1tvuQJFm3W0dkbi fnAkQuuEtskNiv1ckenJsv0R3V0+KlHj9svNR1PY2r81l9735sYYtngRA CE7uOcl44O7KznKm4/1D2shMUcVwUhFzUGGVVdlQJhLXUVziOArEMuSGM gwRjFNnHeZN/RMzhlzF0UCKwdsZ9JHZYCbm7qtLEVLRQCPqaXDjKGXbAI MkSjDGDcrvlt/8j9KvMWsdJjRuU6ixeEtpmA9nTOFJlYrecrNvwaiH9Cn NtPJVlIAx1+3e2ty4U9PexvM13t1N1pgMcSCcvud5zDTl47tFAzYoOkAs Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10421"; a="271443544" X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="271443544" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2022 19:42:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,196,1654585200"; d="scan'208";a="690096057" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by FMSMGA003.fm.intel.com with ESMTP; 27 Jul 2022 19:42:26 -0700 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Date: Wed, 27 Jul 2022 19:42:25 -0700 Message-Id: <20220728024225.2363663-7-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728024225.2363663-1-John.C.Harrison@Intel.com> References: <20220728024225.2363663-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH 6/6] drm/i915/guc: Don't abort on CTB_UNUSED status X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: DRI-Devel@Lists.FreeDesktop.Org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: John Harrison When the KMD sends a CLIENT_RESET request to GuC (as part of the suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the KMD then checked the CTB queue, it would see a non-zero status value and report the buffer as corrupted. Technically, no G2H messages should be received once the CLIENT_RESET has been sent. However, if a context was outstanding on an engine then it would get reset and a reset notification would be sent. So, don't actually treat UNUSED as a catastrophic error. Just flag it up as unexpected and keep going. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- .../i915/gt/uc/abi/guc_communication_ctb_abi.h | 8 +++++--- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++++++++++++++++-- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h index df83c1cc7c7a6..28b8387f97b77 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h @@ -37,6 +37,7 @@ * | | | - _`GUC_CTB_STATUS_OVERFLOW` = 1 (head/tail too large) | * | | | - _`GUC_CTB_STATUS_UNDERFLOW` = 2 (truncated message) | * | | | - _`GUC_CTB_STATUS_MISMATCH` = 4 (head/tail modified) | + * | | | - _`GUC_CTB_STATUS_UNUSED` = 8 (CTB is not in use) | * +---+-------+--------------------------------------------------------------+ * |...| | RESERVED = MBZ | * +---+-------+--------------------------------------------------------------+ @@ -49,9 +50,10 @@ struct guc_ct_buffer_desc { u32 tail; u32 status; #define GUC_CTB_STATUS_NO_ERROR 0 -#define GUC_CTB_STATUS_OVERFLOW (1 << 0) -#define GUC_CTB_STATUS_UNDERFLOW (1 << 1) -#define GUC_CTB_STATUS_MISMATCH (1 << 2) +#define GUC_CTB_STATUS_OVERFLOW BIT(0) +#define GUC_CTB_STATUS_UNDERFLOW BIT(1) +#define GUC_CTB_STATUS_MISMATCH BIT(2) +#define GUC_CTB_STATUS_UNUSED BIT(3) u32 reserved[13]; } __packed; static_assert(sizeof(struct guc_ct_buffer_desc) == 64); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index f01325cd1b625..11b5d4ddb19ce 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -816,8 +816,22 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(ctb->broken)) return -EPIPE; - if (unlikely(desc->status)) - goto corrupted; + if (unlikely(desc->status)) { + u32 status = desc->status; + + if (status & GUC_CTB_STATUS_UNUSED) { + /* + * Potentially valid if a CLIENT_RESET request resulted in + * contexts/engines being reset. But should never happen as + * no contexts should be active when CLIENT_RESET is sent. + */ + CT_ERROR(ct, "Unexpected G2H after GuC has stopped!\n"); + status &= ~GUC_CTB_STATUS_UNUSED; + } + + if (status) + goto corrupted; + } GEM_BUG_ON(head > size);