From patchwork Mon Nov 6 23:59:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13447602
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Subject: [PATCH 1/2] drm/i915/guc: Fix for potential false positives in GuC hang selftest
Date: Mon, 6 Nov 2023 15:59:27 -0800
Message-ID: <20231106235929.454983-2-John.C.Harrison@Intel.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231106235929.454983-1-John.C.Harrison@Intel.com>
References: <20231106235929.454983-1-John.C.Harrison@Intel.com>
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

From: John Harrison

Noticed that the hangcheck selftest is submitting a non-preemptible
spinner. That means that even if the GuC does not die, the heartbeat
will still kick in and trigger a reset, which rather defeats the
purpose of the test: to verify that the heartbeat will kick in if the
GuC itself has died.

The test is deliberately killing the GuC, so it should never hit the
case of a non-dead GuC. But it is not impossible that the kill might
fail at some future point due to other driver re-work.

So, make the spinner preemptible. That way the heartbeat can get
through if the GuC is alive and context switching, and a reset only
happens if the GuC dies. If the kill should ever stop working, the test
will now fail rather than claim to pass.
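
As a rough illustration of the reasoning above, here is a stand-alone
truth-table model. It is not driver code: the names spinner_kind,
heartbeat_forces_reset() and reset_implies_dead_guc() are made up for
this sketch, and it assumes the heartbeat can only run when the GuC is
alive to context switch and the spinner contains an arbitration point.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not i915 code. Assumption: MI_ARB_CHECK gives the
 * spinner an arbitration point so it can be preempted, MI_NOOP does not.
 */
enum spinner_kind { SPINNER_MI_NOOP, SPINNER_MI_ARB_CHECK };

/* True if the heartbeat cannot get through and so ends up forcing a reset. */
bool heartbeat_forces_reset(bool guc_alive, enum spinner_kind kind)
{
	bool preemptible = (kind == SPINNER_MI_ARB_CHECK);
	bool heartbeat_gets_through = guc_alive && preemptible;

	return !heartbeat_gets_through;
}

/*
 * The selftest is only meaningful if seeing a reset implies a dead GuC,
 * i.e. a live GuC must never lead to a reset.
 */
bool reset_implies_dead_guc(enum spinner_kind kind)
{
	return !heartbeat_forces_reset(true, kind);
}
```

With SPINNER_MI_NOOP a reset happens whether or not the GuC is alive,
so the reset says nothing about the GuC. With SPINNER_MI_ARB_CHECK a
reset only happens in the dead-GuC case.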

Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
---
 drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
index 34b5d952e2bcb..26fdc392fce6c 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
@@ -74,7 +74,7 @@ static int intel_hang_guc(void *arg)
 		goto err;
 	}
 
-	rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
+	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
 	intel_context_put(ce);
 	if (IS_ERR(rq)) {
 		ret = PTR_ERR(rq);

From patchwork Mon Nov 6 23:59:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13447601
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Subject: [PATCH 2/2] drm/i915/guc: Add a selftest for FAST_REQUEST errors
Date: Mon, 6 Nov 2023 15:59:28 -0800
Message-ID: <20231106235929.454983-3-John.C.Harrison@Intel.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231106235929.454983-1-John.C.Harrison@Intel.com>
References: <20231106235929.454983-1-John.C.Harrison@Intel.com>
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
Cc: John Harrison, DRI-Devel@Lists.FreeDesktop.Org
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

From: John Harrison

There is a mechanism for reporting errors from fire-and-forget H2G
messages. This is the only way to find out about almost any error in
the GuC backend submission path, so it would be useful to know that it
is working.

So add a selftest that sends a deliberately invalid fire-and-forget H2G
while a spinner is running and checks that the error response arrives.
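
To make the mechanism a little more concrete, below is a stand-alone
sketch of how a response handler could tell a solicited reply from the
error reply to a fire-and-forget message. It is a simplified model, not
the i915 CT code: model_ct, model_handle_response() and all field names
are hypothetical. It assumes, as the placement of the selftest hook in
this patch suggests, that such an error reply matches no pending
request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the idea; every name here is invented. */
#define MODEL_MAX_PENDING 4

struct model_ct {
	uint32_t pending[MODEL_MAX_PENDING];	/* fences with a blocked sender */
	size_t num_pending;
	uint32_t selftest_count;	/* 0: hook off, >0: armed, counts hits */
	unsigned int unsolicited_errors;
};

/* Returns true if the reply was matched to a waiter or absorbed by the hook. */
bool model_handle_response(struct model_ct *ct, uint32_t fence)
{
	bool found = false;
	size_t i;

	for (i = 0; i < ct->num_pending; i++) {
		if (ct->pending[i] == fence) {
			/* A blocking sender waits on this fence: consume it. */
			ct->pending[i] = ct->pending[--ct->num_pending];
			found = true;
			break;
		}
	}

	/* Nobody waits on a fire-and-forget message, so its reply lands here. */
	if (!found && ct->selftest_count) {
		ct->selftest_count++;
		found = true;
	}

	if (!found)
		ct->unsolicited_errors++;

	return found;
}
```

A test arms the hook by setting selftest_count to 1 and then expects
exactly one reply, which leaves the counter at 2. With the hook off,
the same reply is counted as an unsolicited error.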

Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    |   4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   9 ++
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 123 ++++++++++++++++++++++
 3 files changed, 136 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 2b6dfe62c8f2a..e22c12ce245ad 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -297,6 +297,10 @@ struct intel_guc {
 	 * @number_guc_id_stolen: The number of guc_ids that have been stolen
 	 */
 	int number_guc_id_stolen;
+	/**
+	 * @fast_response_selftest: Backdoor to CT handler for fast response selftest
+	 */
+	u32 fast_response_selftest;
 #endif
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 89e314b3756bb..9d958afb78b7f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1076,6 +1076,15 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		found = true;
 		break;
 	}
+
+#ifdef CONFIG_DRM_I915_SELFTEST
+	if (!found && ct_to_guc(ct)->fast_response_selftest) {
+		CT_DEBUG(ct, "Assuming unsolicited response due to FAST_REQUEST selftest\n");
+		ct_to_guc(ct)->fast_response_selftest++;
+		found = true;
+	}
+#endif
+
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response message: len %u, data %#x (fence %u, last %u)\n",
 			 len, hxg[0], fence, ct->requests.last_fence);
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
index bfb72143566f6..97fbbb396336c 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
@@ -286,11 +286,134 @@ static int intel_guc_steal_guc_ids(void *arg)
 	return ret;
 }
 
+/*
+ * Send a context schedule H2G message with an invalid context id.
+ * This should generate a GUC_RESULT_INVALID_CONTEXT response.
+ */
+static int bad_h2g(struct intel_guc *guc)
+{
+	u32 action[3], len = 0;
+
+	action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
+	action[len++] = 0x12345678;
+
+	return intel_guc_send_nb(guc, action, len, 0);
+}
+
+/*
+ * Set a spinner running to make sure the system is alive and active,
+ * then send a bad but asynchronous H2G command and wait to see if an
+ * error response is returned. If no response is received or if the
+ * spinner dies then the test will fail.
+ */
+#define FAST_RESPONSE_TIMEOUT_MS	1000
+static int intel_guc_fast_request(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *rq;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine = intel_selftest_find_any_engine(gt);
+	ktime_t before, now, delta;
+	bool spinning = false;
+	u64 delta_ms;
+	int ret = 0;
+
+	if (!engine)
+		return 0;
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce)) {
+		ret = PTR_ERR(ce);
+		gt_err(gt, "Failed to create context: %pe\n", ce);
+		goto err_pm;
+	}
+
+	ret = igt_spinner_init(&spin, engine->gt);
+	if (ret) {
+		gt_err(gt, "Failed to create spinner: %pe\n", ERR_PTR(ret));
+		goto err_pm;
+	}
+	spinning = true;
+
+	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	intel_context_put(ce);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		gt_err(gt, "Failed to create spinner request: %pe\n", rq);
+		goto err_spin;
+	}
+
+	ret = request_add_spin(rq, &spin);
+	if (ret) {
+		gt_err(gt, "Failed to add spinner request: %pe\n", ERR_PTR(ret));
+		goto err_rq;
+	}
+
+	gt->uc.guc.fast_response_selftest = 1;
+
+	ret = bad_h2g(&gt->uc.guc);
+	if (ret) {
+		gt_err(gt, "Failed to send H2G: %pe\n", ERR_PTR(ret));
+		goto err_rq;
+	}
+
+	before = ktime_get();
+	while (gt->uc.guc.fast_response_selftest == 1) {
+		ret = i915_request_wait(rq, 0, 1);
+		if (ret != -ETIME) {
+			gt_err(gt, "Request wait failed: %pe\n", ERR_PTR(ret));
+			goto err_rq;
+		}
+		now = ktime_get();
+		delta = ktime_sub(now, before);
+		delta_ms = ktime_to_ms(delta);
+
+		if (delta_ms > FAST_RESPONSE_TIMEOUT_MS) {
+			gt_err(gt, "Timed out waiting for fast request error!\n");
+			ret = -ETIME;
+			goto err_rq;
+		}
+	}
+
+	if (gt->uc.guc.fast_response_selftest != 2) {
+		gt_err(gt, "Unexpected fast response count: %u\n",
+		       gt->uc.guc.fast_response_selftest);
+		ret = -EINVAL;
+		goto err_rq;
+	}
+
+	igt_spinner_end(&spin);
+	spinning = false;
+
+	ret = intel_selftest_wait_for_rq(rq);
+	if (ret) {
+		gt_err(gt, "Request failed to complete: %pe\n", ERR_PTR(ret));
+		goto err_rq;
+	}
+
+err_rq:
+	i915_request_put(rq);
+
+err_spin:
+	if (spinning)
+		igt_spinner_end(&spin);
+	igt_spinner_fini(&spin);
+
+err_pm:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	return ret;
+}
+
 int intel_guc_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(intel_guc_scrub_ctbs),
 		SUBTEST(intel_guc_steal_guc_ids),
+		SUBTEST(intel_guc_fast_request),
 	};
 	struct intel_gt *gt = to_gt(i915);
 
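
For readers following the wait loop in intel_guc_fast_request(), the
sketch below replays it as a pure function over recorded observations.
It is a hypothetical user-space model, not kernel code: poll_sample and
model_wait_for_fast_response() are invented names, the error codes are
this sketch's own choice (-EIO for a dead spinner, -ETIMEDOUT for the
deadline, -EINVAL for an unexpected count), and each sample stands for
one trip round the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One loop iteration as seen by the test (hypothetical model). */
struct poll_sample {
	bool spinner_running;	/* stands in for the request wait timing out */
	uint32_t counter;	/* selftest counter value after the wait */
	uint64_t elapsed_ms;	/* time since the bad H2G was sent */
};

/*
 * The counter is armed at 1 and exactly one error response should bump
 * it to 2. Returns 0 on success or a negative errno.
 */
int model_wait_for_fast_response(const struct poll_sample *samples, size_t n,
				 uint64_t timeout_ms)
{
	uint32_t counter = 1;
	size_t i;

	for (i = 0; i < n && counter == 1; i++) {
		/* The spinner must outlive the wait, else the system is not healthy. */
		if (!samples[i].spinner_running)
			return -EIO;

		/* The deadline is checked before the counter is re-read. */
		if (samples[i].elapsed_ms > timeout_ms)
			return -ETIMEDOUT;

		counter = samples[i].counter;
	}

	if (counter == 1)
		return -ETIMEDOUT;	/* ran out of samples with no response */
	if (counter != 2)
		return -EINVAL;		/* more than one response was counted */

	return 0;
}
```

Feeding it { running, 1, 1 ms } followed by { running, 2, 2 ms } with a
1000 ms timeout models the passing case. A sample with the spinner no
longer running, or one past the deadline, models the two failure modes
named in the comment above the test.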