From patchwork Tue Dec 14 17:04:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B943AC433EF for ; Tue, 14 Dec 2021 17:14:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1393910E23C; Tue, 14 Dec 2021 17:14:32 +0000 (UTC) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id C9FF410E117; Tue, 14 Dec 2021 17:14:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502069; x=1671038069; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RR4OQyl93P6Xjlr+tous2lVRL+UUgd6EMt1s2adyaVM=; b=Mwy7RiRAKJZNNHms1SqWw8ddg9FU1V6M10Hkmg9H/PBfs7XHpkQpOq2Q ofNRNgETwdLwGrqcS3s+7cHIC+gHKCf88cLMWEeUcrIqODX3iUlUc5LrN +SLP7ku3t3CxwbUDNgF5SyGROZKKPNxaKlpfXxN6tkMxAWgmXIi2Z3z2V Z5Hj+aalx3F3EZBZmnJB88aaWL3q/nWMf6PuA1DAIfXkNzOE63lMHOZP/ qVZLw+5Gen+F4JEKtgcPqtqGSAVJFUoZsOfm8MqvUBsfNXr7N2S452TBH xaSQ1FavpAMsboAiydB8GuIqt8skgcNNGQu81etkEbQ8Qw9Lv7M5HexEj g==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="219043199" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="219043199" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:27 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357541" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:25 -0800 From: Matthew Brost To: , Subject: [PATCH 1/7] drm/i915/guc: Use correct context lock when callig clr_context_registered Date: Tue, 14 Dec 2021 09:04:54 -0800 Message-Id: <20211214170500.28569-2-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" s/ce/cn/ when grabbing guc_state.lock before calling clr_context_registered. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 1f9d4fde421f..9b7b4f4e0d91 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1937,9 +1937,9 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce) list_del_init(&cn->guc_id.link); ce->guc_id = cn->guc_id; - spin_lock(&ce->guc_state.lock); + spin_lock(&cn->guc_state.lock); clr_context_registered(cn); - spin_unlock(&ce->guc_state.lock); + spin_unlock(&cn->guc_state.lock); set_context_guc_id_invalid(cn); From patchwork Tue Dec 14 17:04:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676481 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A412EC433EF for ; Tue, 14 Dec 2021 17:14:44 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6A87910E195; Tue, 14 Dec 2021 17:14:33 +0000 (UTC) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id D7DE910E16E; Tue, 14 Dec 2021 17:14:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502071; x=1671038071; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xs4Uezz0ZR31ss+ptKygyEXbVGcCanPcDYRuGPs0HPE=; b=ns6ySsaH0qeaJTcYHI0vo2CnoJB1IV2Zo5nEojUHf8reVu7T+bjoVMGa 1Fo/5kiNzVSk9+5RPiLlEpuD8DfKIkPh2Nu340IBmaPKTTTKQh6FnZqUQ oDyJOZVC0hL6QmJ1ehil1YHIY64pKtokjxrKuIxfG70fZJZoY94+8gbtf RCX8Xap7nGKxf8ZokhfsyKPVN9xGY4eS3HKOJnDoC8Sg92qhy7oFZ29mn kq4LT7AseRiOhwGhWUegw2W/+aFSGy6IgyFEjhLpnUEf4XrBqseD6blv5 920iDOs86fhZuGnJpX/2pzujShYIhzeQa2FXz2q7VMdSWZo5qf8hLQuke Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="219043201" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="219043201" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:28 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357546" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:26 -0800 From: Matthew Brost To: , Subject: [PATCH 2/7] drm/i915/guc: Only assign guc_id.id when stealing guc_id Date: Tue, 14 Dec 2021 09:04:55 -0800 Message-Id: <20211214170500.28569-3-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Previously assigned whole guc_id structure (list, spin lock) which is incorrect, only assign the guc_id.id. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9b7b4f4e0d91..0fb2eeff0262 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1935,7 +1935,7 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce) GEM_BUG_ON(intel_context_is_parent(cn)); list_del_init(&cn->guc_id.link); - ce->guc_id = cn->guc_id; + ce->guc_id.id = cn->guc_id.id; spin_lock(&cn->guc_state.lock); clr_context_registered(cn); From patchwork Tue Dec 14 17:04:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676479 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8AE62C433EF for ; Tue, 14 Dec 2021 17:14:42 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D5DD810E16E; Tue, 14 Dec 2021 17:14:32 +0000 (UTC) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id AA4B910E126; Tue, 14 Dec 2021 17:14:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502070; x=1671038070; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+t+IYa+onmrRcxvOJjKYFtf3l6LDynZbl3ClATBruNA=; b=mcMN2wQ2Mn/ztjcPn9UJG3jJI8wjRXMoGkmonpVm9msXV6+FaMvQsaKj fECx3pwB9pY2Iz8GqKuje87uh+YpQmG7hE5Tw+4sWYW6qeKbwCPERKTmX 75WXYRkhS8W3KAj+WPp8z4BnBMeUEkKWNWE596asifMmWy7HKb1vGfqT6 yyzc7HULGgnuWx0o/8ISePJGnmxexOgd9IeWMjiqr0gFBovMHtrVs1Hjo NwsFhs70B+S3WyKP9LyHKmesRwNtSmHk2uDOmR0ZyomWajQFvzMYhhZL7 OleHbvQSfFIP+/6X/5Bg/DSgdBUqKsQTI756VB7nl2hFrrL8aWojHa4+N Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="219043202" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="219043202" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:28 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357549" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:26 -0800 From: Matthew Brost To: , Subject: [PATCH 3/7] drm/i915/guc: Remove racey GEM_BUG_ON Date: Tue, 14 Dec 2021 09:04:56 -0800 Message-Id: <20211214170500.28569-4-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" A full GT reset can race with the last context put resulting in the context ref count being zero but the destroyed bit not yet being set. Remove GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts the destroyed bit must be set in ref count is zero. Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0fb2eeff0262..36c2965db49b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1040,8 +1040,6 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) spin_unlock(&ce->guc_state.lock); - GEM_BUG_ON(!do_put && !destroyed); - if (pending_enable || destroyed || deregister) { decr_outstanding_submission_g2h(guc); if (deregister) From patchwork Tue Dec 14 17:04:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676487 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BE46CC433EF for ; Tue, 14 Dec 2021 17:14:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id ADCED10E1FA; Tue, 14 Dec 2021 17:14:43 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 78E1010E1CF; Tue, 14 Dec 2021 17:14:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502073; x=1671038073; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jaCS0XMFCrae8Su33E7J0Rx/nsJ2glq4igpMu1fVzX0=; b=FHzoPngdMHX2HcqPWfPUZGELbw+07PE/XBI6RgJ7bbiCHrmVKzy/s9ZA FkhhWqm0h6B2tVCxfwXLb84L2WdbeZymptZyoy24Fbn4RIo4sM7J7wUN1 ZuDqGnlZ7QvFy+QavU+R214q0zvy+/VLZej9r99UBv++GP5PagmTH5Qid tGmFwxlUKuwDljFF/cEW0MRrpDqM1MWKD6h8hgS4LZ8l0V3IjLIkExar0 3eopdOKyVkJvjt9XxWNwVFAQd/o4ucBW/XuR3HiUdNBVnjE1U+go3Tx5S qx99ZBcf5VGbhdV9RKig2/qkgcxvz+gLMf6orSzg3OCxj29BZLyh/NgTv A==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="302405484" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="302405484" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:28 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357553" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:26 -0800 From: Matthew Brost To: , Subject: [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts Date: Tue, 14 Dec 2021 09:04:57 -0800 Message-Id: <20211214170500.28569-5-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison While attempting to debug a CT deadlock issue in various CI failures (most easily reproduced with gem_ctx_create/basic-files), I was seeing CPU deadlock errors being reported. This were because the context destroy loop was blocking waiting on H2G space from inside an IRQ spinlock. There no was deadlock as such, it's just that the H2G queue was full of context destroy commands and GuC was taking a long time to process them. However, the kernel was seeing the large amount of time spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would then happen (heartbeat failures, CT deadlock errors, outstanding H2G WARNs, etc.). Re-working the loop to only acquire the spinlock around the list management (which is all it is meant to protect) rather than the entire destroy operation seems to fix all the above issues. v2: (John Harrison) - Fix typo in comment message Signed-off-by: John Harrison Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 36c2965db49b..96fcf869e3ff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) unsigned long flags; bool disabled; - lockdep_assert_held(&guc->submission_state.lock); GEM_BUG_ON(!intel_gt_pm_is_awake(gt)); GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id)); GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id)); @@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) } spin_unlock_irqrestore(&ce->guc_state.lock, flags); if (unlikely(disabled)) { - __release_guc_id(guc, ce); + release_guc_id(guc, ce); __guc_context_destroy(ce); return; } @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct intel_context *ce) static void guc_flush_destroyed_contexts(struct intel_guc *guc) { - struct intel_context *ce, *cn; + struct intel_context *ce; unsigned long flags; GEM_BUG_ON(!submission_disabled(guc) && guc_submission_initialized(guc)); - spin_lock_irqsave(&guc->submission_state.lock, flags); - list_for_each_entry_safe(ce, cn, - &guc->submission_state.destroyed_contexts, - destroyed_link) { - list_del_init(&ce->destroyed_link); - __release_guc_id(guc, ce); + while (!list_empty(&guc->submission_state.destroyed_contexts)) { + spin_lock_irqsave(&guc->submission_state.lock, flags); + ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts, + struct intel_context, + destroyed_link); + if (ce) + list_del_init(&ce->destroyed_link); + spin_unlock_irqrestore(&guc->submission_state.lock, flags); + + if (!ce) + break; + + release_guc_id(guc, ce); __guc_context_destroy(ce); } - spin_unlock_irqrestore(&guc->submission_state.lock, flags); } static void deregister_destroyed_contexts(struct intel_guc *guc) { - struct intel_context *ce, *cn; + struct intel_context *ce; unsigned long flags; - spin_lock_irqsave(&guc->submission_state.lock, flags); - list_for_each_entry_safe(ce, cn, - &guc->submission_state.destroyed_contexts, - destroyed_link) { - list_del_init(&ce->destroyed_link); + while (!list_empty(&guc->submission_state.destroyed_contexts)) { + spin_lock_irqsave(&guc->submission_state.lock, flags); + ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts, + struct intel_context, + destroyed_link); + if (ce) + list_del_init(&ce->destroyed_link); + spin_unlock_irqrestore(&guc->submission_state.lock, flags); + + if (!ce) + break; + guc_lrc_desc_unpin(ce); } - spin_unlock_irqrestore(&guc->submission_state.lock, flags); } static void destroyed_worker_func(struct work_struct *w) From patchwork Tue Dec 14 17:04:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676489 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8B904C433EF for ; Tue, 14 Dec 2021 17:14:56 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D585610E259; Tue, 14 Dec 2021 17:14:44 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 85DA110E1D4; Tue, 14 Dec 2021 17:14:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502074; x=1671038074; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EAI1zzp+pVh0LIc0SItFf3X1uGyMPd6uOR9y7CfFiI8=; b=KC/PJCN00yMpdyybucmfjhcy0YjWWMofqLHG3wOI827jZF6crD+ieD/U 4GizUqkpgtQNSzIpyPPoQhof5echFv/FNNoSjdLVQqRyNDWDWzWsLrpJA rnEyef9KI479qwPlrkFDrjq3R2nryAbhBzc/lAk6YeR7LZVK8hCCfCikR uO5FJPvIIYTotluCZ4mZcH+BrGdY2n+x2elObCW915ZiARI3QcqM3Nuh0 Ew4s0aeyNiWv1H8l3JtTJdOTBYFURn3h41m6JeTlOttaBmRNJZ46lK/AS SvjvJ1E8XG+tlUbMSaB3IiMa941SqP8QH0QuT20eUXWHFWewQHU2wSsnd Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="302405485" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="302405485" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:28 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357556" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:26 -0800 From: Matthew Brost To: , Subject: [PATCH 5/7] drm/i915/guc: Add extra debug on CT deadlock Date: Tue, 14 Dec 2021 09:04:58 -0800 Message-Id: <20211214170500.28569-6-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Print CT state (H2G + G2H head / tail pointers, credits) on CT deadlock. v2: (John Harrison) - Add units to debug messages Reviewed-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a0cc34be7b56..741be9abab68 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -523,6 +523,15 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n", ktime_ms_delta(ktime_get(), ct->stall_time), send->status, recv->status); + CT_ERROR(ct, "H2G Space: %u (Bytes)\n", + atomic_read(&ct->ctbs.send.space) * 4); + CT_ERROR(ct, "Head: %u (Dwords)\n", ct->ctbs.send.desc->head); + CT_ERROR(ct, "Tail: %u (Dwords)\n", ct->ctbs.send.desc->tail); + CT_ERROR(ct, "G2H Space: %u (Bytes)\n", + atomic_read(&ct->ctbs.recv.space) * 4); + CT_ERROR(ct, "Head: %u\n (Dwords)", ct->ctbs.recv.desc->head); + CT_ERROR(ct, "Tail: %u\n (Dwords)", ct->ctbs.recv.desc->tail); + ct->ctbs.send.broken = true; } From patchwork Tue Dec 14 17:04:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676485 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 78411C433EF for ; Tue, 14 Dec 2021 17:14:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1EF2E10E18F; Tue, 14 Dec 2021 17:14:35 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7966610E1E7; Tue, 14 Dec 2021 17:14:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502073; x=1671038073; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=V7mhzQQhvb0lTvzXO4pMuWvuZcuZfU52N2KDqvUj19Q=; b=FxgGXpVQjg9u2ibULBCJ5bX5lxBsma+7n4igqBYF0rDvxBb5XHUqZoAy NjeAr5Nq3HbrzKiG6094pi4PaLiOVwFzmRa5vRZgW9g/67dQBI44UdZC0 NXQXOhPGpbnITstIIQGMPHx50fBSloxYXoVhkISNNlGmvOm5NWt2DdVjZ tDWJ8ZL5O1HPqzYp/tqORXkpvq8NWzaiw3EKmOyesWfAIE50wZQC9+Z4U 0itZhzUIt8B2qDK7xtSjZZEQstSEeO+fONkyhoVAyKjc64fk6Lv1HsuYn PsN+F9Yzte0VAxCaXbZ5D4lT8kFIr6j6F3bLWW+rqXNX09DTjcgrxMXLv w==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="302405486" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="302405486" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:28 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357559" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:26 -0800 From: Matthew Brost To: , Subject: [PATCH 6/7] drm/i915/guc: Kick G2H tasklet if no credits Date: Tue, 14 Dec 2021 09:04:59 -0800 Message-Id: <20211214170500.28569-7-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Let's be paranoid and kick the G2H tasklet, which dequeues messages, if G2H credits are exhausted. Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 741be9abab68..aa6dd6415202 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -591,12 +591,19 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw) { + bool h2g = h2g_has_room(ct, h2g_dw); + bool g2h = g2h_has_room(ct, g2h_dw); + lockdep_assert_held(&ct->ctbs.send.lock); - if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) { + if (unlikely(!h2g || !g2h)) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); + /* Be paranoid and kick G2H tasklet to free credits */ + if (!g2h) + tasklet_hi_schedule(&ct->receive_tasklet); + if (unlikely(ct_deadlocked(ct))) return -EPIPE; else From patchwork Tue Dec 14 17:05:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12676483 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B3965C433F5 for ; Tue, 14 Dec 2021 17:14:46 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 053F610E126; Tue, 14 Dec 2021 17:14:35 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 29E2A10E126; Tue, 14 Dec 2021 17:14:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639502073; x=1671038073; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3MmtLKz/sULUsoK9osKYuoD2/nBendk6OQJRqekpl6Q=; b=QjC6EHWLRX3scsOo7ZjfH17L/DNayKkcWv60hg+OIE6WZzusutiGHu3/ HIuEtE+/V/GTmLzhoPpJMkn5mz1tfSU5IHsUU7heerD8CAAy6lqhPai3L YTU0Hzjmm5W2nlwu1nvhBwdTe/8E4g44qVHKMZGm2SOdjLX5y8eImoceR NppW4bwDmvFcvnfc0TmoeA4H4Uol3I9pYIxc2wKeae2/3pDnECf/XUGFl VqjSkJ7//uYtrx4GNjFLrb5+eRO98EVz/1Tj6g9v3Ykf95WWiC+Fz6Xkd zB9uYNGfJ+EgMshZ2L7Npmx3SRr88vxFkJd6FcHNZa/BIm94JFlYtJhxn Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10197"; a="302405487" X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="302405487" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:28 -0800 X-IronPort-AV: E=Sophos;i="5.88,205,1635231600"; d="scan'208";a="614357562" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Dec 2021 09:10:26 -0800 From: Matthew Brost To: , Subject: [PATCH 7/7] drm/i915/guc: Selftest for stealing of guc ids Date: Tue, 14 Dec 2021 09:05:00 -0800 Message-Id: <20211214170500.28569-8-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211214170500.28569-1-matthew.brost@intel.com> References: <20211214170500.28569-1-matthew.brost@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Testing the stealing of guc ids is hard from user space as we have 64k guc_ids. Add a selftest, which artificially reduces the number of guc ids, and forces a steal. The test creates a spinner which is used to block all subsequent submissions until it completes. Next, a loop creates a context and a NOP request each iteration until the guc_ids are exhausted (request creation returns -EAGAIN). The spinner is ended, unblocking all requests created in the loop. At this point all guc_ids are exhausted but are available to steal. Try to create another request which should successfully steal a guc_id. Wait on last request to complete, idle GPU, verify a guc_id was stolen via a counter, and exit the test. Test also artificially reduces the number of guc_ids so the test runs in a timely manner. v2: (John Harrison) - s/stole/stolen - Fix some wording in test description - Rework indexing into context array - Add test description to commit message - Fix typo in commit message (Checkpatch) - s/guc/(guc) in NUMBER_MULTI_LRC_GUC_ID v3: (John Harrison) - Set array value to NULL after extracting error - Fix a few typos in comments / error messages - Delete redundant comment in commit message Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 12 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 +- drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 173 ++++++++++++++++++ 3 files changed, 196 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 1cb46098030d..f9240d4baa69 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -94,6 +94,11 @@ struct intel_guc { * @guc_ids: used to allocate new guc_ids, single-lrc */ struct ida guc_ids; + /** + * @num_guc_ids: Number of guc_ids, selftest feature to be able + * to reduce this number while testing. + */ + int num_guc_ids; /** * @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */ @@ -202,6 +207,13 @@ struct intel_guc { */ struct delayed_work work; } timestamp; + +#ifdef CONFIG_DRM_I915_SELFTEST + /** + * @number_guc_id_stolen: The number of guc_ids that have been stolen + */ + int number_guc_id_stolen; +#endif }; static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 96fcf869e3ff..99414b49ca6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -145,7 +145,8 @@ guc_create_parallel(struct intel_engine_cs **engines, * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for * multi-lrc. */ -#define NUMBER_MULTI_LRC_GUC_ID (GUC_MAX_LRC_DESCRIPTORS / 16) +#define NUMBER_MULTI_LRC_GUC_ID(guc) \ + ((guc)->submission_state.num_guc_ids / 16) /* * Below is a set of functions which control the GuC scheduling state which @@ -1775,7 +1776,7 @@ int intel_guc_submission_init(struct intel_guc *guc) destroyed_worker_func); guc->submission_state.guc_ids_bitmap = - bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL); + bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL); if (!guc->submission_state.guc_ids_bitmap) return -ENOMEM; @@ -1869,13 +1870,13 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) if (intel_context_is_parent(ce)) ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap, - NUMBER_MULTI_LRC_GUC_ID, + NUMBER_MULTI_LRC_GUC_ID(guc), order_base_2(ce->parallel.number_children + 1)); else ret = ida_simple_get(&guc->submission_state.guc_ids, - NUMBER_MULTI_LRC_GUC_ID, - GUC_MAX_LRC_DESCRIPTORS, + NUMBER_MULTI_LRC_GUC_ID(guc), + guc->submission_state.num_guc_ids, GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (unlikely(ret < 0)) @@ -1941,6 +1942,10 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce) set_context_guc_id_invalid(cn); +#ifdef CONFIG_DRM_I915_SELFTEST + guc->number_guc_id_stolen++; +#endif + return 0; } else { return -EAGAIN; @@ -3779,6 +3784,7 @@ static bool __guc_submission_selected(struct intel_guc *guc) void intel_guc_submission_init_early(struct intel_guc *guc) { + guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS; guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); } diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c index fb0e4a7bd8ca..2ae414446112 100644 --- a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c @@ -3,8 +3,21 @@ * Copyright �� 2021 Intel Corporation */ +#include "selftests/igt_spinner.h" #include "selftests/intel_scheduler_helpers.h" +static int request_add_spin(struct i915_request *rq, struct igt_spinner *spin) +{ + int err = 0; + + i915_request_get(rq); + i915_request_add(rq); + if (spin && !igt_wait_for_spinner(spin, rq)) + err = -ETIMEDOUT; + + return err; +} + static struct i915_request *nop_user_request(struct intel_context *ce, struct i915_request *from) { @@ -110,10 +123,170 @@ static int intel_guc_scrub_ctbs(void *arg) return ret; } +/* + * intel_guc_steal_guc_ids - Test to exhaust all guc_ids and then steal one + * + * This test creates a spinner which is used to block all subsequent submissions + * until it completes. Next, a loop creates a context and a NOP request each + * iteration until the guc_ids are exhausted (request creation returns -EAGAIN). + * The spinner is ended, unblocking all requests created in the loop. At this + * point all guc_ids are exhausted but are available to steal. Try to create + * another request which should successfully steal a guc_id. Wait on last + * request to complete, idle GPU, verify a guc_id was stolen via a counter, and + * exit the test. Test also artificially reduces the number of guc_ids so the + * test runs in a timely manner. + */ +static int intel_guc_steal_guc_ids(void *arg) +{ + struct intel_gt *gt = arg; + struct intel_guc *guc = >->uc.guc; + int ret, sv, context_index = 0; + intel_wakeref_t wakeref; + struct intel_engine_cs *engine; + struct intel_context **ce; + struct igt_spinner spin; + struct i915_request *spin_rq = NULL, *rq, *last = NULL; + int number_guc_id_stolen = guc->number_guc_id_stolen; + + ce = kzalloc(sizeof(*ce) * GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL); + if (!ce) { + pr_err("Context array allocation failed\n"); + return -ENOMEM; + } + + wakeref = intel_runtime_pm_get(gt->uncore->rpm); + engine = intel_selftest_find_any_engine(gt); + sv = guc->submission_state.num_guc_ids; + guc->submission_state.num_guc_ids = 4096; + + /* Create spinner to block requests in below loop */ + ce[context_index] = intel_context_create(engine); + if (IS_ERR(ce[context_index])) { + ret = PTR_ERR(ce[context_index]); + ce[context_index] = NULL; + pr_err("Failed to create context: %d\n", ret); + goto err_wakeref; + } + ret = igt_spinner_init(&spin, engine->gt); + if (ret) { + pr_err("Failed to create spinner: %d\n", ret); + goto err_contexts; + } + spin_rq = igt_spinner_create_request(&spin, ce[context_index], + MI_ARB_CHECK); + if (IS_ERR(spin_rq)) { + ret = PTR_ERR(spin_rq); + pr_err("Failed to create spinner request: %d\n", ret); + goto err_contexts; + } + ret = request_add_spin(spin_rq, &spin); + if (ret) { + pr_err("Failed to add Spinner request: %d\n", ret); + goto err_spin_rq; + } + + /* Use all guc_ids */ + while (ret != -EAGAIN) { + ce[++context_index] = intel_context_create(engine); + if (IS_ERR(ce[context_index])) { + ret = PTR_ERR(ce[context_index--]); + ce[context_index] = NULL; + pr_err("Failed to create context: %d\n", ret); + goto err_spin_rq; + } + + rq = nop_user_request(ce[context_index], spin_rq); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + rq = NULL; + if (ret != -EAGAIN) { + pr_err("Failed to create request, %d: %d\n", + context_index, ret); + goto err_spin_rq; + } + } else { + if (last) + i915_request_put(last); + last = rq; + } + } + + /* Release blocked requests */ + igt_spinner_end(&spin); + ret = intel_selftest_wait_for_rq(spin_rq); + if (ret) { + pr_err("Spin request failed to complete: %d\n", ret); + i915_request_put(last); + goto err_spin_rq; + } + i915_request_put(spin_rq); + igt_spinner_fini(&spin); + spin_rq = NULL; + + /* Wait for last request */ + ret = i915_request_wait(last, 0, HZ * 30); + i915_request_put(last); + if (ret < 0) { + pr_err("Last request failed to complete: %d\n", ret); + goto err_spin_rq; + } + + /* Try to steal guc_id */ + rq = nop_user_request(ce[context_index], NULL); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + pr_err("Failed to steal guc_id, %d: %d\n", context_index, ret); + goto err_spin_rq; + } + + /* Wait for request with stolen guc_id */ + ret = i915_request_wait(rq, 0, HZ); + i915_request_put(rq); + if (ret < 0) { + pr_err("Request with stolen guc_id failed to complete: %d\n", + ret); + goto err_spin_rq; + } + + /* Wait for idle */ + ret = intel_gt_wait_for_idle(gt, HZ * 30); + if (ret < 0) { + pr_err("GT failed to idle: %d\n", ret); + goto err_spin_rq; + } + + /* Verify a guc_id was stolen */ + if (guc->number_guc_id_stolen == number_guc_id_stolen) { + pr_err("No guc_id was stolen"); + ret = -EINVAL; + } else { + ret = 0; + } + +err_spin_rq: + if (spin_rq) { + igt_spinner_end(&spin); + intel_selftest_wait_for_rq(spin_rq); + i915_request_put(spin_rq); + igt_spinner_fini(&spin); + intel_gt_wait_for_idle(gt, HZ * 30); + } +err_contexts: + for (; context_index >= 0 && ce[context_index]; --context_index) + intel_context_put(ce[context_index]); +err_wakeref: + intel_runtime_pm_put(gt->uncore->rpm, wakeref); + kfree(ce); + guc->submission_state.num_guc_ids = sv; + + return ret; +} + int intel_guc_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { SUBTEST(intel_guc_scrub_ctbs), + SUBTEST(intel_guc_steal_guc_ids), }; struct intel_gt *gt = &i915->gt;