From patchwork Sat Dec 11 17:35:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671863 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BB702C433EF for ; Sat, 11 Dec 2021 17:41:40 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CE5B610E330; Sat, 11 Dec 2021 17:41:15 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 339A510E330; Sat, 11 Dec 2021 17:41:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244474; x=1670780474; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RR4OQyl93P6Xjlr+tous2lVRL+UUgd6EMt1s2adyaVM=; b=C8YQrm/VCuJmERGsbrhBvRFfqH8B6QE8OSuxfORwhhB73jpgp+yu0VXB 7S1ZMvVDnv22rgu0GiBBBc7ReJMaMk7JhNegI612HB7CiknVs7kRd5+M0 geD475GWpwJ0cv6z0Phxcp1t6KRrr0LSCR+64DnHxpuekPJShw0Sql65P Q9SK1EW0vTcTAxfu0Tp9tsSMZD+3HgxmjSLqs/pULWgIuP6MKNPoeXc20 kMaPtCAYlwURI1SkJM5OFyXIPn8s7hGfmp1yp4sf96O1iHmzjyn4eM9iH hfUk8lJGUNgdq0yzkZTXjyNJ5Mer4pNb9gg1j6QcI8aNLZ4POGwJjVwI2 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493209" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493209" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548102" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:39 -0800 Message-Id: <20211211173545.23536-2-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 1/7] drm/i915/guc: Use correct context lock when callig clr_context_registered X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" s/ce/cn/ when grabbing guc_state.lock before calling clr_context_registered. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 1f9d4fde421f..9b7b4f4e0d91 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1937,9 +1937,9 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce) list_del_init(&cn->guc_id.link); ce->guc_id = cn->guc_id; - spin_lock(&ce->guc_state.lock); + spin_lock(&cn->guc_state.lock); clr_context_registered(cn); - spin_unlock(&ce->guc_state.lock); + spin_unlock(&cn->guc_state.lock); set_context_guc_id_invalid(cn); From patchwork Sat Dec 11 17:35:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671859 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 710DEC433EF for ; Sat, 11 Dec 2021 17:41:36 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 728BC10E5FF; Sat, 11 Dec 2021 17:41:16 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 65B2910E2E4; Sat, 11 Dec 2021 17:41:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244474; x=1670780474; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xs4Uezz0ZR31ss+ptKygyEXbVGcCanPcDYRuGPs0HPE=; b=YOH/sc1JyUmTepJ0Xq6vKa9fgLFb1Tz2ZjDszoF17DJ0k+FygRGBeegK O/flQ7e5NrEKaJu8sfKxVlGRSt+hNvunE94fHLmUcO21R8MVeTPePxOML Mw/CVVHC1n8aTanTau8EJ9r4Igwy0jKshSJdiU6ETJy3lWVNauOiwDBXr KgyH+KMoqpXygSZSMqr6Ikn+ROojwq/e8hS1i4shyjo0s1Rw8f+PzQ62A CucNiRecNWJhq+bJ+JR8ED5QlvqUE9JhCOV08bBrpBNTeHApOYCJ3iefk jNGHjU3gTB+ownmL2bjipL5+yP/UyHjBRYptYxdw2Gr0C+KOi0jziouWs Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493210" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493210" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548105" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:40 -0800 Message-Id: <20211211173545.23536-3-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 2/7] drm/i915/guc: Only assign guc_id.id when stealing guc_id X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Previously assigned whole guc_id structure (list, spin lock) which is incorrect, only assign the guc_id.id. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9b7b4f4e0d91..0fb2eeff0262 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1935,7 +1935,7 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce) GEM_BUG_ON(intel_context_is_parent(cn)); list_del_init(&cn->guc_id.link); - ce->guc_id = cn->guc_id; + ce->guc_id.id = cn->guc_id.id; spin_lock(&cn->guc_state.lock); clr_context_registered(cn); From patchwork Sat Dec 11 17:35:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671855 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0C246C433F5 for ; Sat, 11 Dec 2021 17:41:29 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A48FD10E602; Sat, 11 Dec 2021 17:41:15 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9C77F10E330; Sat, 11 Dec 2021 17:41:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244474; x=1670780474; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+t+IYa+onmrRcxvOJjKYFtf3l6LDynZbl3ClATBruNA=; b=bknpqkvGSRXJdpq2gD9V+AxXR3rAu4I/uYczksk/tPgDZO8bc8/rRuNv 4g+n1LwMwTdpse1ID+gdDFZAonmltHmppp0UywkH4ZlHC8EnLjtyGg1wv lUczSPq0Y3zepl8Mxb51lsypdtttsPUFgAqd/5ubabmcFHvVXBXTp4+Jy 7TlrC0Bhi+6Eq7Im60AsxWjQi7kgHQxF3fCKwO9zTTReKQNIrYaX8rmhs vs5ZAmBRi/rcGMlzXmWtelVB7BKxtd8qC0N3rrcuP/gro6xo7outbeqUl BEo19kVnvony2elVSUqri5hv6rNxvuJsTqNk7AkyoLCOxaXoQo6wrt065 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493211" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493211" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548108" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:41 -0800 Message-Id: <20211211173545.23536-4-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Remove racey GEM_BUG_ON X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" A full GT reset can race with the last context put resulting in the context ref count being zero but the destroyed bit not yet being set. Remove GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts the destroyed bit must be set in ref count is zero. Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0fb2eeff0262..36c2965db49b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1040,8 +1040,6 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) spin_unlock(&ce->guc_state.lock); - GEM_BUG_ON(!do_put && !destroyed); - if (pending_enable || destroyed || deregister) { decr_outstanding_submission_g2h(guc); if (deregister) From patchwork Sat Dec 11 17:35:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671861 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CC833C433F5 for ; Sat, 11 Dec 2021 17:41:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 30D8310E631; Sat, 11 Dec 2021 17:41:17 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 454D310E5FF; Sat, 11 Dec 2021 17:41:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244475; x=1670780475; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jaCS0XMFCrae8Su33E7J0Rx/nsJ2glq4igpMu1fVzX0=; b=eHDeXScZR7IQG/7tRhbsrRE2ORqnHHVl6MMxyzo/EN49GbyC8YH42opo F1ScB9Sb5UikSzfa1ho3Q5ksSXh3tfqZAn4c9QgoJk9SanzaP/1PqFyga XPlLcbFGukui2GulujrY+oTcep0AgEp5ihxS1izRwF10RrE9KQaqyuNR2 rwt+ygqLVv1194F3ydwQui2zE2zDToxryCJvitC4QLcqTlDUP7/abGSt3 CgtgX4ZAWklolMeuf3LO4I7AhmS+VkYLpkP5JVdyhmebp7XpbsBLLu0vI 3MbH3nPHToiWMq2XvRZdgLu1lSCdx51TjT3DfiKiwIFXSVMBfJ011IoIK A==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493215" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493215" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548114" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:42 -0800 Message-Id: <20211211173545.23536-5-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: John Harrison While attempting to debug a CT deadlock issue in various CI failures (most easily reproduced with gem_ctx_create/basic-files), I was seeing CPU deadlock errors being reported. This were because the context destroy loop was blocking waiting on H2G space from inside an IRQ spinlock. There no was deadlock as such, it's just that the H2G queue was full of context destroy commands and GuC was taking a long time to process them. However, the kernel was seeing the large amount of time spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would then happen (heartbeat failures, CT deadlock errors, outstanding H2G WARNs, etc.). Re-working the loop to only acquire the spinlock around the list management (which is all it is meant to protect) rather than the entire destroy operation seems to fix all the above issues. v2: (John Harrison) - Fix typo in comment message Signed-off-by: John Harrison Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 36c2965db49b..96fcf869e3ff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) unsigned long flags; bool disabled; - lockdep_assert_held(&guc->submission_state.lock); GEM_BUG_ON(!intel_gt_pm_is_awake(gt)); GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id)); GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id)); @@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) } spin_unlock_irqrestore(&ce->guc_state.lock, flags); if (unlikely(disabled)) { - __release_guc_id(guc, ce); + release_guc_id(guc, ce); __guc_context_destroy(ce); return; } @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct intel_context *ce) static void guc_flush_destroyed_contexts(struct intel_guc *guc) { - struct intel_context *ce, *cn; + struct intel_context *ce; unsigned long flags; GEM_BUG_ON(!submission_disabled(guc) && guc_submission_initialized(guc)); - spin_lock_irqsave(&guc->submission_state.lock, flags); - list_for_each_entry_safe(ce, cn, - &guc->submission_state.destroyed_contexts, - destroyed_link) { - list_del_init(&ce->destroyed_link); - __release_guc_id(guc, ce); + while (!list_empty(&guc->submission_state.destroyed_contexts)) { + spin_lock_irqsave(&guc->submission_state.lock, flags); + ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts, + struct intel_context, + destroyed_link); + if (ce) + list_del_init(&ce->destroyed_link); + spin_unlock_irqrestore(&guc->submission_state.lock, flags); + + if (!ce) + break; + + release_guc_id(guc, ce); __guc_context_destroy(ce); } - spin_unlock_irqrestore(&guc->submission_state.lock, flags); } static void deregister_destroyed_contexts(struct intel_guc *guc) { - struct intel_context *ce, *cn; + struct intel_context *ce; unsigned long flags; - spin_lock_irqsave(&guc->submission_state.lock, flags); - list_for_each_entry_safe(ce, cn, - &guc->submission_state.destroyed_contexts, - destroyed_link) { - list_del_init(&ce->destroyed_link); + while (!list_empty(&guc->submission_state.destroyed_contexts)) { + spin_lock_irqsave(&guc->submission_state.lock, flags); + ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts, + struct intel_context, + destroyed_link); + if (ce) + list_del_init(&ce->destroyed_link); + spin_unlock_irqrestore(&guc->submission_state.lock, flags); + + if (!ce) + break; + guc_lrc_desc_unpin(ce); } - spin_unlock_irqrestore(&guc->submission_state.lock, flags); } static void destroyed_worker_func(struct work_struct *w) From patchwork Sat Dec 11 17:35:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671857 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 756A7C433F5 for ; Sat, 11 Dec 2021 17:41:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 030BA10E60D; Sat, 11 Dec 2021 17:41:17 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id CE09B10E41B; Sat, 11 Dec 2021 17:41:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244474; x=1670780474; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EAI1zzp+pVh0LIc0SItFf3X1uGyMPd6uOR9y7CfFiI8=; b=MamXc0OOUL2QlS6tkl1Wu8ZgMgBIgUpxp7JO5p0XVdwcZiGEQixxq8z4 6SO63X12Vp27oCb+UrTBp0b9bteGMSOvEm0w6GEhWxwvG+fkSN5OmMx3W Eq6Fl1E7IzYd8FStgwXNqW18Nm/nLUJ+Wr2TW43y8WSv1EBRLoAZCqZu4 bdZ0yCTMz8EdwaFPkxdDpDMi0YBPX2bGCNZI2uX0X5/trn/ivaEJvpCYP RWb2Ho9BK1PyXJR9HnCm0pv+Ykv2x/rYwUlYra7ZTyN6N6pPw0+teALTf A5JWJzPSmvvD9djzBEkvPvJ/V0sBZwhm1vm9U+WSDT2fwG7qCH9b8G4tp Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493212" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493212" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548112" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:43 -0800 Message-Id: <20211211173545.23536-6-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 5/7] drm/i915/guc: Add extra debug on CT deadlock X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Print CT state (H2G + G2H head / tail pointers, credits) on CT deadlock. v2: (John Harrison) - Add units to debug messages Reviewed-by: John Harrison Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a0cc34be7b56..741be9abab68 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -523,6 +523,15 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n", ktime_ms_delta(ktime_get(), ct->stall_time), send->status, recv->status); + CT_ERROR(ct, "H2G Space: %u (Bytes)\n", + atomic_read(&ct->ctbs.send.space) * 4); + CT_ERROR(ct, "Head: %u (Dwords)\n", ct->ctbs.send.desc->head); + CT_ERROR(ct, "Tail: %u (Dwords)\n", ct->ctbs.send.desc->tail); + CT_ERROR(ct, "G2H Space: %u (Bytes)\n", + atomic_read(&ct->ctbs.recv.space) * 4); + CT_ERROR(ct, "Head: %u\n (Dwords)", ct->ctbs.recv.desc->head); + CT_ERROR(ct, "Tail: %u\n (Dwords)", ct->ctbs.recv.desc->tail); + ct->ctbs.send.broken = true; } From patchwork Sat Dec 11 17:35:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671865 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A471EC433EF for ; Sat, 11 Dec 2021 17:41:46 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CE75110E6AC; Sat, 11 Dec 2021 17:41:24 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1357910E330; Sat, 11 Dec 2021 17:41:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244475; x=1670780475; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QL7FferO/vmcvzY+uqMKOIStr97w9GvFQG07TpucjoU=; b=mknKjHNb52b3NtUtFGzhWmEl8n3LXoeZv4bo1+Ccpa5UZYhP9WgcjsfI TcQAwLtJNQeNKpmPYnD9N3xqJ/vY12ZEqsncoOhCFIK9sL/rZSHRRnsBA oRKg0tbGlcqywDbr08GwO/ybMx4WhkiJ/s/bbRc2t0UktVS5f9U+nJDe5 eGx7ECnFh5M5WaTpQT7s/pAfYogKreYFT4eHOOHh7EoD/4S4JTFCn9Tbt yrpgMB3ZJckoO1RKT+7BM12rSmrHCsfhq65XbTEA/ZdeoGbKlAwCn2K5r E66KE36L15SNr80haUtIzIdRNjyW98GYzxLTZwcw18K8BmuKwAUplYHhg g==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493213" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493213" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548117" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:44 -0800 Message-Id: <20211211173545.23536-7-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Kick G2H tasklet if no credits X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Let's be paranoid and kick the G2H tasklet, which dequeues messages, if G2H credits are exhausted. Signed-off-by: Matthew Brost Reviewed-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 741be9abab68..aa6dd6415202 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -591,12 +591,19 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw) { + bool h2g = h2g_has_room(ct, h2g_dw); + bool g2h = g2h_has_room(ct, g2h_dw); + lockdep_assert_held(&ct->ctbs.send.lock); - if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) { + if (unlikely(!h2g || !g2h)) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); + /* Be paranoid and kick G2H tasklet to free credits */ + if (!g2h) + tasklet_hi_schedule(&ct->receive_tasklet); + if (unlikely(ct_deadlocked(ct))) return -EPIPE; else From patchwork Sat Dec 11 17:35:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12671867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 58947C433FE for ; Sat, 11 Dec 2021 17:41:47 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D664410E6B2; Sat, 11 Dec 2021 17:41:24 +0000 (UTC) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2CC8010E5F2; Sat, 11 Dec 2021 17:41:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639244475; x=1670780475; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aIwPu6R5uRSS6xVK/GKMQE1X6C/KY+XUBfWCYQEPleo=; b=DdoscRGRLn4SvqelHicgaBuEOgUjA04ETZZ/i6uEKPk9oAFtui52N0Zr 5flGVI26eWdJFmXk/ate9fdRbLiudzLz/R0SlW9HcR+ChZiT4LYexlN4o FJMCGibZiOoijaFFfKasCbRj006cIEYNtR8ozRFvsHffL38H9pNCsLBq+ Mcd6u/qxMk9AfyTTYNXVCGatyu3rUjvZsfflXmacg0t+erTHNv7jVz3m3 vBsk5DLAac4PIaSTQeNUoQBuyTrQnln71WB4ElcMm7ELvhrwBhYisQwEG okrFd0myzu7GC8YATXaN86A48Y0cIML03Pl7GFJiMzdc4ahvh4Mebw3ix Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10195"; a="238493214" X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="238493214" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,198,1635231600"; d="scan'208";a="602548119" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2021 09:41:13 -0800 From: Matthew Brost To: , Date: Sat, 11 Dec 2021 09:35:45 -0800 Message-Id: <20211211173545.23536-8-matthew.brost@intel.com> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20211211173545.23536-1-matthew.brost@intel.com> References: <20211211173545.23536-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 7/7] drm/i915/guc: Selftest for stealing of guc ids X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Testing the stealing of guc ids is hard from user space as we have 64k guc_ids. Add a selftest, which artificially reduces the number of guc ids, and forces a steal. Description of test below. The test creates a spinner which is used to block all subsequent submissions until it completes. Next, a loop creates a context and a NOP request each iteration until the guc_ids are exhausted (request creation returns -EAGAIN). The spinner is ended, unblocking all requests created in the loop. At this point all guc_ids are exhausted but are available to steal. Try to create another request which should successfully steal a guc_id. Wait on last request to complete, idle GPU, verify a guc_id was stolen via a counter, and exit the test. Test also artificially reduces the number of guc_ids so the test runs in a timely manner. v2: (John Harrison) - s/stole/stolen - Fix some wording in test description - Rework indexing into context array - Add test description to commit message - Fix typo in commit message (Checkpatch) - s/guc/(guc) in NUMBER_MULTI_LRC_GUC_ID Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 12 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 +- drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 173 ++++++++++++++++++ 3 files changed, 196 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 1cb46098030d..f9240d4baa69 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -94,6 +94,11 @@ struct intel_guc { * @guc_ids: used to allocate new guc_ids, single-lrc */ struct ida guc_ids; + /** + * @num_guc_ids: Number of guc_ids, selftest feature to be able + * to reduce this number while testing. + */ + int num_guc_ids; /** * @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc */ @@ -202,6 +207,13 @@ struct intel_guc { */ struct delayed_work work; } timestamp; + +#ifdef CONFIG_DRM_I915_SELFTEST + /** + * @number_guc_id_stolen: The number of guc_ids that have been stolen + */ + int number_guc_id_stolen; +#endif }; static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 96fcf869e3ff..99414b49ca6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -145,7 +145,8 @@ guc_create_parallel(struct intel_engine_cs **engines, * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for * multi-lrc. */ -#define NUMBER_MULTI_LRC_GUC_ID (GUC_MAX_LRC_DESCRIPTORS / 16) +#define NUMBER_MULTI_LRC_GUC_ID(guc) \ + ((guc)->submission_state.num_guc_ids / 16) /* * Below is a set of functions which control the GuC scheduling state which @@ -1775,7 +1776,7 @@ int intel_guc_submission_init(struct intel_guc *guc) destroyed_worker_func); guc->submission_state.guc_ids_bitmap = - bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL); + bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL); if (!guc->submission_state.guc_ids_bitmap) return -ENOMEM; @@ -1869,13 +1870,13 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) if (intel_context_is_parent(ce)) ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap, - NUMBER_MULTI_LRC_GUC_ID, + NUMBER_MULTI_LRC_GUC_ID(guc), order_base_2(ce->parallel.number_children + 1)); else ret = ida_simple_get(&guc->submission_state.guc_ids, - NUMBER_MULTI_LRC_GUC_ID, - GUC_MAX_LRC_DESCRIPTORS, + NUMBER_MULTI_LRC_GUC_ID(guc), + guc->submission_state.num_guc_ids, GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (unlikely(ret < 0)) @@ -1941,6 +1942,10 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce) set_context_guc_id_invalid(cn); +#ifdef CONFIG_DRM_I915_SELFTEST + guc->number_guc_id_stolen++; +#endif + return 0; } else { return -EAGAIN; @@ -3779,6 +3784,7 @@ static bool __guc_submission_selected(struct intel_guc *guc) void intel_guc_submission_init_early(struct intel_guc *guc) { + guc->submission_state.num_guc_ids = GUC_MAX_LRC_DESCRIPTORS; guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); } diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c index fb0e4a7bd8ca..90f5eb66281c 100644 --- a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c @@ -3,8 +3,21 @@ * Copyright �� 2021 Intel Corporation */ +#include "selftests/igt_spinner.h" #include "selftests/intel_scheduler_helpers.h" +static int request_add_spin(struct i915_request *rq, struct igt_spinner *spin) +{ + int err = 0; + + i915_request_get(rq); + i915_request_add(rq); + if (spin && !igt_wait_for_spinner(spin, rq)) + err = -ETIMEDOUT; + + return err; +} + static struct i915_request *nop_user_request(struct intel_context *ce, struct i915_request *from) { @@ -110,10 +123,170 @@ static int intel_guc_scrub_ctbs(void *arg) return ret; } +/* + * intel_guc_steal_guc_ids - Test to exhaust all guc_ids and then steal one + * + * This test creates a spinner which is used to block all subsequent submissions + * until it completes. Next, a loop creates a context and a NOP request each + * iteration until the guc_ids are exhausted (request creation returns -EAGAIN). + * The spinner is ended, unblocking all requests created in the loop. At this + * point all guc_ids are exhausted but are available to steal. Try to create + * another request which should successfully steal a guc_id. Wait on last + * request to complete, idle GPU, verify a guc_id was stolen via a counter, and + * exit the test. Test also artificially reduces the number of guc_ids so the + * test runs in a timely manner. + */ +static int intel_guc_steal_guc_ids(void *arg) +{ + struct intel_gt *gt = arg; + struct intel_guc *guc = >->uc.guc; + int ret, sv, context_index = 0; + intel_wakeref_t wakeref; + struct intel_engine_cs *engine; + struct intel_context **ce; + struct igt_spinner spin; + struct i915_request *spin_rq = NULL, *rq, *last = NULL; + int number_guc_id_stolen = guc->number_guc_id_stolen; + + ce = kzalloc(sizeof(*ce) * GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL); + if (!ce) { + pr_err("Context array allocation failed\n"); + return -ENOMEM; + } + + wakeref = intel_runtime_pm_get(gt->uncore->rpm); + engine = intel_selftest_find_any_engine(gt); + sv = guc->submission_state.num_guc_ids; + guc->submission_state.num_guc_ids = 4096; + + /* Create spinner to block requests in below loop */ + ce[context_index] = intel_context_create(engine); + if (IS_ERR(ce[context_index])) { + ce[context_index] = NULL; + ret = PTR_ERR(ce[context_index]); + pr_err("Failed to create context: %d\n", ret); + goto err_wakeref; + } + ret = igt_spinner_init(&spin, engine->gt); + if (ret) { + pr_err("Failed to create spinner: %d\n", ret); + goto err_contexts; + } + spin_rq = igt_spinner_create_request(&spin, ce[context_index], + MI_ARB_CHECK); + if (IS_ERR(spin_rq)) { + ret = PTR_ERR(spin_rq); + pr_err("Failed to create spinner request: %d\n", ret); + goto err_contexts; + } + ret = request_add_spin(spin_rq, &spin); + if (ret) { + pr_err("Failed to add Spinner request: %d\n", ret); + goto err_spin_rq; + } + + /* Use all guc_ids */ + while (ret != -EAGAIN) { + ce[++context_index] = intel_context_create(engine); + if (IS_ERR(ce[context_index])) { + ce[context_index] = NULL; + ret = PTR_ERR(ce[context_index--]); + pr_err("Failed to create context: %d\n", ret); + goto err_spin_rq; + } + + rq = nop_user_request(ce[context_index], spin_rq); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + rq = NULL; + if (ret != -EAGAIN) { + pr_err("Failed to create request, %d: %d\n", + context_index, ret); + goto err_spin_rq; + } + } else { + if (last) + i915_request_put(last); + last = rq; + } + } + + /* Release blocked requests */ + igt_spinner_end(&spin); + ret = intel_selftest_wait_for_rq(spin_rq); + if (ret) { + pr_err("Spin request failed to complete: %d\n", ret); + i915_request_put(last); + goto err_spin_rq; + } + i915_request_put(spin_rq); + igt_spinner_fini(&spin); + spin_rq = NULL; + + /* Wait for last request */ + ret = i915_request_wait(last, 0, HZ * 30); + i915_request_put(last); + if (ret < 0) { + pr_err("Last request failed to complete: %d\n", ret); + goto err_spin_rq; + } + + /* Try to steal guc_id */ + rq = nop_user_request(ce[context_index], NULL); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + pr_err("Failed to steal guc_id, %d: %d\n", context_index, ret); + goto err_spin_rq; + } + + /* Wait for request with stolen guc_id */ + ret = i915_request_wait(rq, 0, HZ); + i915_request_put(rq); + if (ret < 0) { + pr_err("Request with stolen guc_id failed to complete: %d\n", + ret); + goto err_spin_rq; + } + + /* Wait for idle */ + ret = intel_gt_wait_for_idle(gt, HZ * 30); + if (ret < 0) { + pr_err("GT failed to idle: %d\n", ret); + goto err_spin_rq; + } + + /* Verify a guc_id got stolen */ + if (guc->number_guc_id_stolen == number_guc_id_stolen) { + pr_err("No guc_ids stolenn"); + ret = -EINVAL; + } else { + ret = 0; + } + +err_spin_rq: + if (spin_rq) { + igt_spinner_end(&spin); + intel_selftest_wait_for_rq(spin_rq); + i915_request_put(spin_rq); + igt_spinner_fini(&spin); + intel_gt_wait_for_idle(gt, HZ * 30); + } +err_contexts: + for (; context_index >= 0 && ce[context_index]; --context_index) + intel_context_put(ce[context_index]); +err_wakeref: + intel_runtime_pm_put(gt->uncore->rpm, wakeref); + kfree(ce); + guc->submission_state.num_guc_ids = sv; + + return ret; +} + int intel_guc_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { SUBTEST(intel_guc_scrub_ctbs), + SUBTEST(intel_guc_steal_guc_ids), }; struct intel_gt *gt = &i915->gt;