From patchwork Tue Aug  3 22:28:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417559
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 61E04C4320A
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:09 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 3609160184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:09 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3609160184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 8EBC36E952;
	Tue,  3 Aug 2021 22:12:06 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 1D6A26E147;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745888"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745888"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512692"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 01/46] drm/i915/guc: Allow flexible number of context ids
Date: Tue,  3 Aug 2021 15:28:58 -0700
Message-Id: <20210803222943.27686-2-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Number of available GuC contexts ids might be limited.
Stop referring in code to macro and use variable instead.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h           |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 +++++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index a9547069ee7e..1d7cb118e70f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -53,6 +53,8 @@ struct intel_guc {
 	 */
 	spinlock_t contexts_lock;
 	struct ida guc_ids;
+	u32 num_guc_ids;
+	u32 max_guc_ids;
 	struct list_head guc_id_list;
 
 	bool submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 89ff0e4b4bc7..abfccec7d062 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -275,7 +275,7 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
 
-	GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
+	GEM_BUG_ON(index >= guc->max_guc_ids);
 
 	return &base[index];
 }
@@ -284,7 +284,7 @@ static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
 {
 	struct intel_context *ce = xa_load(&guc->context_lookup, id);
 
-	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+	GEM_BUG_ON(id >= guc->max_guc_ids);
 
 	return ce;
 }
@@ -294,8 +294,7 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 	u32 size;
 	int ret;
 
-	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
-			  GUC_MAX_LRC_DESCRIPTORS);
+	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * guc->max_guc_ids);
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
 					     (void **)&guc->lrc_desc_pool_vaddr);
 	if (ret)
@@ -1070,7 +1069,7 @@ static void guc_submit_request(struct i915_request *rq)
 static int new_guc_id(struct intel_guc *guc)
 {
 	return ida_simple_get(&guc->guc_ids, 0,
-			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+			      guc->num_guc_ids, GFP_KERNEL |
 			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 }
 
@@ -2562,6 +2561,8 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->max_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
+	guc->num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
 	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
@@ -2571,7 +2572,7 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 {
 	struct intel_context *ce;
 
-	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+	if (unlikely(desc_idx >= guc->max_guc_ids)) {
 		drm_err(&guc_to_gt(guc)->i915->drm,
 			"Invalid desc_idx %u", desc_idx);
 		return NULL;
@@ -2874,6 +2875,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 
 	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
 		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
+	drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
 	drm_printf(p, "GuC tasklet count: %u\n\n",
 		   atomic_read(&sched_engine->tasklet.count));
 
@@ -2913,7 +2916,6 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 {
 	struct intel_context *ce;
 	unsigned long index;
-
 	xa_for_each(&guc->context_lookup, index, ce) {
 		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
 		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);

From patchwork Tue Aug  3 22:28:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417529
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY,
	USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8E9B9C43214
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:45 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 619C260184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:45 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 619C260184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id C1F5E6E92A;
	Tue,  3 Aug 2021 22:12:00 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id EE6DE89F8E;
 Tue,  3 Aug 2021 22:11:54 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745889"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745889"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512693"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 02/46] drm/i915/guc: Connect the number of guc_ids to debugfs
Date: Tue,  3 Aug 2021 15:28:59 -0700
Message-Id: <20210803222943.27686-3-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

For testing purposes it may make sense to reduce the number of guc_ids
available to be allocated. Add debugfs support for setting the number of
guc_ids.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 31 +++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 72ddfff42f7d..7c479c5e7b3a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -50,11 +50,42 @@ static int guc_registered_contexts_show(struct seq_file *m, void *data)
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
 
+static int guc_num_id_get(void *data, u64 *val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	*val = guc->num_guc_ids;
+
+	return 0;
+}
+
+static int guc_num_id_set(void *data, u64 val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	if (val > guc->max_guc_ids)
+		val = guc->max_guc_ids;
+	else if (val < 256)
+		val = 256;
+
+	guc->num_guc_ids = val;
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(guc_num_id_fops, guc_num_id_get, guc_num_id_set, "%lld\n");
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
 		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
+		{ "guc_num_id", &guc_num_id_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index abfccec7d062..3b555c05c01c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2574,7 +2574,8 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 
 	if (unlikely(desc_idx >= guc->max_guc_ids)) {
 		drm_err(&guc_to_gt(guc)->i915->drm,
-			"Invalid desc_idx %u", desc_idx);
+			"Invalid desc_idx %u, max %u",
+			desc_idx, guc->max_guc_ids);
 		return NULL;
 	}
 

From patchwork Tue Aug  3 22:29:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417507
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5EA96C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:26 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id AC70760184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:23 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org AC70760184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 55A636E8EB;
	Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 5099889D7D;
 Tue,  3 Aug 2021 22:11:54 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745890"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745890"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512695"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 03/46] drm/i915/guc: Don't return -EAGAIN to user when guc_ids
 exhausted
Date: Tue,  3 Aug 2021 15:29:00 -0700
Message-Id: <20210803222943.27686-4-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Rather than returning -EAGAIN to the user when no guc_ids are available,
implement a fair sharing algorithm in the kernel which blocks submissons
until guc_ids become available. Submissions are released one at a time,
based on priority, until the guc_id pressure is released to ensure fair
sharing of the guc_ids. Once the pressure is fully released, the normal
guc_id allocation (at request creation time in guc_request_alloc) can
resume as this allocation path should be significantly faster and a fair
sharing algorithm isn't needed when guc_ids are plentifully.

The fair sharing algorithm is implemented by forcing all submissions to
the tasklet which serializes submissions, dequeuing one at a time.

If the submission doesn't have a guc_id and new guc_id can't be found,
two lists are searched, one list with contexts that are not pinned but
still registered with the guc (searched first) and another list with
contexts that are pinned but do not have any submissions actively in
inflight (scheduling enabled + registered, searched second). If no
guc_ids can be found we kick a workqueue which will retire requests
hopefully freeing a guc_id. The workqueue + tasklet ping / pong back and
forth until a guc_id can be found.

Once a guc_id is found, we may have to disable context scheduling
depending on which list the context is stolen from. When we disable
scheduling, we block the tasklet from executing until the completion G2H
returns. The disable scheduling must be issued from the workqueue
because of the locking structure. When we deregister a context, we also
do the same thing (waiting on the G2H) but we can safely issue the
deregister H2G from the tasklet.

Once all the G2H have returned we can trigger a submission on the
context.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  26 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 805 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h           |   6 +
 4 files changed, 754 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e54351a170e2..8ed964ef967b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -185,6 +185,9 @@ struct intel_context {
 	/* GuC LRC descriptor reference count */
 	atomic_t guc_id_ref;
 
+	/* Number of rq submitted without a guc_id */
+	u16 guc_num_rq_submit_no_id;
+
 	/*
 	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 1d7cb118e70f..e76579396efd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -33,7 +33,28 @@ struct intel_guc {
 
 	/* Global engine used to submit requests to GuC */
 	struct i915_sched_engine *sched_engine;
-	struct i915_request *stalled_request;
+
+	/* Global state related to submission tasklet */
+	struct i915_request *stalled_rq;
+	struct intel_context *stalled_context;
+	struct work_struct retire_worker;
+	unsigned long flags;
+	int total_num_rq_with_no_guc_id;
+
+	/*
+	 * Submisson stall reason. See intel_guc_submission.c for detailed
+	 * description.
+	 */
+	enum {
+		STALL_NONE,
+		STALL_GUC_ID_WORKQUEUE,
+		STALL_GUC_ID_TASKLET,
+		STALL_SCHED_DISABLE,
+		STALL_REGISTER_CONTEXT,
+		STALL_DEREGISTER_CONTEXT,
+		STALL_MOVE_LRC_TAIL,
+		STALL_ADD_REQUEST,
+	} submission_stall_reason;
 
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
@@ -55,7 +76,8 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
-	struct list_head guc_id_list;
+	struct list_head guc_id_list_no_ref;
+	struct list_head guc_id_list_unpinned;
 
 	bool submission_supported;
 	bool submission_selected;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 3b555c05c01c..f42a707f60ca 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -59,6 +59,25 @@
  * ELSP context descriptor dword into Work Item.
  * See guc_add_request()
  *
+ * GuC flow control state machine:
+ * The tasklet, workqueue (retire_worker), and the G2H handlers together more or
+ * less form a state machine which is used to submit requests + flow control
+ * requests, while waiting on resources / actions, if necessary. The enum,
+ * submission_stall_reason, controls the handoff of stalls between these
+ * entities with stalled_rq & stalled_context being the arguments. Each state
+ * described below.
+ *
+ * STALL_NONE			No stall condition
+ * STALL_GUC_ID_WORKQUEUE	Workqueue will try to free guc_ids
+ * STALL_GUC_ID_TASKLET		Tasklet will try to find guc_id
+ * STALL_SCHED_DISABLE		Workqueue will issue context schedule disable
+ *				H2G
+ * STALL_REGISTER_CONTEXT	Tasklet needs to register context
+ * STALL_DEREGISTER_CONTEXT	G2H handler is waiting for context deregister,
+ *				will register context upon receipt of G2H
+ * STALL_MOVE_LRC_TAIL		Tasklet will try to move LRC tail
+ * STALL_ADD_REQUEST		Tasklet will try to add the request (submit
+ *				context)
  */
 
 /* GuC Virtual Engine */
@@ -72,6 +91,83 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+/*
+ * Global GuC flags helper functions
+ */
+enum {
+	GUC_STATE_TASKLET_BLOCKED,
+	GUC_STATE_GUC_IDS_EXHAUSTED,
+};
+
+static bool tasklet_blocked(struct intel_guc *guc)
+{
+	return test_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+}
+
+static void set_tasklet_blocked(struct intel_guc *guc)
+{
+	lockdep_assert_held(&guc->sched_engine->lock);
+	set_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+}
+
+static void __clr_tasklet_blocked(struct intel_guc *guc)
+{
+	lockdep_assert_held(&guc->sched_engine->lock);
+	clear_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+}
+
+static void clr_tasklet_blocked(struct intel_guc *guc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	__clr_tasklet_blocked(guc);
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static bool guc_ids_exhausted(struct intel_guc *guc)
+{
+	return test_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+}
+
+static bool test_and_update_guc_ids_exhausted(struct intel_guc *guc)
+{
+	unsigned long flags;
+	bool ret = false;
+
+	/*
+	 * Strict ordering on checking if guc_ids are exhausted isn't required,
+	 * so let's avoid grabbing the submission lock if possible.
+	 */
+	if (guc_ids_exhausted(guc)) {
+		spin_lock_irqsave(&guc->sched_engine->lock, flags);
+		ret = guc_ids_exhausted(guc);
+		if (ret)
+			++guc->total_num_rq_with_no_guc_id;
+		spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+	}
+
+	return ret;
+}
+
+static void set_and_update_guc_ids_exhausted(struct intel_guc *guc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	++guc->total_num_rq_with_no_guc_id;
+	set_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void clr_guc_ids_exhausted(struct intel_guc *guc)
+{
+	lockdep_assert_held(&guc->sched_engine->lock);
+	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id);
+
+	clear_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which do
  * not require a lock as all state transitions are mutually exclusive. i.e. It
@@ -82,6 +178,9 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
 #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
 #define SCHED_STATE_NO_LOCK_REGISTERED			BIT(2)
+#define SCHED_STATE_NO_LOCK_BLOCK_TASKLET		BIT(3)
+#define SCHED_STATE_NO_LOCK_GUC_ID_STOLEN		BIT(4)
+#define SCHED_STATE_NO_LOCK_NEEDS_REGISTER		BIT(5)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -135,6 +234,60 @@ static inline void clr_context_registered(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_block_tasklet(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_BLOCK_TASKLET);
+}
+
+static inline void set_context_block_tasklet(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_BLOCK_TASKLET,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_block_tasklet(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_BLOCK_TASKLET,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline bool context_guc_id_stolen(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_GUC_ID_STOLEN);
+}
+
+static inline void set_context_guc_id_stolen(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_GUC_ID_STOLEN,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_guc_id_stolen(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_GUC_ID_STOLEN,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline bool context_needs_register(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_NEEDS_REGISTER);
+}
+
+static inline void set_context_needs_register(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_NEEDS_REGISTER,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_needs_register(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_NEEDS_REGISTER,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -418,9 +571,12 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					      true, timeout);
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+static inline bool request_has_no_guc_id(struct i915_request *rq)
+{
+	return test_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
+}
 
-static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err = 0;
 	struct intel_context *ce = rq->context;
@@ -439,18 +595,15 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		goto out;
 	}
 
+	/* Ensure context is in correct state before a submission */
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+	GEM_BUG_ON(request_has_no_guc_id(rq));
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_needs_register(ce));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
-
-	/*
-	 * Corner case where the GuC firmware was blown away and reloaded while
-	 * this context was pinned.
-	 */
-	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
-		err = guc_lrc_desc_pin(ce, false);
-		if (unlikely(err))
-			goto out;
-	}
+	GEM_BUG_ON(context_pending_disable(ce));
+	GEM_BUG_ON(context_wait_for_deregister_to_register(ce));
+	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 
 	/*
 	 * The request / context will be run on the hardware when scheduling
@@ -462,6 +615,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	enabled = context_enabled(ce);
 
 	if (!enabled) {
+		GEM_BUG_ON(context_pending_enable(ce));
+
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
 		action[len++] = GUC_CONTEXT_ENABLE;
@@ -489,6 +644,67 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+{
+	int ret;
+
+	lockdep_assert_held(&guc->sched_engine->lock);
+
+	ret = __guc_add_request(guc, rq);
+	if (ret == -EBUSY) {
+		guc->stalled_rq = rq;
+		guc->submission_stall_reason = STALL_ADD_REQUEST;
+	} else {
+		guc->stalled_rq = NULL;
+		guc->submission_stall_reason = STALL_NONE;
+	}
+
+	return ret;
+}
+
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
+static int tasklet_register_context(struct intel_guc *guc,
+				    struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	int ret = 0;
+
+	/* Check state */
+	lockdep_assert_held(&guc->sched_engine->lock);
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+	GEM_BUG_ON(request_has_no_guc_id(rq));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+
+	/*
+	 * The guc_id is getting pinned during the tasklet and we need to
+	 * register this context or a corner case where the GuC firmware was
+	 * blown away and reloaded while this context was pinned
+	 */
+	if (unlikely((!lrc_desc_registered(guc, ce->guc_id) ||
+		      context_needs_register(ce)) &&
+		     !intel_context_is_banned(ce))) {
+		GEM_BUG_ON(context_pending_disable(ce));
+		GEM_BUG_ON(context_wait_for_deregister_to_register(ce));
+
+		ret = guc_lrc_desc_pin(ce, false);
+
+		if (likely(ret != -EBUSY))
+			clr_context_needs_register(ce);
+
+		if (unlikely(ret == -EBUSY)) {
+			guc->stalled_rq = rq;
+			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+		} else if (unlikely(ret == -EINPROGRESS)) {
+			guc->stalled_rq = rq;
+			guc->submission_stall_reason = STALL_DEREGISTER_CONTEXT;
+		}
+	}
+
+	return ret;
+}
+
 static inline void guc_set_lrc_tail(struct i915_request *rq)
 {
 	rq->context->lrc_reg_state[CTX_RING_TAIL] =
@@ -500,77 +716,142 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static void kick_retire_wq(struct intel_guc *guc)
+{
+	queue_work(system_unbound_wq, &guc->retire_worker);
+}
+
+static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq);
+
 static int guc_dequeue_one_context(struct intel_guc *guc)
 {
 	struct i915_sched_engine * const sched_engine = guc->sched_engine;
-	struct i915_request *last = NULL;
-	bool submit = false;
+	struct i915_request *last = guc->stalled_rq;
+	bool submit = !!last;
 	struct rb_node *rb;
 	int ret;
 
 	lockdep_assert_held(&sched_engine->lock);
+	GEM_BUG_ON(guc->stalled_context);
+	GEM_BUG_ON(!submit && guc->submission_stall_reason);
 
-	if (guc->stalled_request) {
-		submit = true;
-		last = guc->stalled_request;
-		goto resubmit;
-	}
+	if (submit) {
+		/* Flow control conditions */
+		switch (guc->submission_stall_reason) {
+		case STALL_GUC_ID_TASKLET:
+			goto done;
+		case STALL_REGISTER_CONTEXT:
+			goto register_context;
+		case STALL_MOVE_LRC_TAIL:
+			goto move_lrc_tail;
+		case STALL_ADD_REQUEST:
+			goto add_request;
+		default:
+			GEM_BUG_ON("Invalid stall state");
+		}
+	} else {
+		GEM_BUG_ON(!guc->total_num_rq_with_no_guc_id &&
+			   guc_ids_exhausted(guc));
 
-	while ((rb = rb_first_cached(&sched_engine->queue))) {
-		struct i915_priolist *p = to_priolist(rb);
-		struct i915_request *rq, *rn;
+		while ((rb = rb_first_cached(&sched_engine->queue))) {
+			struct i915_priolist *p = to_priolist(rb);
+			struct i915_request *rq, *rn;
 
-		priolist_for_each_request_consume(rq, rn, p) {
-			if (last && rq->context != last->context)
-				goto done;
+			priolist_for_each_request_consume(rq, rn, p) {
+				if (last && rq->context != last->context)
+					goto done;
 
-			list_del_init(&rq->sched.link);
+				list_del_init(&rq->sched.link);
 
-			__i915_request_submit(rq);
+				__i915_request_submit(rq);
 
-			trace_i915_request_in(rq, 0);
-			last = rq;
-			submit = true;
-		}
+				trace_i915_request_in(rq, 0);
+				last = rq;
+				submit = true;
+			}
 
-		rb_erase_cached(&p->node, &sched_engine->queue);
-		i915_priolist_free(p);
+			rb_erase_cached(&p->node, &sched_engine->queue);
+			i915_priolist_free(p);
+		}
 	}
+
 done:
 	if (submit) {
+		struct intel_context *ce = last->context;
+
+		if (ce->guc_num_rq_submit_no_id) {
+			ret = tasklet_pin_guc_id(guc, last);
+			if (ret)
+				goto blk_tasklet_kick;
+		}
+
+register_context:
+		ret = tasklet_register_context(guc, last);
+		if (unlikely(ret == -EINPROGRESS)) {
+			goto blk_tasklet;
+		} else if (unlikely(ret == -EPIPE)) {
+			goto deadlk;
+		} else if (ret == -EBUSY) {
+			goto schedule_tasklet;
+		} else if (unlikely(ret != 0)) {
+			GEM_WARN_ON(ret);	/* Unexpected */
+			goto deadlk;
+		}
+
+move_lrc_tail:
 		guc_set_lrc_tail(last);
-resubmit:
+
+add_request:
 		ret = guc_add_request(guc, last);
-		if (unlikely(ret == -EPIPE))
+		if (unlikely(ret == -EPIPE)) {
+			goto deadlk;
+		} else if (ret == -EBUSY) {
+			goto schedule_tasklet;
+		} else if (unlikely(ret != 0)) {
+			GEM_WARN_ON(ret);	/* Unexpected */
 			goto deadlk;
-		else if (ret == -EBUSY) {
-			tasklet_schedule(&sched_engine->tasklet);
-			guc->stalled_request = last;
-			return false;
 		}
 	}
 
-	guc->stalled_request = NULL;
+	/*
+	 * No requests without a guc_id, enable guc_id allocation at request
+	 * creation time (guc_request_alloc).
+	 */
+	if (!guc->total_num_rq_with_no_guc_id)
+		clr_guc_ids_exhausted(guc);
+
 	return submit;
 
+schedule_tasklet:
+	tasklet_schedule(&sched_engine->tasklet);
+	return false;
+
 deadlk:
 	sched_engine->tasklet.callback = NULL;
 	tasklet_disable_nosync(&sched_engine->tasklet);
 	return false;
+
+blk_tasklet_kick:
+	kick_retire_wq(guc);
+blk_tasklet:
+	set_tasklet_blocked(guc);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
+	struct intel_guc *guc = sched_engine->private_data;
 	unsigned long flags;
 	bool loop;
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	do {
-		loop = guc_dequeue_one_context(sched_engine->private_data);
-	} while (loop);
+	if (likely(!tasklet_blocked(guc)))
+		do {
+			loop = guc_dequeue_one_context(guc);
+		} while (loop);
 
 	i915_sched_engine_reset_on_empty(sched_engine);
 
@@ -653,6 +934,14 @@ submission_disabled(struct intel_guc *guc)
 			!__tasklet_is_enabled(&sched_engine->tasklet));
 }
 
+static void kick_tasklet(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (likely(!tasklet_blocked(guc)))
+		tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
 static void disable_submission(struct intel_guc *guc)
 {
 	struct i915_sched_engine * const sched_engine = guc->sched_engine;
@@ -676,8 +965,16 @@ static void enable_submission(struct intel_guc *guc)
 	    __tasklet_enable(&sched_engine->tasklet)) {
 		GEM_BUG_ON(!guc->ct.enabled);
 
+		/* Reset tasklet state */
+		guc->stalled_rq = NULL;
+		if (guc->stalled_context)
+			intel_context_put(guc->stalled_context);
+		guc->stalled_context = NULL;
+		guc->submission_stall_reason = STALL_NONE;
+		guc->flags = 0;
+
 		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&sched_engine->tasklet);
+		kick_tasklet(guc);
 	}
 	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
 }
@@ -856,6 +1153,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 out_replay:
 	guc_reset_state(ce, head, stalled);
 	__unwind_incomplete_requests(ce);
+	ce->guc_num_rq_submit_no_id = 0;
 	intel_context_put(ce);
 }
 
@@ -888,6 +1186,7 @@ static void guc_cancel_context_requests(struct intel_context *ce)
 	spin_lock(&ce->guc_active.lock);
 	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
 		i915_request_put(i915_request_mark_eio(rq));
+	ce->guc_num_rq_submit_no_id = 0;
 	spin_unlock(&ce->guc_active.lock);
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
@@ -924,11 +1223,15 @@ guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
+			struct intel_context *ce = rq->context;
+
 			list_del_init(&rq->sched.link);
 
 			__i915_request_submit(rq);
 
 			i915_request_put(i915_request_mark_eio(rq));
+
+			ce->guc_num_rq_submit_no_id = 0;
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -980,6 +1283,51 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
+static void retire_worker_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce);
+
+static void retire_worker_func(struct work_struct *w)
+{
+	struct intel_guc *guc =
+		container_of(w, struct intel_guc, retire_worker);
+
+	/*
+	 * It is possible that another thread issues the schedule disable + that
+	 * G2H completes moving the state machine further along to a point
+	 * where nothing needs to be done here. Let's be paranoid and kick the
+	 * tasklet in that case.
+	 */
+	if (guc->submission_stall_reason != STALL_SCHED_DISABLE &&
+	    guc->submission_stall_reason != STALL_GUC_ID_WORKQUEUE) {
+		kick_tasklet(guc);
+		return;
+	}
+
+	if (guc->submission_stall_reason == STALL_SCHED_DISABLE) {
+		GEM_BUG_ON(!guc->stalled_context);
+		GEM_BUG_ON(context_guc_id_invalid(guc->stalled_context));
+
+		retire_worker_sched_disable(guc, guc->stalled_context);
+	}
+
+	/*
+	 * guc_id pressure, always try to release it regardless of state,
+	 * albeit after possibly issuing a schedule disable as that is async
+	 * operation.
+	 */
+	intel_gt_retire_requests(guc_to_gt(guc));
+
+	if (guc->submission_stall_reason == STALL_GUC_ID_WORKQUEUE) {
+		GEM_BUG_ON(guc->stalled_context);
+
+		/* Hopefully guc_ids are now available, kick tasklet */
+		guc->submission_stall_reason = STALL_GUC_ID_TASKLET;
+		clr_tasklet_blocked(guc);
+
+		kick_tasklet(guc);
+	}
+}
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1003,9 +1351,12 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
 	spin_lock_init(&guc->contexts_lock);
-	INIT_LIST_HEAD(&guc->guc_id_list);
+	INIT_LIST_HEAD(&guc->guc_id_list_no_ref);
+	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
 	ida_init(&guc->guc_ids);
 
+	INIT_WORK(&guc->retire_worker, retire_worker_func);
+
 	return 0;
 }
 
@@ -1022,10 +1373,28 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 				 struct i915_request *rq,
 				 int prio)
 {
+	bool empty = i915_sched_engine_is_empty(sched_engine);
+
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
 		      i915_sched_lookup_priolist(sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	if (empty)
+		kick_tasklet(&rq->engine->gt->uc.guc);
+}
+
+static bool need_tasklet(struct intel_guc *guc, struct intel_context *ce)
+{
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
+
+	lockdep_assert_held(&sched_engine->lock);
+
+	return guc_ids_exhausted(guc) || submission_disabled(guc) ||
+		guc->stalled_rq || guc->stalled_context ||
+		!lrc_desc_registered(guc, ce->guc_id) ||
+		!i915_sched_engine_is_empty(sched_engine);
 }
 
 static int guc_bypass_tasklet_submit(struct intel_guc *guc,
@@ -1039,8 +1408,6 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 
 	guc_set_lrc_tail(rq);
 	ret = guc_add_request(guc, rq);
-	if (ret == -EBUSY)
-		guc->stalled_request = rq;
 
 	if (unlikely(ret == -EPIPE))
 		disable_submission(guc);
@@ -1057,11 +1424,10 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (submission_disabled(guc) || guc->stalled_request ||
-	    !i915_sched_engine_is_empty(sched_engine))
+	if (need_tasklet(guc, rq->context))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		tasklet_hi_schedule(&sched_engine->tasklet);
+		kick_tasklet(guc);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
@@ -1093,32 +1459,71 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 }
 
-static int steal_guc_id(struct intel_guc *guc)
+/*
+ * We have two lists for guc_ids available to steal. One list is for contexts
+ * that to have a zero guc_id_ref but are still pinned (scheduling enabled, only
+ * available inside tasklet) and the other is for contexts that are not pinned
+ * but still registered (available both outside and inside tasklet). Stealing
+ * from the latter only requires a deregister H2G, while the former requires a
+ * schedule disable H2G + a deregister H2G.
+ */
+static struct list_head *get_guc_id_list(struct intel_guc *guc,
+					 bool unpinned)
+{
+	if (unpinned)
+		return &guc->guc_id_list_unpinned;
+	else
+		return &guc->guc_id_list_no_ref;
+}
+
+static int steal_guc_id(struct intel_guc *guc, bool unpinned)
 {
 	struct intel_context *ce;
 	int guc_id;
+	struct list_head *guc_id_list = get_guc_id_list(guc, unpinned);
 
 	lockdep_assert_held(&guc->contexts_lock);
 
-	if (!list_empty(&guc->guc_id_list)) {
-		ce = list_first_entry(&guc->guc_id_list,
+	if (!list_empty(guc_id_list)) {
+		ce = list_first_entry(guc_id_list,
 				      struct intel_context,
 				      guc_id_link);
 
+		/* Ensure context getting stolen in expected state */
 		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
 		GEM_BUG_ON(context_guc_id_invalid(ce));
+		GEM_BUG_ON(context_guc_id_stolen(ce));
 
 		list_del_init(&ce->guc_id_link);
 		guc_id = ce->guc_id;
 		clr_context_registered(ce);
-		set_context_guc_id_invalid(ce);
+
+		/*
+		 * If stealing from the pinned list, defer invalidating
+		 * the guc_id until the retire workqueue processes this
+		 * context.
+		 */
+		if (!unpinned) {
+			GEM_BUG_ON(guc->stalled_context);
+			guc->stalled_context = intel_context_get(ce);
+			set_context_guc_id_stolen(ce);
+		} else {
+			set_context_guc_id_invalid(ce);
+		}
+
 		return guc_id;
 	} else {
 		return -EAGAIN;
 	}
 }
 
-static int assign_guc_id(struct intel_guc *guc, u16 *out)
+enum {	/* Return values for pin_guc_id / assign_guc_id */
+	SAME_GUC_ID		= 0,
+	NEW_GUC_ID_DISABLED	= 1,
+	NEW_GUC_ID_ENABLED	= 2,
+};
+
+static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
 {
 	int ret;
 
@@ -1126,17 +1531,33 @@ static int assign_guc_id(struct intel_guc *guc, u16 *out)
 
 	ret = new_guc_id(guc);
 	if (unlikely(ret < 0)) {
-		ret = steal_guc_id(guc);
-		if (ret < 0)
-			return ret;
+		ret = steal_guc_id(guc, true);
+		if (ret >= 0) {
+			*out = ret;
+			ret = NEW_GUC_ID_DISABLED;
+		} else if (ret < 0 && tasklet) {
+			/*
+			 * We only steal a guc_id from a context with scheduling
+			 * enabled if guc_ids are exhausted and we are submitting
+			 * from the tasklet.
+			 */
+			ret = steal_guc_id(guc, false);
+			if (ret >= 0) {
+				*out = ret;
+				ret = NEW_GUC_ID_ENABLED;
+			}
+		}
+	} else {
+		*out = ret;
+		ret = SAME_GUC_ID;
 	}
 
-	*out = ret;
-	return 0;
+	return ret;
 }
 
 #define PIN_GUC_ID_TRIES	4
-static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
+		      bool tasklet)
 {
 	int ret = 0;
 	unsigned long flags, tries = PIN_GUC_ID_TRIES;
@@ -1146,11 +1567,15 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 try_again:
 	spin_lock_irqsave(&guc->contexts_lock, flags);
 
+	if (!tasklet && guc_ids_exhausted(guc)) {
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
+
 	if (context_guc_id_invalid(ce)) {
-		ret = assign_guc_id(guc, &ce->guc_id);
-		if (ret)
+		ret = assign_guc_id(guc, &ce->guc_id, tasklet);
+		if (unlikely(ret < 0))
 			goto out_unlock;
-		ret = 1;	/* Indidcates newly assigned guc_id */
 	}
 	if (!list_empty(&ce->guc_id_link))
 		list_del_init(&ce->guc_id_link);
@@ -1166,8 +1591,11 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	 * attempting to retire more requests. Double the sleep period each
 	 * subsequent pass before finally giving up. The sleep period has max of
 	 * 100ms and minimum of 1ms.
+	 *
+	 * We only try this if outside the tasklet, inside the tasklet we have a
+	 * (slower, more complex, blocking) different flow control algorithm.
 	 */
-	if (ret == -EAGAIN && --tries) {
+	if (ret == -EAGAIN && --tries && !tasklet) {
 		if (PIN_GUC_ID_TRIES - tries > 1) {
 			unsigned int timeslice_shifted =
 				ce->engine->props.timeslice_duration_ms <<
@@ -1184,7 +1612,9 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	return ret;
 }
 
-static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+static void unpin_guc_id(struct intel_guc *guc,
+			 struct intel_context *ce,
+			 bool unpinned)
 {
 	unsigned long flags;
 
@@ -1194,9 +1624,17 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 		return;
 
 	spin_lock_irqsave(&guc->contexts_lock, flags);
-	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
-	    !atomic_read(&ce->guc_id_ref))
-		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
+
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+
+	if (!context_guc_id_invalid(ce) && !context_guc_id_stolen(ce) &&
+	    !atomic_read(&ce->guc_id_ref)) {
+		struct list_head *head = get_guc_id_list(guc, unpinned);
+
+		list_add_tail(&ce->guc_id_link, head);
+	}
+
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 }
 
@@ -1300,6 +1738,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	int ret = 0;
 
 	GEM_BUG_ON(!engine->mask);
+	GEM_BUG_ON(context_guc_id_invalid(ce));
 
 	/*
 	 * Ensure LRC + CT vmas are is same region as write barrier is done
@@ -1342,6 +1781,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 		trace_intel_context_steal_guc_id(ce);
 		if (!loop) {
 			set_context_wait_for_deregister_to_register(ce);
+			set_context_block_tasklet(ce);
 			intel_context_get(ce);
 		} else {
 			bool disabled;
@@ -1369,7 +1809,14 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			ret = deregister_context(ce, ce->guc_id, loop);
 		if (unlikely(ret == -EBUSY)) {
 			clr_context_wait_for_deregister_to_register(ce);
+			clr_context_block_tasklet(ce);
 			intel_context_put(ce);
+		} else if (!loop && !ret) {
+			/*
+			 * A context de-registration has been issued from within
+			 * the tasklet. Need to block until it complete.
+			 */
+			return -EINPROGRESS;
 		} else if (unlikely(ret == -ENODEV)) {
 			ret = 0;	/* Will get registered later */
 		}
@@ -1425,7 +1872,9 @@ static void guc_context_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
-	unpin_guc_id(guc, ce);
+	GEM_BUG_ON(context_enabled(ce));
+
+	unpin_guc_id(guc, ce, true);
 	lrc_unpin(ce);
 }
 
@@ -1764,6 +2213,8 @@ static void guc_context_destroy(struct kref *kref)
 	unsigned long flags;
 	bool disabled;
 
+	GEM_BUG_ON(context_guc_id_stolen(ce));
+
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
@@ -1925,6 +2376,9 @@ static void add_to_context(struct i915_request *rq)
 	spin_lock(&ce->guc_active.lock);
 	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
 
+	if (unlikely(request_has_no_guc_id(rq)))
+		++ce->guc_num_rq_submit_no_id;
+
 	if (rq->guc_prio == GUC_PRIO_INIT) {
 		rq->guc_prio = new_guc_prio;
 		add_context_inflight_prio(ce, rq->guc_prio);
@@ -1966,7 +2420,12 @@ static void remove_from_context(struct i915_request *rq)
 
 	spin_unlock_irq(&ce->guc_active.lock);
 
-	atomic_dec(&ce->guc_id_ref);
+	if (likely(!request_has_no_guc_id(rq)))
+		atomic_dec(&ce->guc_id_ref);
+	else
+		--ce_to_guc(rq->context)->total_num_rq_with_no_guc_id;
+	unpin_guc_id(ce_to_guc(ce), ce, false);
+
 	i915_request_notify_execute_cb_imm(rq);
 }
 
@@ -2018,13 +2477,144 @@ static void guc_signal_context_fence(struct intel_context *ce)
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
-static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
+static void invalidate_guc_id_sched_disable(struct intel_context *ce)
+{
+	set_context_guc_id_invalid(ce);
+	wmb();	/* Make sure guc_id invalidation visible first */
+	clr_context_guc_id_stolen(ce);
+}
+
+static void retire_worker_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce)
+{
+	unsigned long flags;
+	bool disabled;
+
+	guc->stalled_context = NULL;
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (!disabled && !context_pending_disable(ce) && context_enabled(ce)) {
+		/*
+		 * Still enabled, issue schedule disable + configure state so
+		 * when G2H returns tasklet is kicked.
+		 */
+
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+		u16 guc_id;
+
+		/*
+		 * We add +2 here as the schedule disable complete CTB handler
+		 * calls intel_context_sched_disable_unpin (-2 to pin_count).
+		 */
+		GEM_BUG_ON(!atomic_read(&ce->pin_count));
+		atomic_add(2, &ce->pin_count);
+
+		set_context_block_tasklet(ce);
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			__guc_context_sched_disable(guc, ce, guc_id);
+
+		invalidate_guc_id_sched_disable(ce);
+	} else if (!disabled && context_pending_disable(ce)) {
+		/*
+		 * Schedule disable in flight, set bit to kick tasklet in G2H
+		 * handler and call it a day.
+		 */
+
+		set_context_block_tasklet(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		invalidate_guc_id_sched_disable(ce);
+	} else {
+		/* Schedule disable is done, kick tasklet */
+
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		invalidate_guc_id_sched_disable(ce);
+
+		guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+		clr_tasklet_blocked(guc);
+
+		kick_tasklet(ce_to_guc(ce));
+	}
+
+	intel_context_put(ce);
+}
+
+static bool context_needs_lrc_desc_pin(struct intel_context *ce, bool new_guc_id)
 {
 	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
 		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
 		!submission_disabled(ce_to_guc(ce));
 }
 
+static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	int ret = 0;
+
+	lockdep_assert_held(&guc->sched_engine->lock);
+	GEM_BUG_ON(!ce->guc_num_rq_submit_no_id);
+
+	if (atomic_add_unless(&ce->guc_id_ref, ce->guc_num_rq_submit_no_id, 0))
+		goto out;
+
+	ret = pin_guc_id(guc, ce, true);
+	if (unlikely(ret < 0)) {
+		/*
+		 * No guc_ids available, disable the tasklet and kick the retire
+		 * workqueue hopefully freeing up some guc_ids.
+		 */
+		guc->stalled_rq = rq;
+		guc->submission_stall_reason = STALL_GUC_ID_WORKQUEUE;
+		return ret;
+	}
+
+	if (ce->guc_num_rq_submit_no_id - 1 > 0)
+		atomic_add(ce->guc_num_rq_submit_no_id - 1,
+			   &ce->guc_id_ref);
+
+	if (context_needs_lrc_desc_pin(ce, !!ret))
+		set_context_needs_register(ce);
+
+	if (ret == NEW_GUC_ID_ENABLED) {
+		guc->stalled_rq = rq;
+		guc->submission_stall_reason = STALL_SCHED_DISABLE;
+	}
+
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+out:
+	guc->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
+	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id < 0);
+
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests, sched.link)
+		if (request_has_no_guc_id(rq)) {
+			--ce->guc_num_rq_submit_no_id;
+			clear_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED,
+				  &rq->fence.flags);
+		} else if (!ce->guc_num_rq_submit_no_id) {
+			break;
+		}
+
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+
+	/*
+	 * When NEW_GUC_ID_ENABLED is returned it means we are stealing a guc_id
+	 * from a context that has scheduling enabled. We have to disable
+	 * scheduling before deregistering the context and it isn't safe to do
+	 * in the tasklet because of lock inversion (ce->guc_state.lock must be
+	 * acquired before guc->sched_engine->lock). To work around this
+	 * we do the schedule disable in retire workqueue and block the tasklet
+	 * until the schedule done G2H returns. Returning non-zero here kicks
+	 * the workqueue.
+	 */
+	return (ret == NEW_GUC_ID_ENABLED) ? ret : 0;
+}
+
 static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
@@ -2056,6 +2646,15 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	rq->reserved_space -= GUC_REQUEST_SIZE;
 
+	/*
+	 * guc_ids are exhausted, don't allocate one here, defer to submission
+	 * in the tasklet.
+	 */
+	if (test_and_update_guc_ids_exhausted(guc)) {
+		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
+		goto out;
+	}
+
 	/*
 	 * Call pin_guc_id here rather than in the pinning step as with
 	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
@@ -2063,9 +2662,7 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * when guc_ids are being stolen due to over subscription. By the time
 	 * this function is reached, it is guaranteed that the guc_id will be
 	 * persistent until the generated request is retired. Thus, sealing these
-	 * race conditions. It is still safe to fail here if guc_ids are
-	 * exhausted and return -EAGAIN to the user indicating that they can try
-	 * again in the future.
+	 * race conditions.
 	 *
 	 * There is no need for a lock here as the timeline mutex ensures at
 	 * most one context can be executing this code path at once. The
@@ -2076,10 +2673,26 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
 		goto out;
 
-	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
-	if (unlikely(ret < 0))
+	ret = pin_guc_id(guc, ce, false);	/* > 0 indicates new guc_id */
+	if (unlikely(ret == -EAGAIN)) {
+		/*
+		 * No guc_ids available, so we force this submission and all
+		 * future submissions to be serialized in the tasklet, sharing
+		 * the guc_ids on a per submission basis to ensure (more) fair
+		 * scheduling of submissions. Once the tasklet is flushed of
+		 * submissions we return to allocating guc_ids in this function.
+		 */
+		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
+		set_and_update_guc_ids_exhausted(guc);
+
+		return 0;
+	} else if (unlikely(ret < 0)) {
 		return ret;
-	if (context_needs_register(ce, !!ret)) {
+	}
+
+	GEM_BUG_ON(ret == NEW_GUC_ID_ENABLED);
+
+	if (context_needs_lrc_desc_pin(ce, !!ret)) {
 		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
 			if (ret == -EPIPE) {
@@ -2087,7 +2700,7 @@ static int guc_request_alloc(struct i915_request *rq)
 				goto out;	/* GPU will be reset */
 			}
 			atomic_dec(&ce->guc_id_ref);
-			unpin_guc_id(guc, ce);
+			unpin_guc_id(guc, ce, true);
 			return ret;
 		}
 	}
@@ -2358,7 +2971,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 					  struct intel_context *ce)
 {
 	if (context_guc_id_invalid(ce))
-		pin_guc_id(guc, ce);
+		pin_guc_id(guc, ce, false);
 	guc_lrc_desc_pin(ce, true);
 }
 
@@ -2625,6 +3238,16 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			register_context(ce, true);
 		guc_signal_context_fence(ce);
+		if (context_block_tasklet(ce)) {
+			GEM_BUG_ON(guc->submission_stall_reason !=
+				   STALL_DEREGISTER_CONTEXT);
+
+			clr_context_block_tasklet(ce);
+			guc->submission_stall_reason = STALL_MOVE_LRC_TAIL;
+			clr_tasklet_blocked(guc);
+
+			kick_tasklet(ce_to_guc(ce));
+		}
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
@@ -2688,6 +3311,14 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		guc_blocked_fence_complete(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
+		if (context_block_tasklet(ce)) {
+			clr_context_block_tasklet(ce);
+			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+			clr_tasklet_blocked(guc);
+
+			kick_tasklet(ce_to_guc(ce));
+		}
+
 		if (banned) {
 			guc_cancel_context_requests(ce);
 			intel_engine_signal_breadcrumbs(ce->engine);
@@ -2716,10 +3347,8 @@ static void capture_error_state(struct intel_guc *guc,
 
 static void guc_context_replay(struct intel_context *ce)
 {
-	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
-
 	__guc_reset_context(ce, true);
-	tasklet_hi_schedule(&sched_engine->tasklet);
+	kick_tasklet(ce_to_guc(ce));
 }
 
 static void guc_handle_context_reset(struct intel_guc *guc,
@@ -2878,8 +3507,16 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 		   atomic_read(&guc->outstanding_submission_g2h));
 	drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
 	drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
-	drm_printf(p, "GuC tasklet count: %u\n\n",
+	drm_printf(p, "GuC tasklet count: %u\n",
 		   atomic_read(&sched_engine->tasklet.count));
+	drm_printf(p, "GuC submit flags: 0x%04lx\n", guc->flags);
+	drm_printf(p, "GuC total number request without guc_id: %d\n",
+		   guc->total_num_rq_with_no_guc_id);
+	drm_printf(p, "GuC stall reason: %d\n", guc->submission_stall_reason);
+	drm_printf(p, "GuC stalled request: %s\n",
+		   yesno(guc->stalled_rq));
+	drm_printf(p, "GuC stalled context: %s\n\n",
+		   yesno(guc->stalled_context));
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 	drm_printf(p, "Requests in GuC submit tasklet:\n");
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 1bc1349ba3c2..807f76750cf4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -139,6 +139,12 @@ enum {
 	 * the GPU. Here we track such boost requests on a per-request basis.
 	 */
 	I915_FENCE_FLAG_BOOST,
+
+	/*
+	 * I915_FENCE_FLAG_GUC_ID_NOT_PINNED - Set to signal the GuC submission
+	 * tasklet that the guc_id isn't pinned.
+	 */
+	I915_FENCE_FLAG_GUC_ID_NOT_PINNED,
 };
 
 /**

From patchwork Tue Aug  3 22:29:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417503
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D6B2AC4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:21 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id A715660184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:21 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A715660184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 3E4756E8E3;
	Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 9329A89E06;
 Tue,  3 Aug 2021 22:11:54 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745891"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745891"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512696"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 04/46] drm/i915/guc: Don't allow requests not ready to consume
 all guc_ids
Date: Tue,  3 Aug 2021 15:29:01 -0700
Message-Id: <20210803222943.27686-5-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Add a heuristic which checks if over half of the available guc_ids are
currently consumed by requests not ready to be submitted. If this
heuristic is true at request creation time (normal guc_id allocation
location) force all submissions + guc_ids allocations to tasklet.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  3 ++
 drivers/gpu/drm/i915/gt/intel_reset.c         |  9 ++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 53 +++++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 5 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 8ed964ef967b..c01530d7dc67 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -188,6 +188,9 @@ struct intel_context {
 	/* Number of rq submitted without a guc_id */
 	u16 guc_num_rq_submit_no_id;
 
+	/* GuC number of requests not ready */
+	atomic_t guc_num_rq_not_ready;
+
 	/*
 	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 91200c43951f..ea763138197f 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,6 +22,7 @@
 #include "intel_reset.h"
 
 #include "uc/intel_guc.h"
+#include "uc/intel_guc_submission.h"
 
 #define RESET_MAX_RETRIES 3
 
@@ -850,6 +851,14 @@ static void nop_submit_request(struct i915_request *request)
 {
 	RQ_TRACE(request, "-EIO\n");
 
+	/*
+	 * XXX: Kinda ugly to check for GuC submission here but this function is
+	 * going away once we switch to the DRM scheduler so we can live with
+	 * this for now.
+	 */
+	if (intel_engine_uses_guc(request->engine))
+		intel_guc_decr_num_rq_not_ready(request->context);
+
 	request = i915_request_mark_eio(request);
 	if (request) {
 		i915_request_submit(request);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index e76579396efd..917352c9f323 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -76,6 +76,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
+	atomic_t num_guc_ids_not_ready;
 	struct list_head guc_id_list_no_ref;
 	struct list_head guc_id_list_unpinned;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f42a707f60ca..ba750fc87af1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1384,6 +1384,41 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 		kick_tasklet(&rq->engine->gt->uc.guc);
 }
 
+/* Macro to tweak heuristic, using a simple over 50% not ready for now */
+#define TOO_MANY_GUC_IDS_NOT_READY(avail, consumed) \
+	((consumed) > (avail) / 2)
+static bool too_many_guc_ids_not_ready(struct intel_guc *guc,
+				       struct intel_context *ce)
+{
+	u32 available_guc_ids, guc_ids_consumed;
+
+	available_guc_ids = guc->num_guc_ids;
+	guc_ids_consumed = atomic_read(&guc->num_guc_ids_not_ready);
+
+	if (TOO_MANY_GUC_IDS_NOT_READY(available_guc_ids, guc_ids_consumed)) {
+		set_and_update_guc_ids_exhausted(guc);
+		return true;
+	}
+
+	return false;
+}
+
+static void incr_num_rq_not_ready(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	if (!atomic_fetch_add(1, &ce->guc_num_rq_not_ready))
+		atomic_inc(&guc->num_guc_ids_not_ready);
+}
+
+void intel_guc_decr_num_rq_not_ready(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1)
+		atomic_dec(&guc->num_guc_ids_not_ready);
+}
+
 static bool need_tasklet(struct intel_guc *guc, struct intel_context *ce)
 {
 	struct i915_sched_engine * const sched_engine =
@@ -1430,6 +1465,8 @@ static void guc_submit_request(struct i915_request *rq)
 		kick_tasklet(guc);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
+
+	intel_guc_decr_num_rq_not_ready(rq->context);
 }
 
 static int new_guc_id(struct intel_guc *guc)
@@ -2647,10 +2684,13 @@ static int guc_request_alloc(struct i915_request *rq)
 	rq->reserved_space -= GUC_REQUEST_SIZE;
 
 	/*
-	 * guc_ids are exhausted, don't allocate one here, defer to submission
-	 * in the tasklet.
+	 * guc_ids are exhausted or a heuristic is met indicating too many
+	 * guc_ids are waiting on requests with submission dependencies (not
+	 * ready to submit). Don't allocate one here, defer to submission in the
+	 * tasklet.
 	 */
-	if (test_and_update_guc_ids_exhausted(guc)) {
+	if (test_and_update_guc_ids_exhausted(guc) ||
+	    too_many_guc_ids_not_ready(guc, ce)) {
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
 		goto out;
 	}
@@ -2684,6 +2724,7 @@ static int guc_request_alloc(struct i915_request *rq)
 		 */
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
 		set_and_update_guc_ids_exhausted(guc);
+		incr_num_rq_not_ready(ce);
 
 		return 0;
 	} else if (unlikely(ret < 0)) {
@@ -2708,6 +2749,8 @@ static int guc_request_alloc(struct i915_request *rq)
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
 out:
+	incr_num_rq_not_ready(ce);
+
 	/*
 	 * We block all requests on this context if a G2H is pending for a
 	 * schedule disable or context deregistration as the GuC will fail a
@@ -3512,6 +3555,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 	drm_printf(p, "GuC submit flags: 0x%04lx\n", guc->flags);
 	drm_printf(p, "GuC total number request without guc_id: %d\n",
 		   guc->total_num_rq_with_no_guc_id);
+	drm_printf(p, "GuC Number GuC IDs not ready: %d\n",
+		   atomic_read(&guc->num_guc_ids_not_ready));
 	drm_printf(p, "GuC stall reason: %d\n", guc->submission_stall_reason);
 	drm_printf(p, "GuC stalled request: %s\n",
 		   yesno(guc->stalled_rq));
@@ -3567,6 +3612,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 			   atomic_read(&ce->pin_count));
 		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
 			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
+			   atomic_read(&ce->guc_num_rq_not_ready));
 		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
 			   ce->guc_state.sched_state,
 			   atomic_read(&ce->guc_sched_state_no_lock));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c7ef44fa0c36..17af5e123b09 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -51,4 +51,6 @@ static inline bool intel_guc_submission_is_used(struct intel_guc *guc)
 	return intel_guc_is_used(guc) && intel_guc_submission_is_wanted(guc);
 }
 
+void intel_guc_decr_num_rq_not_ready(struct intel_context *ce);
+
 #endif

From patchwork Tue Aug  3 22:29:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417555
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9D52FC43214
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:04 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 709AF60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:04 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 709AF60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id C94A26E953;
	Tue,  3 Aug 2021 22:12:06 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 5DA8A6E16D;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745892"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745892"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512697"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 05/46] drm/i915/guc: Introduce guc_submit_engine object
Date: Tue,  3 Aug 2021 15:29:02 -0700
Message-Id: <20210803222943.27686-6-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Move fields related to controlling the GuC submission state machine to a
unique object (guc_submit_engine) rather than the global GuC state
(intel_guc). This encapsulation allows multiple instances of submission
objects to operate in parallel and a single instance can block if needed
while another can make forward progress. This is analogous to how the
execlist mode works assigning a schedule object per physical engine but
rather in GuC mode we assign a schedule object based on the blocking
dependencies.

The guc_submit_engine object also encapsulates the i915_sched_engine
object as well.

Lots of find-replace.

Currently only 1 guc_submit_engine instantiated, future patches will
instantiate more.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  33 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 557 +++++++++++-------
 .../i915/gt/uc/intel_guc_submission_types.h   |  52 ++
 drivers/gpu/drm/i915/i915_scheduler.c         |  22 +-
 drivers/gpu/drm/i915/i915_scheduler.h         |   3 +
 5 files changed, 410 insertions(+), 257 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 917352c9f323..8ac016201658 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -21,6 +21,11 @@
 
 struct __guc_ads_blob;
 
+enum {
+	GUC_SUBMIT_ENGINE_SINGLE_LRC,
+	GUC_SUBMIT_ENGINE_MAX
+};
+
 /*
  * Top level structure of GuC. It handles firmware loading and manages client
  * pool. intel_guc owns a intel_guc_client to replace the legacy ExecList
@@ -31,31 +36,6 @@ struct intel_guc {
 	struct intel_guc_log log;
 	struct intel_guc_ct ct;
 
-	/* Global engine used to submit requests to GuC */
-	struct i915_sched_engine *sched_engine;
-
-	/* Global state related to submission tasklet */
-	struct i915_request *stalled_rq;
-	struct intel_context *stalled_context;
-	struct work_struct retire_worker;
-	unsigned long flags;
-	int total_num_rq_with_no_guc_id;
-
-	/*
-	 * Submisson stall reason. See intel_guc_submission.c for detailed
-	 * description.
-	 */
-	enum {
-		STALL_NONE,
-		STALL_GUC_ID_WORKQUEUE,
-		STALL_GUC_ID_TASKLET,
-		STALL_SCHED_DISABLE,
-		STALL_REGISTER_CONTEXT,
-		STALL_DEREGISTER_CONTEXT,
-		STALL_MOVE_LRC_TAIL,
-		STALL_ADD_REQUEST,
-	} submission_stall_reason;
-
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
@@ -68,6 +48,8 @@ struct intel_guc {
 		void (*disable)(struct intel_guc *guc);
 	} interrupts;
 
+	struct guc_submit_engine *gse[GUC_SUBMIT_ENGINE_MAX];
+
 	/*
 	 * contexts_lock protects the pool of free guc ids and a linked list of
 	 * guc ids available to be stolen
@@ -76,7 +58,6 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
-	atomic_t num_guc_ids_not_ready;
 	struct list_head guc_id_list_no_ref;
 	struct list_head guc_id_list_unpinned;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ba750fc87af1..842094de848d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -21,6 +21,7 @@
 #include "gt/intel_ring.h"
 
 #include "intel_guc_submission.h"
+#include "intel_guc_submission_types.h"
 
 #include "i915_drv.h"
 #include "i915_trace.h"
@@ -57,7 +58,7 @@
  * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which
  * represents in-order queue. The kernel driver packs ring tail pointer and an
  * ELSP context descriptor dword into Work Item.
- * See guc_add_request()
+ * See gse_add_request()
  *
  * GuC flow control state machine:
  * The tasklet, workqueue (retire_worker), and the G2H handlers together more or
@@ -80,57 +81,57 @@
  *				context)
  */
 
-/* GuC Virtual Engine */
-struct guc_virtual_engine {
-	struct intel_engine_cs base;
-	struct intel_context context;
-};
-
 static struct intel_context *
 guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+static inline struct guc_submit_engine *ce_to_gse(struct intel_context *ce)
+{
+	return container_of(ce->engine->sched_engine, struct guc_submit_engine,
+			    sched_engine);
+}
+
 /*
  * Global GuC flags helper functions
  */
 enum {
-	GUC_STATE_TASKLET_BLOCKED,
-	GUC_STATE_GUC_IDS_EXHAUSTED,
+	GSE_STATE_TASKLET_BLOCKED,
+	GSE_STATE_GUC_IDS_EXHAUSTED,
 };
 
-static bool tasklet_blocked(struct intel_guc *guc)
+static bool tasklet_blocked(struct guc_submit_engine *gse)
 {
-	return test_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+	return test_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
-static void set_tasklet_blocked(struct intel_guc *guc)
+static void set_tasklet_blocked(struct guc_submit_engine *gse)
 {
-	lockdep_assert_held(&guc->sched_engine->lock);
-	set_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+	lockdep_assert_held(&gse->sched_engine.lock);
+	set_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
-static void __clr_tasklet_blocked(struct intel_guc *guc)
+static void __clr_tasklet_blocked(struct guc_submit_engine *gse)
 {
-	lockdep_assert_held(&guc->sched_engine->lock);
-	clear_bit(GUC_STATE_TASKLET_BLOCKED, &guc->flags);
+	lockdep_assert_held(&gse->sched_engine.lock);
+	clear_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
-static void clr_tasklet_blocked(struct intel_guc *guc)
+static void clr_tasklet_blocked(struct guc_submit_engine *gse)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&guc->sched_engine->lock, flags);
-	__clr_tasklet_blocked(guc);
-	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+	spin_lock_irqsave(&gse->sched_engine.lock, flags);
+	__clr_tasklet_blocked(gse);
+	spin_unlock_irqrestore(&gse->sched_engine.lock, flags);
 }
 
-static bool guc_ids_exhausted(struct intel_guc *guc)
+static bool guc_ids_exhausted(struct guc_submit_engine *gse)
 {
-	return test_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+	return test_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
 }
 
-static bool test_and_update_guc_ids_exhausted(struct intel_guc *guc)
+static bool test_and_update_guc_ids_exhausted(struct guc_submit_engine *gse)
 {
 	unsigned long flags;
 	bool ret = false;
@@ -139,33 +140,33 @@ static bool test_and_update_guc_ids_exhausted(struct intel_guc *guc)
 	 * Strict ordering on checking if guc_ids are exhausted isn't required,
 	 * so let's avoid grabbing the submission lock if possible.
 	 */
-	if (guc_ids_exhausted(guc)) {
-		spin_lock_irqsave(&guc->sched_engine->lock, flags);
-		ret = guc_ids_exhausted(guc);
+	if (guc_ids_exhausted(gse)) {
+		spin_lock_irqsave(&gse->sched_engine.lock, flags);
+		ret = guc_ids_exhausted(gse);
 		if (ret)
-			++guc->total_num_rq_with_no_guc_id;
-		spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+			++gse->total_num_rq_with_no_guc_id;
+		spin_unlock_irqrestore(&gse->sched_engine.lock, flags);
 	}
 
 	return ret;
 }
 
-static void set_and_update_guc_ids_exhausted(struct intel_guc *guc)
+static void set_and_update_guc_ids_exhausted(struct guc_submit_engine *gse)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&guc->sched_engine->lock, flags);
-	++guc->total_num_rq_with_no_guc_id;
-	set_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
-	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+	spin_lock_irqsave(&gse->sched_engine.lock, flags);
+	++gse->total_num_rq_with_no_guc_id;
+	set_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
+	spin_unlock_irqrestore(&gse->sched_engine.lock, flags);
 }
 
-static void clr_guc_ids_exhausted(struct intel_guc *guc)
+static void clr_guc_ids_exhausted(struct guc_submit_engine *gse)
 {
-	lockdep_assert_held(&guc->sched_engine->lock);
-	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id);
+	lockdep_assert_held(&gse->sched_engine.lock);
+	GEM_BUG_ON(gse->total_num_rq_with_no_guc_id);
 
-	clear_bit(GUC_STATE_GUC_IDS_EXHAUSTED, &guc->flags);
+	clear_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
 }
 
 /*
@@ -419,6 +420,20 @@ static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
 	return &ce->engine->gt->uc.guc;
 }
 
+static inline struct i915_sched_engine *
+ce_to_sched_engine(struct intel_context *ce)
+{
+	return ce->engine->sched_engine;
+}
+
+static inline struct i915_sched_engine *
+guc_to_sched_engine(struct intel_guc *guc, int index)
+{
+	GEM_BUG_ON(index < 0 || index >= GUC_SUBMIT_ENGINE_MAX);
+
+	return &guc->gse[index]->sched_engine;
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -644,19 +659,20 @@ static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
-static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int gse_add_request(struct guc_submit_engine *gse,
+			   struct i915_request *rq)
 {
 	int ret;
 
-	lockdep_assert_held(&guc->sched_engine->lock);
+	lockdep_assert_held(&gse->sched_engine.lock);
 
-	ret = __guc_add_request(guc, rq);
+	ret = __guc_add_request(gse->sched_engine.private_data, rq);
 	if (ret == -EBUSY) {
-		guc->stalled_rq = rq;
-		guc->submission_stall_reason = STALL_ADD_REQUEST;
+		gse->stalled_rq = rq;
+		gse->submission_stall_reason = STALL_ADD_REQUEST;
 	} else {
-		guc->stalled_rq = NULL;
-		guc->submission_stall_reason = STALL_NONE;
+		gse->stalled_rq = NULL;
+		gse->submission_stall_reason = STALL_NONE;
 	}
 
 	return ret;
@@ -664,14 +680,15 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
 
-static int tasklet_register_context(struct intel_guc *guc,
+static int tasklet_register_context(struct guc_submit_engine *gse,
 				    struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
+	struct intel_guc *guc = gse->sched_engine.private_data;
 	int ret = 0;
 
 	/* Check state */
-	lockdep_assert_held(&guc->sched_engine->lock);
+	lockdep_assert_held(&gse->sched_engine.lock);
 	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
 	GEM_BUG_ON(request_has_no_guc_id(rq));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
@@ -694,11 +711,11 @@ static int tasklet_register_context(struct intel_guc *guc,
 			clr_context_needs_register(ce);
 
 		if (unlikely(ret == -EBUSY)) {
-			guc->stalled_rq = rq;
-			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
+			gse->stalled_rq = rq;
+			gse->submission_stall_reason = STALL_REGISTER_CONTEXT;
 		} else if (unlikely(ret == -EINPROGRESS)) {
-			guc->stalled_rq = rq;
-			guc->submission_stall_reason = STALL_DEREGISTER_CONTEXT;
+			gse->stalled_rq = rq;
+			gse->submission_stall_reason = STALL_DEREGISTER_CONTEXT;
 		}
 	}
 
@@ -716,28 +733,29 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
-static void kick_retire_wq(struct intel_guc *guc)
+static void kick_retire_wq(struct guc_submit_engine *gse)
 {
-	queue_work(system_unbound_wq, &guc->retire_worker);
+	queue_work(system_unbound_wq, &gse->retire_worker);
 }
 
-static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq);
+static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
+			      struct i915_request *rq);
 
-static int guc_dequeue_one_context(struct intel_guc *guc)
+static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
-	struct i915_request *last = guc->stalled_rq;
+	struct i915_sched_engine * const sched_engine = &gse->sched_engine;
+	struct i915_request *last = gse->stalled_rq;
 	bool submit = !!last;
 	struct rb_node *rb;
 	int ret;
 
 	lockdep_assert_held(&sched_engine->lock);
-	GEM_BUG_ON(guc->stalled_context);
-	GEM_BUG_ON(!submit && guc->submission_stall_reason);
+	GEM_BUG_ON(gse->stalled_context);
+	GEM_BUG_ON(!submit && gse->submission_stall_reason);
 
 	if (submit) {
 		/* Flow control conditions */
-		switch (guc->submission_stall_reason) {
+		switch (gse->submission_stall_reason) {
 		case STALL_GUC_ID_TASKLET:
 			goto done;
 		case STALL_REGISTER_CONTEXT:
@@ -750,8 +768,8 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			GEM_BUG_ON("Invalid stall state");
 		}
 	} else {
-		GEM_BUG_ON(!guc->total_num_rq_with_no_guc_id &&
-			   guc_ids_exhausted(guc));
+		GEM_BUG_ON(!gse->total_num_rq_with_no_guc_id &&
+			   guc_ids_exhausted(gse));
 
 		while ((rb = rb_first_cached(&sched_engine->queue))) {
 			struct i915_priolist *p = to_priolist(rb);
@@ -780,13 +798,13 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 		struct intel_context *ce = last->context;
 
 		if (ce->guc_num_rq_submit_no_id) {
-			ret = tasklet_pin_guc_id(guc, last);
+			ret = tasklet_pin_guc_id(gse, last);
 			if (ret)
 				goto blk_tasklet_kick;
 		}
 
 register_context:
-		ret = tasklet_register_context(guc, last);
+		ret = tasklet_register_context(gse, last);
 		if (unlikely(ret == -EINPROGRESS)) {
 			goto blk_tasklet;
 		} else if (unlikely(ret == -EPIPE)) {
@@ -802,7 +820,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 		guc_set_lrc_tail(last);
 
 add_request:
-		ret = guc_add_request(guc, last);
+		ret = gse_add_request(gse, last);
 		if (unlikely(ret == -EPIPE)) {
 			goto deadlk;
 		} else if (ret == -EBUSY) {
@@ -817,8 +835,8 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	 * No requests without a guc_id, enable guc_id allocation at request
 	 * creation time (guc_request_alloc).
 	 */
-	if (!guc->total_num_rq_with_no_guc_id)
-		clr_guc_ids_exhausted(guc);
+	if (!gse->total_num_rq_with_no_guc_id)
+		clr_guc_ids_exhausted(gse);
 
 	return submit;
 
@@ -832,25 +850,26 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	return false;
 
 blk_tasklet_kick:
-	kick_retire_wq(guc);
+	kick_retire_wq(gse);
 blk_tasklet:
-	set_tasklet_blocked(guc);
+	set_tasklet_blocked(gse);
 	return false;
 }
 
-static void guc_submission_tasklet(struct tasklet_struct *t)
+static void gse_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
-	struct intel_guc *guc = sched_engine->private_data;
+	struct guc_submit_engine *gse =
+		container_of(sched_engine, typeof(*gse), sched_engine);
 	unsigned long flags;
 	bool loop;
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (likely(!tasklet_blocked(guc)))
+	if (likely(!tasklet_blocked(gse)))
 		do {
-			loop = guc_dequeue_one_context(guc);
+			loop = gse_dequeue_one_context(gse);
 		} while (loop);
 
 	i915_sched_engine_reset_on_empty(sched_engine);
@@ -925,69 +944,99 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 	}
 }
 
-static inline bool
-submission_disabled(struct intel_guc *guc)
+static bool submission_disabled(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	int i;
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine;
+
+		if (unlikely(!guc->gse[i]))
+			return true;
+
+		sched_engine = guc_to_sched_engine(guc, i);
+
+		if (unlikely(!__tasklet_is_enabled(&sched_engine->tasklet)))
+			return true;
+	}
 
-	return unlikely(!sched_engine ||
-			!__tasklet_is_enabled(&sched_engine->tasklet));
+	return false;
 }
 
-static void kick_tasklet(struct intel_guc *guc)
+static void kick_tasklet(struct guc_submit_engine *gse)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 
-	if (likely(!tasklet_blocked(guc)))
+	if (likely(!tasklet_blocked(gse)))
 		tasklet_hi_schedule(&sched_engine->tasklet);
 }
 
 static void disable_submission(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	int i;
 
-	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
-		GEM_BUG_ON(!guc->ct.enabled);
-		__tasklet_disable_sync_once(&sched_engine->tasklet);
-		sched_engine->tasklet.callback = NULL;
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+			GEM_BUG_ON(!guc->ct.enabled);
+			__tasklet_disable_sync_once(&sched_engine->tasklet);
+			sched_engine->tasklet.callback = NULL;
+		}
 	}
 }
 
 static void enable_submission(struct intel_guc *guc)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
 	unsigned long flags;
+	int i;
 
-	spin_lock_irqsave(&guc->sched_engine->lock, flags);
-	sched_engine->tasklet.callback = guc_submission_tasklet;
-	wmb();	/* Make sure callback visible */
-	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
-	    __tasklet_enable(&sched_engine->tasklet)) {
-		GEM_BUG_ON(!guc->ct.enabled);
-
-		/* Reset tasklet state */
-		guc->stalled_rq = NULL;
-		if (guc->stalled_context)
-			intel_context_put(guc->stalled_context);
-		guc->stalled_context = NULL;
-		guc->submission_stall_reason = STALL_NONE;
-		guc->flags = 0;
-
-		/* And kick in case we missed a new request submission. */
-		kick_tasklet(guc);
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+		struct guc_submit_engine *gse = guc->gse[i];
+
+		spin_lock_irqsave(&sched_engine->lock, flags);
+		sched_engine->tasklet.callback = gse_submission_tasklet;
+		wmb();	/* Mask sure callback is visible */
+		if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+		    __tasklet_enable(&sched_engine->tasklet)) {
+			GEM_BUG_ON(!guc->ct.enabled);
+
+			/* Reset GuC submit engine state */
+			gse->stalled_rq = NULL;
+			if (gse->stalled_context)
+				intel_context_put(gse->stalled_context);
+			gse->stalled_context = NULL;
+			gse->submission_stall_reason = STALL_NONE;
+			gse->flags = 0;
+
+			/* And kick in case we missed a new request submission. */
+			kick_tasklet(gse);
+		}
+		spin_unlock_irqrestore(&sched_engine->lock, flags);
 	}
-	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
 }
 
-static void guc_flush_submissions(struct intel_guc *guc)
+static void gse_flush_submissions(struct guc_submit_engine *gse)
 {
-	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_sched_engine * const sched_engine = &gse->sched_engine;
 	unsigned long flags;
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	int i;
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
+		if (likely(guc->gse[i]))
+			gse_flush_submissions(guc->gse[i]);
+}
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
 	int i;
@@ -1171,13 +1220,12 @@ void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
 		if (intel_context_is_pinned(ce))
 			__guc_reset_context(ce, stalled);
 
-	/* GuC is blown away, drop all references to contexts */
 	xa_destroy(&guc->context_lookup);
 }
 
 static void guc_cancel_context_requests(struct intel_context *ce)
 {
-	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_sched_engine *sched_engine = ce_to_sched_engine(ce);
 	struct i915_request *rq;
 	unsigned long flags;
 
@@ -1192,8 +1240,9 @@ static void guc_cancel_context_requests(struct intel_context *ce)
 }
 
 static void
-guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
+gse_cancel_requests(struct guc_submit_engine *gse)
 {
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
@@ -1250,12 +1299,14 @@ void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
 	struct intel_context *ce;
 	unsigned long index;
+	int i;
 
 	xa_for_each(&guc->context_lookup, index, ce)
 		if (intel_context_is_pinned(ce))
 			guc_cancel_context_requests(ce);
 
-	guc_cancel_sched_engine_requests(guc->sched_engine);
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
+		gse_cancel_requests(guc->gse[i]);
 
 	/* GuC is blown away, drop all references to contexts */
 	xa_destroy(&guc->context_lookup);
@@ -1283,13 +1334,13 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
-static void retire_worker_sched_disable(struct intel_guc *guc,
+static void retire_worker_sched_disable(struct guc_submit_engine *gse,
 					struct intel_context *ce);
 
 static void retire_worker_func(struct work_struct *w)
 {
-	struct intel_guc *guc =
-		container_of(w, struct intel_guc, retire_worker);
+	struct guc_submit_engine *gse =
+		container_of(w, struct guc_submit_engine, retire_worker);
 
 	/*
 	 * It is possible that another thread issues the schedule disable + that
@@ -1297,17 +1348,17 @@ static void retire_worker_func(struct work_struct *w)
 	 * where nothing needs to be done here. Let's be paranoid and kick the
 	 * tasklet in that case.
 	 */
-	if (guc->submission_stall_reason != STALL_SCHED_DISABLE &&
-	    guc->submission_stall_reason != STALL_GUC_ID_WORKQUEUE) {
-		kick_tasklet(guc);
+	if (gse->submission_stall_reason != STALL_SCHED_DISABLE &&
+	    gse->submission_stall_reason != STALL_GUC_ID_WORKQUEUE) {
+		kick_tasklet(gse);
 		return;
 	}
 
-	if (guc->submission_stall_reason == STALL_SCHED_DISABLE) {
-		GEM_BUG_ON(!guc->stalled_context);
-		GEM_BUG_ON(context_guc_id_invalid(guc->stalled_context));
+	if (gse->submission_stall_reason == STALL_SCHED_DISABLE) {
+		GEM_BUG_ON(!gse->stalled_context);
+		GEM_BUG_ON(context_guc_id_invalid(gse->stalled_context));
 
-		retire_worker_sched_disable(guc, guc->stalled_context);
+		retire_worker_sched_disable(gse, gse->stalled_context);
 	}
 
 	/*
@@ -1315,16 +1366,16 @@ static void retire_worker_func(struct work_struct *w)
 	 * albeit after possibly issuing a schedule disable as that is async
 	 * operation.
 	 */
-	intel_gt_retire_requests(guc_to_gt(guc));
+	intel_gt_retire_requests(guc_to_gt(gse->sched_engine.private_data));
 
-	if (guc->submission_stall_reason == STALL_GUC_ID_WORKQUEUE) {
-		GEM_BUG_ON(guc->stalled_context);
+	if (gse->submission_stall_reason == STALL_GUC_ID_WORKQUEUE) {
+		GEM_BUG_ON(gse->stalled_context);
 
 		/* Hopefully guc_ids are now available, kick tasklet */
-		guc->submission_stall_reason = STALL_GUC_ID_TASKLET;
-		clr_tasklet_blocked(guc);
+		gse->submission_stall_reason = STALL_GUC_ID_TASKLET;
+		clr_tasklet_blocked(gse);
 
-		kick_tasklet(guc);
+		kick_tasklet(gse);
 	}
 }
 
@@ -1355,18 +1406,24 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
 	ida_init(&guc->guc_ids);
 
-	INIT_WORK(&guc->retire_worker, retire_worker_func);
-
 	return 0;
 }
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
+	int i;
+
 	if (!guc->lrc_desc_pool)
 		return;
 
 	guc_lrc_desc_pool_destroy(guc);
-	i915_sched_engine_put(guc->sched_engine);
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		i915_sched_engine_put(sched_engine);
+	}
 }
 
 static inline void queue_request(struct i915_sched_engine *sched_engine,
@@ -1381,22 +1438,23 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
 	if (empty)
-		kick_tasklet(&rq->engine->gt->uc.guc);
+		kick_tasklet(ce_to_gse(rq->context));
 }
 
 /* Macro to tweak heuristic, using a simple over 50% not ready for now */
 #define TOO_MANY_GUC_IDS_NOT_READY(avail, consumed) \
 	((consumed) > (avail) / 2)
-static bool too_many_guc_ids_not_ready(struct intel_guc *guc,
+static bool too_many_guc_ids_not_ready(struct guc_submit_engine *gse,
 				       struct intel_context *ce)
 {
 	u32 available_guc_ids, guc_ids_consumed;
+	struct intel_guc *guc = gse->sched_engine.private_data;
 
 	available_guc_ids = guc->num_guc_ids;
-	guc_ids_consumed = atomic_read(&guc->num_guc_ids_not_ready);
+	guc_ids_consumed = atomic_read(&gse->num_guc_ids_not_ready);
 
 	if (TOO_MANY_GUC_IDS_NOT_READY(available_guc_ids, guc_ids_consumed)) {
-		set_and_update_guc_ids_exhausted(guc);
+		set_and_update_guc_ids_exhausted(gse);
 		return true;
 	}
 
@@ -1405,34 +1463,36 @@ static bool too_many_guc_ids_not_ready(struct intel_guc *guc,
 
 static void incr_num_rq_not_ready(struct intel_context *ce)
 {
-	struct intel_guc *guc = ce_to_guc(ce);
+	struct guc_submit_engine *gse = ce_to_gse(ce);
 
 	if (!atomic_fetch_add(1, &ce->guc_num_rq_not_ready))
-		atomic_inc(&guc->num_guc_ids_not_ready);
+		atomic_inc(&gse->num_guc_ids_not_ready);
 }
 
 void intel_guc_decr_num_rq_not_ready(struct intel_context *ce)
 {
-	struct intel_guc *guc = ce_to_guc(ce);
+	struct guc_submit_engine *gse = ce_to_gse(ce);
 
-	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1)
-		atomic_dec(&guc->num_guc_ids_not_ready);
+	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1) {
+		GEM_BUG_ON(!atomic_read(&gse->num_guc_ids_not_ready));
+		atomic_dec(&gse->num_guc_ids_not_ready);
+	}
 }
 
-static bool need_tasklet(struct intel_guc *guc, struct intel_context *ce)
+static bool need_tasklet(struct guc_submit_engine *gse, struct intel_context *ce)
 {
-	struct i915_sched_engine * const sched_engine =
-		ce->engine->sched_engine;
+	struct i915_sched_engine * const sched_engine = &gse->sched_engine;
+	struct intel_guc *guc = gse->sched_engine.private_data;
 
 	lockdep_assert_held(&sched_engine->lock);
 
-	return guc_ids_exhausted(guc) || submission_disabled(guc) ||
-		guc->stalled_rq || guc->stalled_context ||
+	return guc_ids_exhausted(gse) || submission_disabled(guc) ||
+		gse->stalled_rq || gse->stalled_context ||
 		!lrc_desc_registered(guc, ce->guc_id) ||
 		!i915_sched_engine_is_empty(sched_engine);
 }
 
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+static int gse_bypass_tasklet_submit(struct guc_submit_engine *gse,
 				     struct i915_request *rq)
 {
 	int ret;
@@ -1442,27 +1502,27 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	trace_i915_request_in(rq, 0);
 
 	guc_set_lrc_tail(rq);
-	ret = guc_add_request(guc, rq);
+	ret = gse_add_request(gse, rq);
 
 	if (unlikely(ret == -EPIPE))
-		disable_submission(guc);
+		disable_submission(gse->sched_engine.private_data);
 
 	return ret;
 }
 
 static void guc_submit_request(struct i915_request *rq)
 {
-	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
-	struct intel_guc *guc = &rq->engine->gt->uc.guc;
+	struct guc_submit_engine *gse = ce_to_gse(rq->context);
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (need_tasklet(guc, rq->context))
+	if (need_tasklet(gse, rq->context))
 		queue_request(sched_engine, rq, rq_prio(rq));
-	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		kick_tasklet(guc);
+	else if (gse_bypass_tasklet_submit(gse, rq) == -EBUSY)
+		kick_tasklet(gse);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 
@@ -1541,8 +1601,9 @@ static int steal_guc_id(struct intel_guc *guc, bool unpinned)
 		 * context.
 		 */
 		if (!unpinned) {
-			GEM_BUG_ON(guc->stalled_context);
-			guc->stalled_context = intel_context_get(ce);
+			GEM_BUG_ON(ce_to_gse(ce)->stalled_context);
+
+			ce_to_gse(ce)->stalled_context = intel_context_get(ce);
 			set_context_guc_id_stolen(ce);
 		} else {
 			set_context_guc_id_invalid(ce);
@@ -1604,7 +1665,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
 try_again:
 	spin_lock_irqsave(&guc->contexts_lock, flags);
 
-	if (!tasklet && guc_ids_exhausted(guc)) {
+	if (!tasklet && guc_ids_exhausted(ce_to_gse(ce))) {
 		ret = -EAGAIN;
 		goto out_unlock;
 	}
@@ -2111,7 +2172,7 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 	intel_wakeref_t wakeref;
 	unsigned long flags;
 
-	guc_flush_submissions(guc);
+	gse_flush_submissions(ce_to_gse(ce));
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	set_context_banned(ce);
@@ -2199,7 +2260,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 	with_intel_runtime_pm(runtime_pm, wakeref)
-		__guc_context_sched_disable(guc, ce, guc_id);
+		__guc_context_sched_disable(ce_to_guc(ce), ce, guc_id);
 
 	return;
 unpin:
@@ -2460,7 +2521,7 @@ static void remove_from_context(struct i915_request *rq)
 	if (likely(!request_has_no_guc_id(rq)))
 		atomic_dec(&ce->guc_id_ref);
 	else
-		--ce_to_guc(rq->context)->total_num_rq_with_no_guc_id;
+		--ce_to_gse(rq->context)->total_num_rq_with_no_guc_id;
 	unpin_guc_id(ce_to_guc(ce), ce, false);
 
 	i915_request_notify_execute_cb_imm(rq);
@@ -2521,13 +2582,14 @@ static void invalidate_guc_id_sched_disable(struct intel_context *ce)
 	clr_context_guc_id_stolen(ce);
 }
 
-static void retire_worker_sched_disable(struct intel_guc *guc,
+static void retire_worker_sched_disable(struct guc_submit_engine *gse,
 					struct intel_context *ce)
 {
+	struct intel_guc *guc = gse->sched_engine.private_data;
 	unsigned long flags;
 	bool disabled;
 
-	guc->stalled_context = NULL;
+	gse->stalled_context = NULL;
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	disabled = submission_disabled(guc);
 	if (!disabled && !context_pending_disable(ce) && context_enabled(ce)) {
@@ -2573,10 +2635,10 @@ static void retire_worker_sched_disable(struct intel_guc *guc,
 
 		invalidate_guc_id_sched_disable(ce);
 
-		guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
-		clr_tasklet_blocked(guc);
+		gse->submission_stall_reason = STALL_REGISTER_CONTEXT;
+		clr_tasklet_blocked(gse);
 
-		kick_tasklet(ce_to_guc(ce));
+		kick_tasklet(gse);
 	}
 
 	intel_context_put(ce);
@@ -2589,25 +2651,26 @@ static bool context_needs_lrc_desc_pin(struct intel_context *ce, bool new_guc_id
 		!submission_disabled(ce_to_guc(ce));
 }
 
-static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
+static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
+			      struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	int ret = 0;
 
-	lockdep_assert_held(&guc->sched_engine->lock);
+	lockdep_assert_held(&gse->sched_engine.lock);
 	GEM_BUG_ON(!ce->guc_num_rq_submit_no_id);
 
 	if (atomic_add_unless(&ce->guc_id_ref, ce->guc_num_rq_submit_no_id, 0))
 		goto out;
 
-	ret = pin_guc_id(guc, ce, true);
+	ret = pin_guc_id(gse->sched_engine.private_data, ce, true);
 	if (unlikely(ret < 0)) {
 		/*
 		 * No guc_ids available, disable the tasklet and kick the retire
 		 * workqueue hopefully freeing up some guc_ids.
 		 */
-		guc->stalled_rq = rq;
-		guc->submission_stall_reason = STALL_GUC_ID_WORKQUEUE;
+		gse->stalled_rq = rq;
+		gse->submission_stall_reason = STALL_GUC_ID_WORKQUEUE;
 		return ret;
 	}
 
@@ -2619,14 +2682,14 @@ static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
 		set_context_needs_register(ce);
 
 	if (ret == NEW_GUC_ID_ENABLED) {
-		guc->stalled_rq = rq;
-		guc->submission_stall_reason = STALL_SCHED_DISABLE;
+		gse->stalled_rq = rq;
+		gse->submission_stall_reason = STALL_SCHED_DISABLE;
 	}
 
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 out:
-	guc->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
-	GEM_BUG_ON(guc->total_num_rq_with_no_guc_id < 0);
+	gse->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
+	GEM_BUG_ON(gse->total_num_rq_with_no_guc_id < 0);
 
 	list_for_each_entry_reverse(rq, &ce->guc_active.requests, sched.link)
 		if (request_has_no_guc_id(rq)) {
@@ -2644,7 +2707,7 @@ static int tasklet_pin_guc_id(struct intel_guc *guc, struct i915_request *rq)
 	 * from a context that has scheduling enabled. We have to disable
 	 * scheduling before deregistering the context and it isn't safe to do
 	 * in the tasklet because of lock inversion (ce->guc_state.lock must be
-	 * acquired before guc->sched_engine->lock). To work around this
+	 * acquired before gse->sched_engine.lock). To work around this
 	 * we do the schedule disable in retire workqueue and block the tasklet
 	 * until the schedule done G2H returns. Returning non-zero here kicks
 	 * the workqueue.
@@ -2656,6 +2719,7 @@ static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	struct intel_guc *guc = ce_to_guc(ce);
+	struct guc_submit_engine *gse = ce_to_gse(ce);
 	unsigned long flags;
 	int ret;
 
@@ -2689,8 +2753,8 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * ready to submit). Don't allocate one here, defer to submission in the
 	 * tasklet.
 	 */
-	if (test_and_update_guc_ids_exhausted(guc) ||
-	    too_many_guc_ids_not_ready(guc, ce)) {
+	if (test_and_update_guc_ids_exhausted(gse) ||
+	    too_many_guc_ids_not_ready(gse, ce)) {
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
 		goto out;
 	}
@@ -2723,7 +2787,7 @@ static int guc_request_alloc(struct i915_request *rq)
 		 * submissions we return to allocating guc_ids in this function.
 		 */
 		set_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED, &rq->fence.flags);
-		set_and_update_guc_ids_exhausted(guc);
+		set_and_update_guc_ids_exhausted(gse);
 		incr_num_rq_not_ready(ce);
 
 		return 0;
@@ -3131,17 +3195,41 @@ static void guc_sched_engine_destroy(struct kref *kref)
 {
 	struct i915_sched_engine *sched_engine =
 		container_of(kref, typeof(*sched_engine), ref);
-	struct intel_guc *guc = sched_engine->private_data;
+	struct guc_submit_engine *gse =
+		container_of(sched_engine, typeof(*gse), sched_engine);
+	struct intel_guc *guc = gse->sched_engine.private_data;
 
-	guc->sched_engine = NULL;
+	guc->gse[gse->id] = NULL;
 	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
-	kfree(sched_engine);
+	kfree(gse);
+}
+
+static void guc_submit_engine_init(struct intel_guc *guc,
+				   struct guc_submit_engine *gse,
+				   int id)
+{
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
+
+	i915_sched_engine_init(sched_engine, ENGINE_VIRTUAL);
+	INIT_WORK(&gse->retire_worker, retire_worker_func);
+	tasklet_setup(&sched_engine->tasklet, gse_submission_tasklet);
+	sched_engine->schedule = i915_schedule;
+	sched_engine->disabled = guc_sched_engine_disabled;
+	sched_engine->destroy = guc_sched_engine_destroy;
+	sched_engine->bump_inflight_request_prio =
+		guc_bump_inflight_request_prio;
+	sched_engine->retire_inflight_request_prio =
+		guc_retire_inflight_request_prio;
+	sched_engine->private_data = guc;
+	gse->id = id;
 }
 
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
 	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct i915_sched_engine *sched_engine;
+	int ret, i;
 
 	/*
 	 * The setup relies on several assumptions (e.g. irqs always enabled)
@@ -3149,24 +3237,20 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
 
-	if (!guc->sched_engine) {
-		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
-		if (!guc->sched_engine)
-			return -ENOMEM;
-
-		guc->sched_engine->schedule = i915_schedule;
-		guc->sched_engine->disabled = guc_sched_engine_disabled;
-		guc->sched_engine->private_data = guc;
-		guc->sched_engine->destroy = guc_sched_engine_destroy;
-		guc->sched_engine->bump_inflight_request_prio =
-			guc_bump_inflight_request_prio;
-		guc->sched_engine->retire_inflight_request_prio =
-			guc_retire_inflight_request_prio;
-		tasklet_setup(&guc->sched_engine->tasklet,
-			      guc_submission_tasklet);
+	if (!guc->gse[0]) {
+		for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+			guc->gse[i] = kzalloc(sizeof(*guc->gse[i]), GFP_KERNEL);
+			if (!guc->gse[i]) {
+				ret = -ENOMEM;
+				goto put_sched_engine;
+			}
+			guc_submit_engine_init(guc, guc->gse[i], i);
+		}
 	}
+
+	sched_engine = guc_to_sched_engine(guc, GUC_SUBMIT_ENGINE_SINGLE_LRC);
 	i915_sched_engine_put(engine->sched_engine);
-	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
+	engine->sched_engine = i915_sched_engine_get(sched_engine);
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
@@ -3182,6 +3266,16 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	engine->release = guc_release;
 
 	return 0;
+
+put_sched_engine:
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
+		struct i915_sched_engine *sched_engine =
+			guc_to_sched_engine(guc, i);
+
+		if (sched_engine)
+			i915_sched_engine_put(sched_engine);
+	}
+	return ret;
 }
 
 void intel_guc_submission_enable(struct intel_guc *guc)
@@ -3282,14 +3376,16 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		if (context_block_tasklet(ce)) {
-			GEM_BUG_ON(guc->submission_stall_reason !=
+			struct guc_submit_engine *gse = ce_to_gse(ce);
+
+			GEM_BUG_ON(gse->submission_stall_reason !=
 				   STALL_DEREGISTER_CONTEXT);
 
 			clr_context_block_tasklet(ce);
-			guc->submission_stall_reason = STALL_MOVE_LRC_TAIL;
-			clr_tasklet_blocked(guc);
+			gse->submission_stall_reason = STALL_MOVE_LRC_TAIL;
+			clr_tasklet_blocked(gse);
 
-			kick_tasklet(ce_to_guc(ce));
+			kick_tasklet(gse);
 		}
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
@@ -3355,11 +3451,13 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 		if (context_block_tasklet(ce)) {
+			struct guc_submit_engine *gse = ce_to_gse(ce);
+
 			clr_context_block_tasklet(ce);
-			guc->submission_stall_reason = STALL_REGISTER_CONTEXT;
-			clr_tasklet_blocked(guc);
+			gse->submission_stall_reason = STALL_REGISTER_CONTEXT;
+			clr_tasklet_blocked(gse);
 
-			kick_tasklet(ce_to_guc(ce));
+			kick_tasklet(gse);
 		}
 
 		if (banned) {
@@ -3391,7 +3489,7 @@ static void capture_error_state(struct intel_guc *guc,
 static void guc_context_replay(struct intel_context *ce)
 {
 	__guc_reset_context(ce, true);
-	kick_tasklet(ce_to_guc(ce));
+	kick_tasklet(ce_to_gse(ce));
 }
 
 static void guc_handle_context_reset(struct intel_guc *guc,
@@ -3536,35 +3634,32 @@ void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
 	}
 }
 
-void intel_guc_submission_print_info(struct intel_guc *guc,
-				     struct drm_printer *p)
+static void gse_log_submission_info(struct guc_submit_engine *gse,
+				    struct drm_printer *p, int id)
 {
-	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct i915_sched_engine *sched_engine = &gse->sched_engine;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	if (!sched_engine)
 		return;
 
-	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
-		   atomic_read(&guc->outstanding_submission_g2h));
-	drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
-	drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
-	drm_printf(p, "GuC tasklet count: %u\n",
+	drm_printf(p, "GSE[%d] tasklet count: %u\n", id,
 		   atomic_read(&sched_engine->tasklet.count));
-	drm_printf(p, "GuC submit flags: 0x%04lx\n", guc->flags);
-	drm_printf(p, "GuC total number request without guc_id: %d\n",
-		   guc->total_num_rq_with_no_guc_id);
-	drm_printf(p, "GuC Number GuC IDs not ready: %d\n",
-		   atomic_read(&guc->num_guc_ids_not_ready));
-	drm_printf(p, "GuC stall reason: %d\n", guc->submission_stall_reason);
-	drm_printf(p, "GuC stalled request: %s\n",
-		   yesno(guc->stalled_rq));
-	drm_printf(p, "GuC stalled context: %s\n\n",
-		   yesno(guc->stalled_context));
+	drm_printf(p, "GSE[%d] submit flags: 0x%04lx\n", id, gse->flags);
+	drm_printf(p, "GSE[%d] total number request without guc_id: %d\n",
+		   id, gse->total_num_rq_with_no_guc_id);
+	drm_printf(p, "GSE[%d] Number GuC IDs not ready: %d\n",
+		   id, atomic_read(&gse->num_guc_ids_not_ready));
+	drm_printf(p, "GSE[%d] stall reason: %d\n",
+		   id, gse->submission_stall_reason);
+	drm_printf(p, "GSE[%d] stalled request: %s\n",
+		   id, yesno(gse->stalled_rq));
+	drm_printf(p, "GSE[%d] stalled context: %s\n\n",
+		   id, yesno(gse->stalled_context));
 
 	spin_lock_irqsave(&sched_engine->lock, flags);
-	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	drm_printf(p, "Requests in GSE[%d] submit tasklet:\n", id);
 	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
 		struct i915_priolist *pl = to_priolist(rb);
 		struct i915_request *rq;
@@ -3594,6 +3689,20 @@ static inline void guc_log_context_priority(struct drm_printer *p,
 	drm_printf(p, "\n");
 }
 
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p)
+{
+	int i;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC Number GuC IDs: %d\n", guc->num_guc_ids);
+	drm_printf(p, "GuC Max Number GuC IDs: %d\n\n", guc->max_guc_ids);
+
+	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
+		gse_log_submission_info(guc->gse[i], p, i);
+}
+
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 					     struct drm_printer *p)
 {
@@ -3627,6 +3736,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 {
 	struct guc_virtual_engine *ve;
 	struct intel_guc *guc;
+	struct i915_sched_engine *sched_engine;
 	unsigned int n;
 	int err;
 
@@ -3635,6 +3745,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		return ERR_PTR(-ENOMEM);
 
 	guc = &siblings[0]->gt->uc.guc;
+	sched_engine = guc_to_sched_engine(guc, GUC_SUBMIT_ENGINE_SINGLE_LRC);
 
 	ve->base.i915 = siblings[0]->i915;
 	ve->base.gt = siblings[0]->gt;
@@ -3648,7 +3759,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
-	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+	ve->base.sched_engine = i915_sched_engine_get(sched_engine);
 
 	ve->base.cops = &virtual_guc_context_ops;
 	ve->base.request_alloc = guc_request_alloc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
new file mode 100644
index 000000000000..0c224ab18c02
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2019 Intel Corporation
+ */
+
+#ifndef _INTEL_GUC_SUBMISSION_TYPES_H_
+#define _INTEL_GUC_SUBMISSION_TYPES_H_
+
+#include "gt/intel_engine_types.h"
+#include "gt/intel_context_types.h"
+#include "i915_scheduler_types.h"
+
+struct intel_guc;
+struct i915_request;
+
+/* GuC Virtual Engine */
+struct guc_virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+};
+
+/*
+ * Object which encapsulates the globally operated on i915_sched_engine +
+ * the GuC submission state machine described in intel_guc_submission.c.
+ */
+struct guc_submit_engine {
+	struct i915_sched_engine sched_engine;
+	struct work_struct retire_worker;
+	struct i915_request *stalled_rq;
+	struct intel_context *stalled_context;
+	unsigned long flags;
+	int total_num_rq_with_no_guc_id;
+	atomic_t num_guc_ids_not_ready;
+	int id;
+
+	/*
+	 * Submisson stall reason. See intel_guc_submission.c for detailed
+	 * description.
+	 */
+	enum {
+		STALL_NONE,
+		STALL_GUC_ID_WORKQUEUE,
+		STALL_GUC_ID_TASKLET,
+		STALL_SCHED_DISABLE,
+		STALL_REGISTER_CONTEXT,
+		STALL_DEREGISTER_CONTEXT,
+		STALL_MOVE_LRC_TAIL,
+		STALL_ADD_REQUEST,
+	} submission_stall_reason;
+};
+
+#endif
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 762127dd56c5..1e7eb49e374c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -448,15 +448,9 @@ static bool default_disabled(struct i915_sched_engine *sched_engine)
 	return false;
 }
 
-struct i915_sched_engine *
-i915_sched_engine_create(unsigned int subclass)
+void i915_sched_engine_init(struct i915_sched_engine *sched_engine,
+			    unsigned int subclass)
 {
-	struct i915_sched_engine *sched_engine;
-
-	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
-	if (!sched_engine)
-		return NULL;
-
 	kref_init(&sched_engine->ref);
 
 	sched_engine->queue = RB_ROOT_CACHED;
@@ -481,6 +475,18 @@ i915_sched_engine_create(unsigned int subclass)
 	lock_map_release(&sched_engine->lock.dep_map);
 	local_irq_enable();
 #endif
+}
+
+struct i915_sched_engine *
+i915_sched_engine_create(unsigned int subclass)
+{
+	struct i915_sched_engine *sched_engine;
+
+	sched_engine = kzalloc(sizeof(*sched_engine), GFP_KERNEL);
+	if (!sched_engine)
+		return NULL;
+
+	i915_sched_engine_init(sched_engine, subclass);
 
 	return sched_engine;
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 0b9b86af6c7f..4e4ef32b2cbc 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -48,6 +48,9 @@ static inline void i915_priolist_free(struct i915_priolist *p)
 		__i915_priolist_free(p);
 }
 
+void i915_sched_engine_init(struct i915_sched_engine *sched_engine,
+			    unsigned int subclass);
+
 struct i915_sched_engine *
 i915_sched_engine_create(unsigned int subclass);
 

From patchwork Tue Aug  3 22:29:03 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417501
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id ED157C4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:16 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id BD58960EE8
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:16 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org BD58960EE8
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id BAB7889E3B;
	Tue,  3 Aug 2021 22:11:54 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 7A22989B48;
 Tue,  3 Aug 2021 22:11:53 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745893"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745893"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512698"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:52 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 06/46] drm/i915/guc: Check return of __xa_store when
 registering a context
Date: Tue,  3 Aug 2021 15:29:03 -0700
Message-Id: <20210803222943.27686-7-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Check return of __xa_store when registering a context as this can fail
in a rare case if not memory can not be allocated. If this occurs fall
back on the tasklet flow control and try again in the future.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 842094de848d..a7f7174b5343 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -505,18 +505,24 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 	return __get_context(guc, id);
 }
 
-static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
-					   struct intel_context *ce)
+static inline int set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+					  struct intel_context *ce)
 {
 	unsigned long flags;
+	void *ret;
 
 	/*
 	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
 	 * lower level functions directly.
 	 */
 	xa_lock_irqsave(&guc->context_lookup, flags);
-	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	ret = __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
 	xa_unlock_irqrestore(&guc->context_lookup, flags);
+
+	if (unlikely(xa_is_err(ret)))
+		return -EBUSY;	/* Try again in future */
+
+	return 0;
 }
 
 static int guc_submission_send_busy_loop(struct intel_guc *guc,
@@ -1854,7 +1860,9 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	rcu_read_unlock();
 
 	reset_lrc_desc(guc, desc_idx);
-	set_lrc_desc_registered(guc, desc_idx, ce);
+	ret = set_lrc_desc_registered(guc, desc_idx, ce);
+	if (unlikely(ret))
+		return ret;
 
 	desc = __get_lrc_desc(guc, desc_idx);
 	desc->engine_class = engine_class_to_guc_class(engine->class);

From patchwork Tue Aug  3 22:29:04 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417511
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id ED7E1C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:29 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id BF81B601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:29 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org BF81B601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id E4A3D6E8F7;
	Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id E106289D00;
 Tue,  3 Aug 2021 22:11:53 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745894"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745894"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512699"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 07/46] drm/i915/guc: Non-static lrc descriptor registration
 buffer
Date: Tue,  3 Aug 2021 15:29:04 -0700
Message-Id: <20210803222943.27686-8-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Dynamically allocate space for lrc descriptor registration with the GuC
rather than using a large static buffer indexed by the guc_id. If no
space is available to register a context, fall back to tasklet flow
control mechanism. Only allow 1/2 of the space to be allocated outside
the tasklet to prevent unready requests/contexts from consuming all
registration space.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   9 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 202 ++++++++++++------
 3 files changed, 150 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index c01530d7dc67..2df79ba39867 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -179,6 +179,9 @@ struct intel_context {
 	/* GuC scheduling state flags that do not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
+	/* GuC lrc descriptor registration buffer */
+	unsigned int guc_lrcd_reg_idx;
+
 	/* GuC LRC descriptor ID */
 	u16 guc_id;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8ac016201658..c0a12ae95ba5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -69,8 +69,13 @@ struct intel_guc {
 	u32 ads_regset_size;
 	u32 ads_golden_ctxt_size;
 
-	struct i915_vma *lrc_desc_pool;
-	void *lrc_desc_pool_vaddr;
+	/* GuC LRC descriptor registration */
+	struct {
+		struct i915_vma *vma;
+		void *vaddr;
+		struct ida ida;
+		unsigned int max_idx;
+	} lrcd_reg;
 
 	/* guc_id to intel_context lookup */
 	struct xarray context_lookup;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a7f7174b5343..bfda15bf9182 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -439,65 +439,54 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
+static u32 __get_lrc_desc_offset(struct intel_guc *guc, int index)
 {
-	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
-
+	GEM_BUG_ON(index >= guc->lrcd_reg.max_idx);
 	GEM_BUG_ON(index >= guc->max_guc_ids);
 
-	return &base[index];
+	return intel_guc_ggtt_offset(guc, guc->lrcd_reg.vma) +
+		(index * sizeof(struct guc_lrc_desc));
 }
 
-static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
+static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, int index)
 {
-	struct intel_context *ce = xa_load(&guc->context_lookup, id);
+	struct guc_lrc_desc *desc;
 
-	GEM_BUG_ON(id >= guc->max_guc_ids);
+	GEM_BUG_ON(index >= guc->lrcd_reg.max_idx);
+	GEM_BUG_ON(index >= guc->max_guc_ids);
 
-	return ce;
+	desc = guc->lrcd_reg.vaddr;
+	desc = &desc[index];
+	memset(desc, 0, sizeof(*desc));
+
+	return desc;
 }
 
-static int guc_lrc_desc_pool_create(struct intel_guc *guc)
+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
 {
-	u32 size;
-	int ret;
-
-	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * guc->max_guc_ids);
-	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
-					     (void **)&guc->lrc_desc_pool_vaddr);
-	if (ret)
-		return ret;
+	struct intel_context *ce = xa_load(&guc->context_lookup, id);
 
-	return 0;
-}
+	GEM_BUG_ON(id >= guc->max_guc_ids);
 
-static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
-{
-	guc->lrc_desc_pool_vaddr = NULL;
-	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
+	return ce;
 }
 
 static inline bool guc_submission_initialized(struct intel_guc *guc)
 {
-	return !!guc->lrc_desc_pool_vaddr;
+	return !!guc->lrcd_reg.max_idx;
 }
 
-static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
+static inline void clr_lrc_desc_registered(struct intel_guc *guc, u32 id)
 {
-	if (likely(guc_submission_initialized(guc))) {
-		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
-		unsigned long flags;
-
-		memset(desc, 0, sizeof(*desc));
+	unsigned long flags;
 
-		/*
-		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
-		 * the lower level functions directly.
-		 */
-		xa_lock_irqsave(&guc->context_lookup, flags);
-		__xa_erase(&guc->context_lookup, id);
-		xa_unlock_irqrestore(&guc->context_lookup, flags);
-	}
+	/*
+	 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+	 * the lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_erase(&guc->context_lookup, id);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -1385,6 +1374,9 @@ static void retire_worker_func(struct work_struct *w)
 	}
 }
 
+static int guc_lrcd_reg_init(struct intel_guc *guc);
+static void guc_lrcd_reg_fini(struct intel_guc *guc);
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1393,17 +1385,12 @@ int intel_guc_submission_init(struct intel_guc *guc)
 {
 	int ret;
 
-	if (guc->lrc_desc_pool)
+	if (guc_submission_initialized(guc))
 		return 0;
 
-	ret = guc_lrc_desc_pool_create(guc);
+	ret = guc_lrcd_reg_init(guc);
 	if (ret)
 		return ret;
-	/*
-	 * Keep static analysers happy, let them know that we allocated the
-	 * vma after testing that it didn't exist earlier.
-	 */
-	GEM_BUG_ON(!guc->lrc_desc_pool);
 
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
@@ -1419,10 +1406,10 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 {
 	int i;
 
-	if (!guc->lrc_desc_pool)
+	if (!guc_submission_initialized(guc))
 		return;
 
-	guc_lrc_desc_pool_destroy(guc);
+	guc_lrcd_reg_fini(guc);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
 		struct i915_sched_engine *sched_engine =
@@ -1495,6 +1482,7 @@ static bool need_tasklet(struct guc_submit_engine *gse, struct intel_context *ce
 	return guc_ids_exhausted(gse) || submission_disabled(guc) ||
 		gse->stalled_rq || gse->stalled_context ||
 		!lrc_desc_registered(guc, ce->guc_id) ||
+		context_needs_register(ce) ||
 		!i915_sched_engine_is_empty(sched_engine);
 }
 
@@ -1546,7 +1534,7 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
 	if (!context_guc_id_invalid(ce)) {
 		ida_simple_remove(&guc->guc_ids, ce->guc_id);
-		reset_lrc_desc(guc, ce->guc_id);
+		clr_lrc_desc_registered(guc, ce->guc_id);
 		set_context_guc_id_invalid(ce);
 	}
 	if (!list_empty(&ce->guc_id_link))
@@ -1743,14 +1731,14 @@ static void unpin_guc_id(struct intel_guc *guc,
 }
 
 static int __guc_action_register_context(struct intel_guc *guc,
+					 struct intel_context *ce,
 					 u32 guc_id,
-					 u32 offset,
 					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
 		guc_id,
-		offset,
+		__get_lrc_desc_offset(guc, ce->guc_lrcd_reg_idx),
 	};
 
 	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1760,13 +1748,11 @@ static int __guc_action_register_context(struct intel_guc *guc,
 static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
-	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
-		ce->guc_id * sizeof(struct guc_lrc_desc);
 	int ret;
 
 	trace_intel_context_register(ce);
 
-	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
+	ret = __guc_action_register_context(guc, ce, ce->guc_id, loop);
 	if (likely(!ret))
 		set_context_registered(ce);
 
@@ -1828,6 +1814,86 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 
 static inline u8 map_i915_prio_to_guc_prio(int prio);
 
+static int alloc_lrcd_reg_idx_buffer(struct intel_guc *guc, int num_per_vma)
+{
+	u32 size = num_per_vma * sizeof(struct guc_lrc_desc);
+	struct i915_vma **vma = &guc->lrcd_reg.vma;
+	void **vaddr = &guc->lrcd_reg.vaddr;
+	int ret;
+
+	GEM_BUG_ON(!is_power_of_2(size));
+
+	ret = intel_guc_allocate_and_map_vma(guc, size, vma, vaddr);
+	if (unlikely(ret))
+		return ret;
+
+	guc->lrcd_reg.max_idx += num_per_vma;
+
+	return 0;
+}
+
+static int alloc_lrcd_reg_idx(struct intel_guc *guc, bool tasklet)
+{
+	int ret;
+	gfp_t gfp = tasklet ? GFP_ATOMIC :
+		GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+
+	might_sleep_if(!tasklet);
+
+	/*
+	 * We only allow 1/2 of the space to be allocated outside of tasklet
+	 * (flow control) to ensure requests that are not ready don't consume
+	 * all context registration space.
+	 */
+	ret = ida_simple_get(&guc->lrcd_reg.ida, 0,
+			     tasklet ? guc->lrcd_reg.max_idx :
+			     guc->lrcd_reg.max_idx / 2, gfp);
+	if (unlikely(ret < 0))
+		return -EBUSY;
+
+	return ret;
+}
+
+static void __free_lrcd_reg_idx(struct intel_guc *guc, struct intel_context *ce)
+{
+	if (ce->guc_lrcd_reg_idx && guc->lrcd_reg.max_idx) {
+		ida_simple_remove(&guc->lrcd_reg.ida, ce->guc_lrcd_reg_idx);
+		ce->guc_lrcd_reg_idx = 0;
+	}
+}
+
+static void free_lrcd_reg_idx(struct intel_guc *guc, struct intel_context *ce)
+{
+	__free_lrcd_reg_idx(guc, ce);
+}
+
+static int guc_lrcd_reg_init(struct intel_guc *guc)
+{
+	unsigned int buffer_size = I915_GTT_PAGE_SIZE_4K * 16;
+	int ret;
+
+	ida_init(&guc->lrcd_reg.ida);
+
+	ret = alloc_lrcd_reg_idx_buffer(guc, buffer_size /
+					sizeof(struct guc_lrc_desc));
+	if (unlikely(ret))
+		return ret;
+
+	/* Zero is reserved */
+	ret = alloc_lrcd_reg_idx(guc, false);
+	GEM_BUG_ON(ret);
+
+	return ret;
+}
+
+static void guc_lrcd_reg_fini(struct intel_guc *guc)
+{
+	i915_vma_unpin_and_release(&guc->lrcd_reg.vma,
+				   I915_VMA_RELEASE_MAP);
+	ida_destroy(&guc->lrcd_reg.ida);
+	guc->lrcd_reg.max_idx = 0;
+}
+
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_engine_cs *engine = ce->engine;
@@ -1851,6 +1917,14 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
 		   i915_gem_object_is_lmem(ce->ring->vma->obj));
 
+	/* Allocate space for registration */
+	if (likely(!ce->guc_lrcd_reg_idx)) {
+		ret = alloc_lrcd_reg_idx(guc, !loop);
+		if (unlikely(ret < 0))
+			return ret;
+		ce->guc_lrcd_reg_idx = ret;
+	}
+
 	context_registered = lrc_desc_registered(guc, desc_idx);
 
 	rcu_read_lock();
@@ -1859,12 +1933,11 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 		prio = ctx->sched.priority;
 	rcu_read_unlock();
 
-	reset_lrc_desc(guc, desc_idx);
 	ret = set_lrc_desc_registered(guc, desc_idx, ce);
 	if (unlikely(ret))
 		return ret;
 
-	desc = __get_lrc_desc(guc, desc_idx);
+	desc = __get_lrc_desc(guc, ce->guc_lrcd_reg_idx);
 	desc->engine_class = engine_class_to_guc_class(engine->class);
 	desc->engine_submit_mask = adjust_engine_mask(engine->class,
 						      engine->mask);
@@ -1902,7 +1975,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			}
 			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 			if (unlikely(disabled)) {
-				reset_lrc_desc(guc, desc_idx);
+				clr_lrc_desc_registered(guc, desc_idx);
 				return 0;	/* Will get registered later */
 			}
 		}
@@ -1930,7 +2003,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			ret = register_context(ce, loop);
 		if (unlikely(ret == -EBUSY))
-			reset_lrc_desc(guc, desc_idx);
+			clr_lrc_desc_registered(guc, desc_idx);
 		else if (unlikely(ret == -ENODEV))
 			ret = 0;	/* Will get registered later */
 	}
@@ -2296,6 +2369,7 @@ static void __guc_context_destroy(struct intel_context *ce)
 
 	lrc_fini(ce);
 	intel_context_fini(ce);
+	__free_lrcd_reg_idx(ce_to_guc(ce), ce);
 
 	if (intel_engine_is_virtual(ce->engine)) {
 		struct guc_virtual_engine *ve =
@@ -2807,11 +2881,11 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	if (context_needs_lrc_desc_pin(ce, !!ret)) {
 		ret = guc_lrc_desc_pin(ce, true);
-		if (unlikely(ret)) {	/* unwind */
-			if (ret == -EPIPE) {
-				disable_submission(guc);
-				goto out;	/* GPU will be reset */
-			}
+		if (unlikely(ret == -EBUSY))
+			set_context_needs_register(ce);
+		else if (ret == -EPIPE)
+			disable_submission(guc); /* GPU will be reset */
+		else if (unlikely(ret)) {	/* unwind */
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce, true);
 			return ret;
@@ -3438,6 +3512,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
+
+		free_lrcd_reg_idx(guc, ce);
 	} else if (context_pending_disable(ce)) {
 		bool banned;
 
@@ -3706,6 +3782,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 		   atomic_read(&guc->outstanding_submission_g2h));
 	drm_printf(p, "GuC Number GuC IDs: %d\n", guc->num_guc_ids);
 	drm_printf(p, "GuC Max Number GuC IDs: %d\n\n", guc->max_guc_ids);
+	drm_printf(p, "GuC max context registered: %u\n\n",
+		   guc->lrcd_reg.max_idx);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
 		gse_log_submission_info(guc->gse[i], p, i);

From patchwork Tue Aug  3 22:29:05 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417519
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 66648C43216
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:36 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 34FCD60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:36 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 34FCD60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id C2F976E921;
	Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 1E6DF89B48;
 Tue,  3 Aug 2021 22:11:54 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745896"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745896"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512700"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 08/46] drm/i915/guc: Take GT PM ref when deregistering context
Date: Tue,  3 Aug 2021 15:29:05 -0700
Message-Id: <20210803222943.27686-9-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a deregister context H2G is in flight.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.h     |  5 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 13 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 99 +++++++++++++++----
 4 files changed, 102 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 70ea46d6cfb0..17a5028ea177 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -16,6 +16,11 @@ intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
 	return intel_wakeref_is_active(&engine->wakeref);
 }
 
+static inline void __intel_engine_pm_get(struct intel_engine_cs *engine)
+{
+	__intel_wakeref_get(&engine->wakeref);
+}
+
 static inline void intel_engine_pm_get(struct intel_engine_cs *engine)
 {
 	intel_wakeref_get(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index d0588d8aaa44..a17bf0d4592b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -41,6 +41,19 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
 	intel_wakeref_put_async(&gt->wakeref);
 }
 
+#define with_intel_gt_pm(gt, tmp) \
+	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+	     intel_gt_pm_put(gt), tmp = 0)
+#define with_intel_gt_pm_async(gt, tmp) \
+	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+	     intel_gt_pm_put_async(gt), tmp = 0)
+#define with_intel_gt_pm_if_awake(gt, tmp) \
+	for (tmp = intel_gt_pm_get_if_awake(gt); tmp; \
+	     intel_gt_pm_put(gt), tmp = 0)
+#define with_intel_gt_pm_if_awake_async(gt, tmp) \
+	for (tmp = intel_gt_pm_get_if_awake(gt); tmp; \
+	     intel_gt_pm_put_async(gt), tmp = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index c0a12ae95ba5..72fdfa1f6ccd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -61,6 +61,10 @@ struct intel_guc {
 	struct list_head guc_id_list_no_ref;
 	struct list_head guc_id_list_unpinned;
 
+	spinlock_t destroy_lock;	/* protects list / worker */
+	struct list_head destroyed_contexts;
+	struct work_struct destroy_worker;
+
 	bool submission_supported;
 	bool submission_selected;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index bfda15bf9182..262fa77b56e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -914,6 +914,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 			if (deregister)
 				guc_signal_context_fence(ce);
 			if (destroyed) {
+				intel_gt_pm_put_async(guc_to_gt(guc));
 				release_guc_id(guc, ce);
 				__guc_context_destroy(ce);
 			}
@@ -1032,6 +1033,8 @@ static void guc_flush_submissions(struct intel_guc *guc)
 			gse_flush_submissions(guc->gse[i]);
 }
 
+static void guc_flush_destroyed_contexts(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
 	int i;
@@ -1050,6 +1053,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
 
 	guc_flush_submissions(guc);
+	guc_flush_destroyed_contexts(guc);
 
 	/*
 	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
@@ -1377,6 +1381,8 @@ static void retire_worker_func(struct work_struct *w)
 static int guc_lrcd_reg_init(struct intel_guc *guc);
 static void guc_lrcd_reg_fini(struct intel_guc *guc);
 
+static void destroy_worker_func(struct work_struct *w);
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1399,6 +1405,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
 	ida_init(&guc->guc_ids);
 
+	spin_lock_init(&guc->destroy_lock);
+	INIT_LIST_HEAD(&guc->destroyed_contexts);
+	INIT_WORK(&guc->destroy_worker, destroy_worker_func);
+
 	return 0;
 }
 
@@ -1409,6 +1419,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	if (!guc_submission_initialized(guc))
 		return;
 
+	guc_flush_destroyed_contexts(guc);
 	guc_lrcd_reg_fini(guc);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i) {
@@ -2351,11 +2362,29 @@ static void guc_context_sched_disable(struct intel_context *ce)
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_gt *gt = guc_to_gt(guc);
+	unsigned long flags;
+	bool disabled;
 
+	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled)) {
+		__intel_gt_pm_get(gt);
+		set_context_destroyed(ce);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	clr_context_registered(ce);
 	deregister_context(ce, ce->guc_id, true);
 }
@@ -2384,12 +2413,52 @@ static void __guc_context_destroy(struct intel_context *ce)
 	}
 }
 
+static void guc_flush_destroyed_contexts(struct intel_guc *guc)
+{
+	struct intel_context *ce, *cn;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->destroy_lock, flags);
+	list_for_each_entry_safe(ce, cn,
+				 &guc->destroyed_contexts, guc_id_link) {
+		list_del_init(&ce->guc_id_link);
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+	}
+	spin_unlock_irqrestore(&guc->destroy_lock, flags);
+}
+
+static void deregister_destroyed_contexts(struct intel_guc *guc)
+{
+	struct intel_context *ce, *cn;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->destroy_lock, flags);
+	list_for_each_entry_safe(ce, cn,
+				 &guc->destroyed_contexts, guc_id_link) {
+		list_del_init(&ce->guc_id_link);
+		spin_unlock_irqrestore(&guc->destroy_lock, flags);
+		guc_lrc_desc_unpin(ce);
+		spin_lock_irqsave(&guc->destroy_lock, flags);
+	}
+	spin_unlock_irqrestore(&guc->destroy_lock, flags);
+}
+
+static void destroy_worker_func(struct work_struct *w)
+{
+	struct intel_guc *guc =
+		container_of(w, struct intel_guc, destroy_worker);
+	struct intel_gt *gt = guc_to_gt(guc);
+	int tmp;
+
+	with_intel_gt_pm(gt, tmp)
+		deregister_destroyed_contexts(guc);
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
-	struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm;
 	struct intel_guc *guc = ce_to_guc(ce);
-	intel_wakeref_t wakeref;
 	unsigned long flags;
 	bool disabled;
 
@@ -2429,12 +2498,12 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
-	/* Seal race with Reset */
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	/* Seal race with reset */
+	spin_lock_irqsave(&guc->destroy_lock, flags);
 	disabled = submission_disabled(guc);
 	if (likely(!disabled))
-		set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		list_add_tail(&ce->guc_id_link, &guc->destroyed_contexts);
+	spin_unlock_irqrestore(&guc->destroy_lock, flags);
 	if (unlikely(disabled)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -2442,20 +2511,11 @@ static void guc_context_destroy(struct kref *kref)
 	}
 
 	/*
-	 * We defer GuC context deregistration until the context is destroyed
-	 * in order to save on CTBs. With this optimization ideally we only need
-	 * 1 CTB to register the context during the first pin and 1 CTB to
-	 * deregister the context when the context is destroyed. Without this
-	 * optimization, a CTB would be needed every pin & unpin.
-	 *
-	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
-	 * from context_free_worker when runtime wakeref is not held.
-	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
-	 * in H2G CTB to deregister the context. A future patch may defer this
-	 * H2G CTB if the runtime wakeref is zero.
+	 * We use a worker to issue the H2G to deregister the context as we can
+	 * take the GT PM for the first time which isn't allowed from an atomic
+	 * context.
 	 */
-	with_intel_runtime_pm(runtime_pm, wakeref)
-		guc_lrc_desc_unpin(ce);
+	queue_work(system_unbound_wq, &guc->destroy_worker);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -3472,6 +3532,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
+		intel_gt_pm_put_async(guc_to_gt(guc));
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
 	}

From patchwork Tue Aug  3 22:29:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417509
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8EBD2C4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:28 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 5692E601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:28 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5692E601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id BEDDD6E900;
	Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 36CC96E14B;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745897"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745897"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512701"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 09/46] drm/i915: Add GT PM unpark worker
Date: Tue,  3 Aug 2021 15:29:06 -0700
Message-Id: <20210803222943.27686-10-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Sometimes it is desirable to queue work up for later if the GT PM isn't
held and run that work on next GT PM unpark.

Implemented with a list in the GT of all pending work, workqueues in
the list, a callback to add a workqueue to the list, and finally a
wakeref post_get callback that iterates / drains the list + queues the
workqueues.

First user of this is deregistration of GuC contexts.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c            |  3 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |  8 +++++
 .../gpu/drm/i915/gt/intel_gt_pm_unpark_work.c | 35 +++++++++++++++++++
 .../gpu/drm/i915/gt/intel_gt_pm_unpark_work.h | 32 +++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  3 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  3 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 +++++---
 drivers/gpu/drm/i915/intel_wakeref.c          |  5 +++
 drivers/gpu/drm/i915/intel_wakeref.h          |  1 +
 9 files changed, 99 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index a64aa43f7cd9..405558c08d6c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -29,6 +29,9 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
 
 	spin_lock_init(&gt->irq_lock);
 
+	spin_lock_init(&gt->pm_unpark_work_lock);
+	INIT_LIST_HEAD(&gt->pm_unpark_work_list);
+
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index dea8e2479897..564c11a3748b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -90,6 +90,13 @@ static int __gt_unpark(struct intel_wakeref *wf)
 	return 0;
 }
 
+static void __gt_unpark_work_queue(struct intel_wakeref *wf)
+{
+	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
+
+	intel_gt_pm_unpark_work_queue(gt);
+}
+
 static int __gt_park(struct intel_wakeref *wf)
 {
 	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
@@ -118,6 +125,7 @@ static int __gt_park(struct intel_wakeref *wf)
 
 static const struct intel_wakeref_ops wf_ops = {
 	.get = __gt_unpark,
+	.post_get = __gt_unpark_work_queue,
 	.put = __gt_park,
 };
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c
new file mode 100644
index 000000000000..23162dbd0c35
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_runtime_pm.h"
+#include "intel_gt_pm.h"
+
+void intel_gt_pm_unpark_work_queue(struct intel_gt *gt)
+{
+	struct intel_gt_pm_unpark_work *work, *next;
+	unsigned long flags;
+
+	spin_lock_irqsave(&gt->pm_unpark_work_lock, flags);
+	list_for_each_entry_safe(work, next,
+				 &gt->pm_unpark_work_list, link) {
+		list_del_init(&work->link);
+		queue_work(system_unbound_wq, &work->worker);
+	}
+	spin_unlock_irqrestore(&gt->pm_unpark_work_lock, flags);
+}
+
+void intel_gt_pm_unpark_work_add(struct intel_gt *gt,
+				 struct intel_gt_pm_unpark_work *work)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&gt->pm_unpark_work_lock, flags);
+	if (intel_gt_pm_is_awake(gt))
+		queue_work(system_unbound_wq, &work->worker);
+	else if (list_empty(&work->link))
+		list_add_tail(&work->link, &gt->pm_unpark_work_list);
+	spin_unlock_irqrestore(&gt->pm_unpark_work_lock, flags);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h
new file mode 100644
index 000000000000..08e9011be023
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#ifndef INTEL_GT_PM_UNPARK_WORK_H
+#define INTEL_GT_PM_UNPARK_WORK_H
+
+#include <linux/list.h>
+#include <linux/workqueue.h>
+
+struct intel_gt;
+
+struct intel_gt_pm_unpark_work {
+	struct list_head link;
+	struct work_struct worker;
+};
+
+void intel_gt_pm_unpark_work_queue(struct intel_gt *gt);
+
+void intel_gt_pm_unpark_work_add(struct intel_gt *gt,
+				 struct intel_gt_pm_unpark_work *work);
+
+static inline void
+intel_gt_pm_unpark_work_init(struct intel_gt_pm_unpark_work *work,
+			     work_func_t fn)
+{
+	INIT_LIST_HEAD(&work->link);
+	INIT_WORK(&work->worker, fn);
+}
+
+#endif /* INTEL_GT_PM_UNPARK_WORK_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 97a5075288d2..8d8a946561fa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -91,6 +91,9 @@ struct intel_gt {
 	struct intel_wakeref wakeref;
 	atomic_t user_wakeref;
 
+	struct list_head pm_unpark_work_list;
+	spinlock_t pm_unpark_work_lock;	/* protect list */
+
 	struct list_head closed_vma;
 	spinlock_t closed_lock; /* guards the list of closed_vma */
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 72fdfa1f6ccd..aedd5a4281b8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -18,6 +18,7 @@
 #include "intel_uc_fw.h"
 #include "i915_utils.h"
 #include "i915_vma.h"
+#include "gt/intel_gt_pm_unpark_work.h"
 
 struct __guc_ads_blob;
 
@@ -63,7 +64,7 @@ struct intel_guc {
 
 	spinlock_t destroy_lock;	/* protects list / worker */
 	struct list_head destroyed_contexts;
-	struct work_struct destroy_worker;
+	struct intel_gt_pm_unpark_work destroy_worker;
 
 	bool submission_supported;
 	bool submission_selected;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 262fa77b56e2..7fe4d1559a81 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1406,8 +1406,9 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	ida_init(&guc->guc_ids);
 
 	spin_lock_init(&guc->destroy_lock);
+
 	INIT_LIST_HEAD(&guc->destroyed_contexts);
-	INIT_WORK(&guc->destroy_worker, destroy_worker_func);
+	intel_gt_pm_unpark_work_init(&guc->destroy_worker, destroy_worker_func);
 
 	return 0;
 }
@@ -2446,13 +2447,18 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
 
 static void destroy_worker_func(struct work_struct *w)
 {
+	struct intel_gt_pm_unpark_work *destroy_worker =
+		container_of(w, struct intel_gt_pm_unpark_work, worker);
 	struct intel_guc *guc =
-		container_of(w, struct intel_guc, destroy_worker);
+		container_of(destroy_worker, struct intel_guc, destroy_worker);
 	struct intel_gt *gt = guc_to_gt(guc);
 	int tmp;
 
-	with_intel_gt_pm(gt, tmp)
+	with_intel_gt_pm_if_awake(gt, tmp)
 		deregister_destroyed_contexts(guc);
+
+	if (!list_empty(&guc->destroyed_contexts))
+		intel_gt_pm_unpark_work_add(gt, destroy_worker);
 }
 
 static void guc_context_destroy(struct kref *kref)
@@ -2515,7 +2521,7 @@ static void guc_context_destroy(struct kref *kref)
 	 * take the GT PM for the first time which isn't allowed from an atomic
 	 * context.
 	 */
-	queue_work(system_unbound_wq, &guc->destroy_worker);
+	intel_gt_pm_unpark_work_add(guc_to_gt(guc), &guc->destroy_worker);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
index dfd87d082218..282fc4f312e3 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -24,6 +24,8 @@ static void rpm_put(struct intel_wakeref *wf)
 
 int __intel_wakeref_get_first(struct intel_wakeref *wf)
 {
+	bool do_post = false;
+
 	/*
 	 * Treat get/put as different subclasses, as we may need to run
 	 * the put callback from under the shrinker and do not want to
@@ -44,8 +46,11 @@ int __intel_wakeref_get_first(struct intel_wakeref *wf)
 		}
 
 		smp_mb__before_atomic(); /* release wf->count */
+		do_post = true;
 	}
 	atomic_inc(&wf->count);
+	if (do_post && wf->ops->post_get)
+		wf->ops->post_get(wf);
 	mutex_unlock(&wf->mutex);
 
 	INTEL_WAKEREF_BUG_ON(atomic_read(&wf->count) <= 0);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
index 545c8f277c46..ef7e6a698e8a 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -30,6 +30,7 @@ typedef depot_stack_handle_t intel_wakeref_t;
 
 struct intel_wakeref_ops {
 	int (*get)(struct intel_wakeref *wf);
+	void (*post_get)(struct intel_wakeref *wf);
 	int (*put)(struct intel_wakeref *wf);
 };
 

From patchwork Tue Aug  3 22:29:07 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417517
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DF6BAC43214
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:34 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id B33B060184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:34 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B33B060184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id D4F896E16D;
	Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 4E1A46E167;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745898"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745898"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512703"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 10/46] drm/i915/guc: Take engine PM when a context is pinned
 with GuC submission
Date: Tue,  3 Aug 2021 15:29:07 -0700
Message-Id: <20210803222943.27686-11-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a scheduling of user context could be enabled.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |  1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++++++++++++++++--
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 903de270f2db..5e3a1e2095b0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -103,6 +103,7 @@ gt-y += \
 	gt/intel_gt_clock_utils.o \
 	gt/intel_gt_irq.o \
 	gt/intel_gt_pm.o \
+	gt/intel_gt_pm_unpark_work.o \
 	gt/intel_gt_pm_irq.o \
 	gt/intel_gt_requests.o \
 	gt/intel_gtt.o \
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 7fe4d1559a81..c5d9548bfd00 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2056,7 +2056,12 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
-	return __guc_context_pin(ce, ce->engine, vaddr);
+	int ret = __guc_context_pin(ce, ce->engine, vaddr);
+
+	if (likely(!ret && !intel_context_is_barrier(ce)))
+		intel_engine_pm_get(ce->engine);
+
+	return ret;
 }
 
 static void guc_context_unpin(struct intel_context *ce)
@@ -2067,6 +2072,9 @@ static void guc_context_unpin(struct intel_context *ce)
 
 	unpin_guc_id(guc, ce, true);
 	lrc_unpin(ce);
+
+	if (likely(!intel_context_is_barrier(ce)))
+		intel_engine_pm_put(ce->engine);
 }
 
 static void guc_context_post_unpin(struct intel_context *ce)
@@ -3002,8 +3010,30 @@ static int guc_virtual_context_pre_pin(struct intel_context *ce,
 static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
 {
 	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+	int ret = __guc_context_pin(ce, engine, vaddr);
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+
+	if (likely(!ret))
+		for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+			intel_engine_pm_get(engine);
 
-	return __guc_context_pin(ce, engine, vaddr);
+	return ret;
+}
+
+static void guc_virtual_context_unpin(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	GEM_BUG_ON(context_enabled(ce));
+	GEM_BUG_ON(intel_context_is_barrier(ce));
+
+	unpin_guc_id(guc, ce, true);
+	lrc_unpin(ce);
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_put(engine);
 }
 
 static void guc_virtual_context_enter(struct intel_context *ce)
@@ -3040,7 +3070,7 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 
 	.pre_pin = guc_virtual_context_pre_pin,
 	.pin = guc_virtual_context_pin,
-	.unpin = guc_context_unpin,
+	.unpin = guc_virtual_context_unpin,
 	.post_unpin = guc_context_post_unpin,
 
 	.ban = guc_context_ban,

From patchwork Tue Aug  3 22:29:08 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417533
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6B093C4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:46 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 3E532601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:46 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3E532601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id BA4796E927;
	Tue,  3 Aug 2021 22:12:00 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 715C26E17E;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745899"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745899"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512705"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 11/46] drm/i915/guc: Don't call switch_to_kernel_context with
 GuC submission
Date: Tue,  3 Aug 2021 15:29:08 -0700
Message-Id: <20210803222943.27686-12-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all contexts are pinned. By not calling
switch_to_kernel_context we save on issuing a request to the engine.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 1f07ac4e0672..58099de6bf07 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 	unsigned long flags;
 	bool result = true;
 
+	/* No need to switch_to_kernel_context if GuC submission */
+	if (intel_engine_uses_guc(engine))
+		return true;
+
 	/* GPU is pointing to the void, as good as in the kernel context. */
 	if (intel_gt_is_wedged(engine->gt))
 		return true;

From patchwork Tue Aug  3 22:29:09 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417513
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B4205C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:31 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 7C5CA601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:31 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7C5CA601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 1AA9C6E90C;
	Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 8AD0189E06;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745900"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745900"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512707"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 12/46] drm/i915/guc: Selftest for GuC flow control
Date: Tue,  3 Aug 2021 15:29:09 -0700
Message-Id: <20210803222943.27686-13-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Add 5 selftests for hard (from user space) to recreate flow conditions.
Test listed below:

1. A test to verify that the number of guc_ids can be exhausted and all
submissions still complete.

2. A test to verify that the flow control state machine can recover from
a full GPU reset.

3. A teset to verify that the lrcd registration slots can be exhausted
and all submissions still complete.

4. A test to verify that the H2G channel can deadlock and a full GPU
reset recovers the system.

5. A test to stress to CTB channel but submitting to lots of contexts
and then immediately destroy the contexts.

Tests 1, 2, and 3 also ensure when the flow control is triggered by
unready requests those unready requests do not DoS ready requests.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   6 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  43 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   9 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  16 +
 .../i915/gt/uc/intel_guc_submission_types.h   |   2 +
 .../i915/gt/uc/selftest_guc_flow_control.c    | 581 ++++++++++++++++++
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 .../i915/selftests/intel_scheduler_helpers.c  |  12 +
 .../i915/selftests/intel_scheduler_helpers.h  |   2 +
 9 files changed, 662 insertions(+), 10 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index aedd5a4281b8..c0c60ccabfa4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -103,6 +103,12 @@ struct intel_guc {
 
 	/* To serialize the intel_guc_send actions */
 	struct mutex send_mutex;
+
+	I915_SELFTEST_DECLARE(bool gse_hang_expected;)
+	I915_SELFTEST_DECLARE(bool deadlock_expected;)
+	I915_SELFTEST_DECLARE(bool bad_desc_expected;)
+	I915_SELFTEST_DECLARE(bool inject_bad_sched_disable;)
+	I915_SELFTEST_DECLARE(bool inject_corrupt_h2g;)
 };
 
 static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 22b4733b55e2..ab1ce8901c15 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,7 +3,6 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
-#include <linux/circ_buf.h>
 #include <linux/ktime.h>
 #include <linux/time64.h>
 #include <linux/timekeeping.h>
@@ -414,8 +413,9 @@ static int ct_write(struct intel_guc_ct *ct,
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
-	if (unlikely(desc->status))
-		goto corrupted;
+	if (!I915_SELFTEST_ONLY(ct_to_guc(ct)->deadlock_expected))
+		if (unlikely(desc->status))
+			goto corrupted;
 
 	GEM_BUG_ON(tail > size);
 
@@ -443,6 +443,15 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+	if (ct_to_guc(ct)->inject_corrupt_h2g) {
+		header = FIELD_PREP(GUC_CTB_MSG_0_FORMAT, 3) |
+			 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len + 5) |
+			 FIELD_PREP(GUC_CTB_MSG_0_FENCE, 0xdead);
+		ct_to_guc(ct)->inject_corrupt_h2g = false;
+	}
+#endif
+
 	type = (flags & INTEL_GUC_CT_SEND_NB) ? GUC_HXG_TYPE_EVENT :
 		GUC_HXG_TYPE_REQUEST;
 	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, type) |
@@ -481,8 +490,12 @@ static int ct_write(struct intel_guc_ct *ct,
 	return 0;
 
 corrupted:
-	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
-		 desc->head, desc->tail, desc->status);
+	if (I915_SELFTEST_ONLY(ct_to_guc(ct)->bad_desc_expected))
+		CT_DEBUG(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+			 desc->head, desc->tail, desc->status);
+	else
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+			 desc->head, desc->tail, desc->status);
 	ctb->broken = true;
 	return -EPIPE;
 }
@@ -539,9 +552,18 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 		struct guc_ct_buffer_desc *send = ct->ctbs.send.desc;
 		struct guc_ct_buffer_desc *recv = ct->ctbs.send.desc;
 
-		CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
-			 ktime_ms_delta(ktime_get(), ct->stall_time),
-			 send->status, recv->status);
+		/*
+		 * CI doesn't like error messages, demote to debug if deadlock was
+		 * intentionally hit.
+		 */
+		if (I915_SELFTEST_ONLY(ct_to_guc(ct)->deadlock_expected))
+			CT_DEBUG(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
+				 ktime_ms_delta(ktime_get(), ct->stall_time),
+				 send->status, recv->status);
+		else
+			CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n",
+				 ktime_ms_delta(ktime_get(), ct->stall_time),
+				 send->status, recv->status);
 		ct->ctbs.send.broken = true;
 	}
 
@@ -767,8 +789,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
-	if (unlikely(ct->ctbs.send.broken))
-		return -EPIPE;
+	if (!I915_SELFTEST_ONLY(ct_to_guc(ct)->deadlock_expected))
+		if (unlikely(ct->ctbs.send.broken))
+			return -EPIPE;
 
 	if (flags & INTEL_GUC_CT_SEND_NB)
 		return ct_send_nb(ct, action, len, flags);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index f709a19c7e21..5963eda95022 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -6,6 +6,7 @@
 #ifndef _INTEL_GUC_CT_H_
 #define _INTEL_GUC_CT_H_
 
+#include <linux/circ_buf.h>
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
@@ -117,4 +118,12 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
 void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p);
 
+static inline bool intel_guc_ct_is_recv_buffer_empty(struct intel_guc_ct *ct)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
+
+	return atomic_read(&ctb->space) ==
+		(CIRC_SPACE(0, 0, ctb->size) - ctb->resv_space);
+}
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c5d9548bfd00..310116f40509 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -824,6 +824,7 @@ static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 			GEM_WARN_ON(ret);	/* Unexpected */
 			goto deadlk;
 		}
+		I915_SELFTEST_DECLARE(++gse->tasklets_submit_count;)
 	}
 
 	/*
@@ -2107,7 +2108,15 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 		GUC_CONTEXT_DISABLE
 	};
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+	if (guc->inject_bad_sched_disable &&
+	    guc_id == GUC_INVALID_LRC_ID)
+		guc->inject_bad_sched_disable = false;
+	else
+		GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+#else
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+#endif
 
 	trace_intel_context_sched_disable(ce);
 
@@ -2770,6 +2779,9 @@ static void retire_worker_sched_disable(struct guc_submit_engine *gse,
 		guc_id = prep_context_pending_disable(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
+		if (I915_SELFTEST_ONLY(guc->inject_bad_sched_disable))
+			guc_id = GUC_INVALID_LRC_ID;
+
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			__guc_context_sched_disable(guc, ce, guc_id);
 
@@ -4021,3 +4033,7 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
 
 	return false;
 }
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftest_guc_flow_control.c"
+#endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
index 0c224ab18c02..7069b7248f55 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
@@ -47,6 +47,8 @@ struct guc_submit_engine {
 		STALL_MOVE_LRC_TAIL,
 		STALL_ADD_REQUEST,
 	} submission_stall_reason;
+
+	I915_SELFTEST_DECLARE(u64 tasklets_submit_count;)
 };
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
new file mode 100644
index 000000000000..f31ab2674b2b
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
@@ -0,0 +1,581 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright �� 2021 Intel Corporation
+ */
+
+#include "selftests/igt_spinner.h"
+#include "selftests/igt_reset.h"
+#include "selftests/intel_scheduler_helpers.h"
+#include "gt/intel_engine_heartbeat.h"
+#include "gem/selftests/mock_context.h"
+
+static int __request_add_spin(struct i915_request *rq, struct igt_spinner *spin)
+{
+	int err = 0;
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (spin && !igt_wait_for_spinner(spin, rq))
+		err = -ETIMEDOUT;
+
+	return err;
+}
+
+static struct i915_request *nop_kernel_request(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = intel_engine_create_kernel_request(engine);
+	if (IS_ERR(rq))
+		return rq;
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	return rq;
+}
+
+static struct i915_request *nop_user_request(struct intel_context *ce,
+					     struct i915_request *from)
+{
+	struct i915_request *rq;
+	int ret;
+
+	rq = intel_context_create_request(ce);
+	if (IS_ERR(rq))
+		return rq;
+
+	if (from) {
+		ret = i915_sw_fence_await_dma_fence(&rq->submit,
+						    &from->fence, 0,
+						    I915_FENCE_GFP);
+		if (ret < 0) {
+			i915_request_put(rq);
+			return ERR_PTR(ret);
+		}
+	}
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	return rq;
+}
+
+static int nop_request_wait(struct intel_engine_cs *engine, bool kernel,
+			    bool flow_control)
+{
+	struct i915_gpu_error *global = &engine->gt->i915->gpu_error;
+	unsigned int reset_count = i915_reset_count(global);
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct guc_submit_engine *gse = guc->gse[GUC_SUBMIT_ENGINE_SINGLE_LRC];
+	u64 tasklets_submit_count = gse->tasklets_submit_count;
+	struct intel_context *ce;
+	struct i915_request *nop;
+	int ret;
+
+	if (kernel) {
+		nop = nop_kernel_request(engine);
+	} else {
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce))
+			return PTR_ERR(ce);
+		nop = nop_user_request(ce, NULL);
+		intel_context_put(ce);
+	}
+	if (IS_ERR(nop))
+		return PTR_ERR(nop);
+
+	ret = intel_selftest_wait_for_rq(nop);
+	i915_request_put(nop);
+	if (ret)
+		return ret;
+
+	if (!flow_control &&
+	    gse->tasklets_submit_count != tasklets_submit_count) {
+		pr_err("Flow control for single-lrc unexpectedly kicked in\n");
+		ret = -EINVAL;
+	}
+
+	if (flow_control &&
+	    gse->tasklets_submit_count == tasklets_submit_count) {
+		pr_err("Flow control for single-lrc did not kick in\n");
+		ret = -EINVAL;
+	}
+
+	if (i915_reset_count(global) != reset_count) {
+		pr_err("Unexpected GPU reset during single-lrc submit\n");
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+#define NUM_GUC_ID		256
+#define NUM_CONTEXT		1024
+#define NUM_RQ_PER_CONTEXT	2
+#define HEARTBEAT_INTERVAL	1500
+
+static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang)
+{
+	struct intel_gt *gt = arg;
+	struct intel_guc *guc = &gt->uc.guc;
+	struct guc_submit_engine *gse = guc->gse[GUC_SUBMIT_ENGINE_SINGLE_LRC];
+	struct intel_context **contexts;
+	int ret = 0;
+	int i, j, k;
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *spin_rq = NULL, *last = NULL;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	u64 tasklets_submit_count = gse->tasklets_submit_count;
+	u32 old_beat;
+
+	contexts = kmalloc(sizeof(*contexts) * NUM_CONTEXT, GFP_KERNEL);
+	if (!contexts) {
+		pr_err("Context array allocation failed\n");
+		return -ENOMEM;
+	}
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	if (limit_guc_ids)
+		guc->num_guc_ids = NUM_GUC_ID;
+
+	ce = intel_context_create(intel_selftest_find_any_engine(gt));
+	if (IS_ERR(ce)) {
+		ret = PTR_ERR(ce);
+		pr_err("Failed to create context: %d\n", ret);
+		goto err;
+	}
+
+	reset_count = i915_reset_count(global);
+	engine = ce->engine;
+
+	old_beat = engine->props.heartbeat_interval_ms;
+	if (hang) {
+		ret = intel_engine_set_heartbeat(engine, HEARTBEAT_INTERVAL);
+		if (ret) {
+			pr_err("Failed to boost heartbeat interval: %d\n", ret);
+			goto err;
+		}
+	}
+
+	/* Create spinner to block requests in below loop */
+	ret = igt_spinner_init(&spin, engine->gt);
+	if (ret) {
+		pr_err("Failed to create spinner: %d\n", ret);
+		goto err_heartbeat;
+	}
+	spin_rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	intel_context_put(ce);
+	if (IS_ERR(spin_rq)) {
+		ret = PTR_ERR(spin_rq);
+		pr_err("Failed to create spinner request: %d\n", ret);
+		goto err_heartbeat;
+	}
+	ret = __request_add_spin(spin_rq, &spin);
+	if (ret) {
+		pr_err("Failed to add Spinner request: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/*
+	 * Create of lot of requests in a loop to trigger the flow control state
+	 * machine. Using a three level loop as it is interesting to hit flow
+	 * control with more than 1 request on each context in a row and also
+	 * interleave requests with other contexts.
+	 */
+	for (i = 0; i < NUM_RQ_PER_CONTEXT; ++i) {
+		for (j = 0; j < NUM_CONTEXT; ++j) {
+			for (k = 0; k < NUM_RQ_PER_CONTEXT; ++k) {
+				bool first_pass = !i && !k;
+
+				if (last)
+					i915_request_put(last);
+				last = NULL;
+
+				if (first_pass)
+					contexts[j] = intel_context_create(engine);
+				ce = contexts[j];
+
+				if (IS_ERR(ce)) {
+					ret = PTR_ERR(ce);
+					pr_err("Failed to create context, %d,%d,%d: %d\n",
+					       i, j, k, ret);
+					goto err_spin_rq;
+				}
+
+				last = nop_user_request(ce, spin_rq);
+				if (first_pass)
+					intel_context_put(ce);
+				if (IS_ERR(last)) {
+					ret = PTR_ERR(last);
+					pr_err("Failed to create request, %d,%d,%d: %d\n",
+					       i, j, k, ret);
+					goto err_spin_rq;
+				}
+			}
+		}
+	}
+
+	/* Verify GuC submit engine state */
+	if (limit_guc_ids && !guc_ids_exhausted(gse)) {
+		pr_err("guc_ids not exhausted\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+	if (!limit_guc_ids && guc_ids_exhausted(gse)) {
+		pr_err("guc_ids exhausted\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+
+	/* Ensure no DoS from unready requests */
+	ret = nop_request_wait(engine, false, true);
+	if (ret < 0) {
+		pr_err("User NOP request DoS: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Inject hang in flow control state machine */
+	if (hang) {
+		guc->gse_hang_expected = true;
+		guc->inject_bad_sched_disable = true;
+	}
+
+	/* Release blocked requests */
+	igt_spinner_end(&spin);
+	ret = intel_selftest_wait_for_rq(spin_rq);
+	if (ret) {
+		pr_err("Spin request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	i915_request_put(spin_rq);
+	igt_spinner_fini(&spin);
+	spin_rq = NULL;
+
+	/* Wait for last request / GT to idle */
+	ret = i915_request_wait(last, 0, hang ? HZ * 30 : HZ * 10);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	i915_request_put(last);
+	last = NULL;
+	ret = intel_gt_wait_for_idle(gt, HZ * 5);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Check state after idle */
+	if (guc_ids_exhausted(gse)) {
+		pr_err("guc_ids exhausted after last request signaled\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+	if (hang) {
+		if (i915_reset_count(global) == reset_count) {
+			pr_err("Failed to record a GPU reset\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+	} else {
+		if (i915_reset_count(global) != reset_count) {
+			pr_err("Unexpected GPU reset\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+		if (gse->tasklets_submit_count == tasklets_submit_count) {
+			pr_err("Flow control failed to kick in\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+	}
+
+	/* Verify requests can be submitted after flow control */
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+err_spin_rq:
+	if (spin_rq) {
+		igt_spinner_end(&spin);
+		intel_selftest_wait_for_rq(spin_rq);
+		i915_request_put(spin_rq);
+		igt_spinner_fini(&spin);
+		intel_gt_wait_for_idle(gt, HZ * 5);
+	}
+err_heartbeat:
+	if (last)
+		i915_request_put(last);
+	intel_engine_set_heartbeat(engine, old_beat);
+err:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	guc->num_guc_ids = guc->max_guc_ids;
+	guc->gse_hang_expected = false;
+	guc->inject_bad_sched_disable = false;
+	kfree(contexts);
+
+	return ret;
+}
+
+static int intel_guc_flow_control_guc_ids(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, true, false);
+}
+
+static int intel_guc_flow_control_lrcd_reg(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, false, false);
+}
+
+static int intel_guc_flow_control_hang_state_machine(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, true, true);
+}
+
+#define NUM_RQ_STRESS_CTBS	0x4000
+static int intel_guc_flow_control_stress_ctbs(void *arg)
+{
+	struct intel_gt *gt = arg;
+	int ret = 0;
+	int i;
+	struct intel_context *ce;
+	struct i915_request *last = NULL, *rq;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	struct intel_guc *guc = &gt->uc.guc;
+	struct intel_guc_ct_buffer *ctb = &guc->ct.ctbs.recv;
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	reset_count = i915_reset_count(global);
+	engine = intel_selftest_find_any_engine(gt);
+
+	/*
+	 * Create a bunch of requests, and then idle the GT which will create a
+	 * lot of H2G / G2H traffic.
+	 */
+	for (i = 0; i < NUM_RQ_STRESS_CTBS; ++i) {
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce)) {
+			ret = PTR_ERR(ce);
+			pr_err("Failed to create context, %d: %d\n", i, ret);
+			goto err;
+		}
+
+		rq = nop_user_request(ce, NULL);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq)) {
+			ret = PTR_ERR(rq);
+			pr_err("Failed to create request, %d: %d\n", i, ret);
+			goto err;
+		}
+
+		if (last)
+			i915_request_put(last);
+		last = rq;
+	}
+
+	ret = i915_request_wait(last, 0, HZ * 10);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err;
+	}
+	i915_request_put(last);
+	last = NULL;
+
+	ret = intel_gt_wait_for_idle(gt, HZ * 10);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err;
+	}
+
+	if (i915_reset_count(global) != reset_count) {
+		pr_err("Unexpected GPU reset\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err;
+	}
+
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err;
+	}
+
+	ret = intel_gt_wait_for_idle(gt, HZ);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err;
+	}
+
+	ret = wait_for(intel_guc_ct_is_recv_buffer_empty(&guc->ct), HZ);
+	if (ret) {
+		pr_err("Recv CTB not expected value=%d,%d outstanding_ctb=%d\n",
+		       atomic_read(&ctb->space),
+		       CIRC_SPACE(0, 0, ctb->size) - ctb->resv_space,
+		       atomic_read(&guc->outstanding_submission_g2h));
+		ret = -EINVAL;
+		goto err;
+	}
+
+err:
+	if (last)
+		i915_request_put(last);
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+
+	return ret;
+}
+
+#define NUM_RQ_DEADLOCK		2048
+static int __intel_guc_flow_control_deadlock_h2g(void *arg, bool bad_desc)
+{
+	struct intel_gt *gt = arg;
+	struct intel_guc *guc = &gt->uc.guc;
+	int ret = 0;
+	int i;
+	struct intel_context *ce;
+	struct i915_request *last = NULL, *rq;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	u32 old_beat;
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	reset_count = i915_reset_count(global);
+	engine = intel_selftest_find_any_engine(gt);
+
+	old_beat = engine->props.heartbeat_interval_ms;
+	ret = intel_engine_set_heartbeat(engine, HEARTBEAT_INTERVAL);
+	if (ret) {
+		pr_err("Failed to boost heartbeat interval: %d\n", ret);
+		goto err;
+	}
+
+	guc->inject_corrupt_h2g = true;
+	if (bad_desc)
+		guc->bad_desc_expected = true;
+	else
+		guc->deadlock_expected = true;
+
+	for (i = 0; i < NUM_RQ_DEADLOCK; ++i) {
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce)) {
+			ret = PTR_ERR(ce);
+			pr_err("Failed to create context, %d: %d\n", i, ret);
+			goto err_heartbeat;
+		}
+
+		rq = nop_user_request(ce, NULL);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq)) {
+			ret = PTR_ERR(rq);
+			pr_err("Failed to create request, %d: %d\n", i, ret);
+			goto err_heartbeat;
+		}
+
+		if (last)
+			i915_request_put(last);
+		last = rq;
+	}
+
+	pr_debug("Number requests before deadlock: %d\n", i);
+
+	ret = i915_request_wait(last, 0, HZ * 5);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err_heartbeat;
+	}
+	i915_request_put(last);
+	last = NULL;
+
+	ret = intel_gt_wait_for_idle(gt, HZ * 10);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err_heartbeat;
+	}
+
+	if (i915_reset_count(global) == reset_count) {
+		pr_err("Failed to record a GPU reset\n");
+		ret = -EINVAL;
+		goto err_heartbeat;
+	}
+
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err_heartbeat;
+	}
+
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err_heartbeat;
+	}
+
+err_heartbeat:
+	if (last)
+		i915_request_put(last);
+	intel_engine_set_heartbeat(engine, old_beat);
+err:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	guc->inject_corrupt_h2g = false;
+	guc->deadlock_expected = false;
+	guc->bad_desc_expected = false;
+
+	return ret;
+}
+
+static int intel_guc_flow_control_deadlock_h2g(void *arg)
+{
+	return __intel_guc_flow_control_deadlock_h2g(arg, false);
+}
+
+static int intel_guc_flow_control_bad_desc_h2g(void *arg)
+{
+	return __intel_guc_flow_control_deadlock_h2g(arg, true);
+}
+
+int intel_guc_flow_control(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(intel_guc_flow_control_stress_ctbs),
+		SUBTEST(intel_guc_flow_control_guc_ids),
+		SUBTEST(intel_guc_flow_control_lrcd_reg),
+		SUBTEST(intel_guc_flow_control_hang_state_machine),
+		SUBTEST(intel_guc_flow_control_deadlock_h2g),
+		SUBTEST(intel_guc_flow_control_bad_desc_h2g),
+	};
+	struct intel_gt *gt = &i915->gt;
+
+	if (intel_gt_is_wedged(gt))
+		return 0;
+
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return 0;
+
+	return intel_gt_live_subtests(tests, gt);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index e2fd1b61af71..d9bd732b741a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -47,5 +47,6 @@ selftest(hangcheck, intel_hangcheck_live_selftests)
 selftest(execlists, intel_execlists_live_selftests)
 selftest(ring_submission, intel_ring_submission_live_selftests)
 selftest(perf, i915_perf_live_selftests)
+selftest(guc_flow_control, intel_guc_flow_control)
 /* Here be dragons: keep last to run last! */
 selftest(late_gt_pm, intel_gt_pm_late_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
index 4b328346b48a..310fb83c527e 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -14,6 +14,18 @@
 #define REDUCED_PREEMPT		10
 #define WAIT_FOR_RESET_TIME	10000
 
+struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		return engine;
+
+	pr_err("No valid engine found!\n");
+	return NULL;
+}
+
 int intel_selftest_modify_policy(struct intel_engine_cs *engine,
 				 struct intel_selftest_saved_policy *saved,
 				 u32 modify_type)
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
index 35c098601ac0..6c776345f75c 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
@@ -10,6 +10,7 @@
 
 struct i915_request;
 struct intel_engine_cs;
+struct intel_gt;
 
 struct intel_selftest_saved_policy {
 	u32 flags;
@@ -29,5 +30,6 @@ int intel_selftest_modify_policy(struct intel_engine_cs *engine,
 int intel_selftest_restore_policy(struct intel_engine_cs *engine,
 				  struct intel_selftest_saved_policy *saved);
 int intel_selftest_wait_for_rq(struct i915_request *rq);
+struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt);
 
 #endif

From patchwork Tue Aug  3 22:29:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417505
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 00478C43214
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:27 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id BFF5F60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:26 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org BFF5F60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id BFC6E6E901;
	Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 9CB1C6E195;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745901"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745901"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512708"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 13/46] drm/i915: Add logical engine mapping
Date: Tue,  3 Aug 2021 15:29:10 -0700
Message-Id: <20210803222943.27686-14-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Add logical engine mapping. This is required for split-frame, as
workloads need to be placed on engines in a logically contiguous manner.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 60 ++++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
 .../drm/i915/gt/intel_execlists_submission.c  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 21 +------
 5 files changed, 56 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0d9105a31d84..4d790f9a65dd 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -290,7 +290,8 @@ static void nop_irq_handler(struct intel_engine_cs *engine, u16 iir)
 	GEM_DEBUG_WARN_ON(iir);
 }
 
-static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
+static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
+			      u8 logical_instance)
 {
 	const struct engine_info *info = &intel_engines[id];
 	struct drm_i915_private *i915 = gt->i915;
@@ -334,6 +335,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 
 	engine->class = info->class;
 	engine->instance = info->instance;
+	engine->logical_mask = BIT(logical_instance);
 	__sprint_engine_name(engine);
 
 	engine->props.heartbeat_interval_ms =
@@ -572,6 +574,37 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
 	return info->engine_mask;
 }
 
+static void populate_logical_ids(struct intel_gt *gt, u8 *logical_ids,
+				 u8 class, const u8 *map, u8 num_instances)
+{
+	int i, j;
+	u8 current_logical_id = 0;
+
+	for (j = 0; j < num_instances; ++j) {
+		for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
+			if (!HAS_ENGINE(gt, i) ||
+			    intel_engines[i].class != class)
+				continue;
+
+			if (intel_engines[i].instance == map[j]) {
+				logical_ids[intel_engines[i].instance] =
+					current_logical_id++;
+				break;
+			}
+		}
+	}
+}
+
+static void setup_logical_ids(struct intel_gt *gt, u8 *logical_ids, u8 class)
+{
+	int i;
+	u8 map[MAX_ENGINE_INSTANCE + 1];
+
+	for (i = 0; i < MAX_ENGINE_INSTANCE + 1; ++i)
+		map[i] = i;
+	populate_logical_ids(gt, logical_ids, class, map, ARRAY_SIZE(map));
+}
+
 /**
  * intel_engines_init_mmio() - allocate and prepare the Engine Command Streamers
  * @gt: pointer to struct intel_gt
@@ -583,7 +616,8 @@ int intel_engines_init_mmio(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	const unsigned int engine_mask = init_engine_mask(gt);
 	unsigned int mask = 0;
-	unsigned int i;
+	unsigned int i, class;
+	u8 logical_ids[MAX_ENGINE_INSTANCE + 1];
 	int err;
 
 	drm_WARN_ON(&i915->drm, engine_mask == 0);
@@ -593,15 +627,23 @@ int intel_engines_init_mmio(struct intel_gt *gt)
 	if (i915_inject_probe_failure(i915))
 		return -ENODEV;
 
-	for (i = 0; i < ARRAY_SIZE(intel_engines); i++) {
-		if (!HAS_ENGINE(gt, i))
-			continue;
+	for (class = 0; class < MAX_ENGINE_CLASS + 1; ++class) {
+		setup_logical_ids(gt, logical_ids, class);
 
-		err = intel_engine_setup(gt, i);
-		if (err)
-			goto cleanup;
+		for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
+			u8 instance = intel_engines[i].instance;
+
+			if (intel_engines[i].class != class ||
+			    !HAS_ENGINE(gt, i))
+				continue;
 
-		mask |= BIT(i);
+			err = intel_engine_setup(gt, i,
+						 logical_ids[instance]);
+			if (err)
+				goto cleanup;
+
+			mask |= BIT(i);
+		}
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ed91bcff20eb..85e5c9a9e502 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -266,6 +266,7 @@ struct intel_engine_cs {
 	unsigned int guc_id;
 
 	intel_engine_mask_t mask;
+	intel_engine_mask_t logical_mask;
 
 	u8 class;
 	u8 instance;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de5f9c86b9a4..baa1797af1c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3879,6 +3879,7 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 
 		ve->siblings[ve->num_siblings++] = sibling;
 		ve->base.mask |= sibling->mask;
+		ve->base.logical_mask |= sibling->logical_mask;
 
 		/*
 		 * All physical engines must be compatible for their emission
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 6926919bcac6..9f5f43a16182 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -176,7 +176,7 @@ static void guc_mapping_table_init(struct intel_gt *gt,
 	for_each_engine(engine, gt, id) {
 		u8 guc_class = engine_class_to_guc_class(engine->class);
 
-		system_info->mapping_table[guc_class][engine->instance] =
+		system_info->mapping_table[guc_class][ilog2(engine->logical_mask)] =
 			engine->instance;
 	}
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 310116f40509..dec757d319a2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1795,23 +1795,6 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
-static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
-{
-	switch (class) {
-	case RENDER_CLASS:
-		return mask >> RCS0;
-	case VIDEO_ENHANCEMENT_CLASS:
-		return mask >> VECS0;
-	case VIDEO_DECODE_CLASS:
-		return mask >> VCS0;
-	case COPY_ENGINE_CLASS:
-		return mask >> BCS0;
-	default:
-		MISSING_CASE(class);
-		return 0;
-	}
-}
-
 static void guc_context_policy_init(struct intel_engine_cs *engine,
 				    struct guc_lrc_desc *desc)
 {
@@ -1952,8 +1935,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 
 	desc = __get_lrc_desc(guc, ce->guc_lrcd_reg_idx);
 	desc->engine_class = engine_class_to_guc_class(engine->class);
-	desc->engine_submit_mask = adjust_engine_mask(engine->class,
-						      engine->mask);
+	desc->engine_submit_mask = engine->logical_mask;
 	desc->hw_context_desc = ce->lrc.lrca;
 	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
 	desc->priority = ce->guc_prio;
@@ -3978,6 +3960,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		}
 
 		ve->base.mask |= sibling->mask;
+		ve->base.logical_mask |= sibling->logical_mask;
 
 		if (n != 0 && ve->base.class != sibling->class) {
 			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",

From patchwork Tue Aug  3 22:29:11 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417523
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C93A9C4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:38 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 980D260EE8
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:38 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 980D260EE8
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id CFF566E92C;
	Tue,  3 Aug 2021 22:12:00 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id BB4236E23F;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745902"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745902"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512709"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 14/46] drm/i915: Expose logical engine instance to user
Date: Tue,  3 Aug 2021 15:29:11 -0700
Message-Id: <20210803222943.27686-15-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Expose logical engine instance to user via query engine info IOCTL. This
is required for split-frame workloads as these needs to be placed on
engines in a logically contiguous order. The logical mapping can change
based on fusing. Rather than having user have knowledge of the fusing we
simply just expose the logical mapping with the existing query engine
info IOCTL.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/i915_query.c | 2 ++
 include/uapi/drm/i915_drm.h       | 8 +++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index e49da36c62fb..8a72923fbdba 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,7 +124,9 @@ query_engine_info(struct drm_i915_private *i915,
 	for_each_uabi_engine(engine, i915) {
 		info.engine.engine_class = engine->uabi_class;
 		info.engine.engine_instance = engine->uabi_instance;
+		info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
 		info.capabilities = engine->uabi_capabilities;
+		info.logical_instance = ilog2(engine->logical_mask);
 
 		if (copy_to_user(info_ptr, &info, sizeof(info)))
 			return -EFAULT;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f13d241417f..ef72e07fe08c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2706,14 +2706,20 @@ struct drm_i915_engine_info {
 
 	/** @flags: Engine flags. */
 	__u64 flags;
+#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE		(1 << 0)
 
 	/** @capabilities: Capabilities of this engine. */
 	__u64 capabilities;
 #define I915_VIDEO_CLASS_CAPABILITY_HEVC		(1 << 0)
 #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC	(1 << 1)
 
+	/** @logical_instance: Logical instance of engine */
+	__u16 logical_instance;
+
 	/** @rsvd1: Reserved fields. */
-	__u64 rsvd1[4];
+	__u16 rsvd1[3];
+	/** @rsvd2: Reserved fields. */
+	__u64 rsvd2[3];
 };
 
 /**

From patchwork Tue Aug  3 22:29:12 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417515
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 534A0C4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:33 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 26A6261037
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:33 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 26A6261037
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id B217C6E920;
	Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id C1E9289F8E;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745903"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745903"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512710"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 15/46] drm/i915/guc: Introduce context parent-child
 relationship
Date: Tue,  3 Aug 2021 15:29:12 -0700
Message-Id: <20210803222943.27686-16-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Introduce context parent-child relationship. Once this relationship is
created all pinning / unpinning operations are directed to the parent
context. The parent context is responsible for pinning all of its'
children and itself.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how GuC mutli-lrc interface is defined - a single H2G is used
register / deregister all of the contexts simultaneously.

Subsequent patches in the series will implement the pinning / unpinning
operations for parent / child contexts.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 29 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h       | 18 ++++++++++++
 drivers/gpu/drm/i915/gt/intel_context_types.h | 12 ++++++++
 3 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 745e84c72c90..8cb92b10b547 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -395,6 +395,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	INIT_LIST_HEAD(&ce->guc_child_list);
+
 	spin_lock_init(&ce->guc_active.lock);
 	INIT_LIST_HEAD(&ce->guc_active.requests);
 
@@ -414,10 +416,17 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 void intel_context_fini(struct intel_context *ce)
 {
+	struct intel_context *child, *next;
+
 	if (ce->timeline)
 		intel_timeline_put(ce->timeline);
 	i915_vm_put(ce->vm);
 
+	/* Need to put the creation ref for the children */
+	if (intel_context_is_parent(ce))
+		for_each_child_safe(ce, child, next)
+			intel_context_put(child);
+
 	mutex_destroy(&ce->pin_mutex);
 	i915_active_fini(&ce->active);
 }
@@ -533,6 +542,26 @@ struct i915_request *intel_context_find_active_request(struct intel_context *ce)
 	return active;
 }
 
+void intel_context_bind_parent_child(struct intel_context *parent,
+				     struct intel_context *child)
+{
+	/*
+	 * Callers responsibility to validate that this function is used
+	 * correctly but we use GEM_BUG_ON here ensure that they do.
+	 */
+	GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
+	GEM_BUG_ON(intel_context_is_pinned(parent));
+	GEM_BUG_ON(intel_context_is_child(parent));
+	GEM_BUG_ON(intel_context_is_pinned(child));
+	GEM_BUG_ON(intel_context_is_child(child));
+	GEM_BUG_ON(intel_context_is_parent(child));
+
+	parent->guc_number_children++;
+	list_add_tail(&child->guc_child_link,
+		      &parent->guc_child_list);
+	child->parent = parent;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index c41098950746..ad6ce5ac4824 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -44,6 +44,24 @@ void intel_context_free(struct intel_context *ce);
 int intel_context_reconfigure_sseu(struct intel_context *ce,
 				   const struct intel_sseu sseu);
 
+static inline bool intel_context_is_child(struct intel_context *ce)
+{
+	return !!ce->parent;
+}
+
+static inline bool intel_context_is_parent(struct intel_context *ce)
+{
+	return !!ce->guc_number_children;
+}
+
+void intel_context_bind_parent_child(struct intel_context *parent,
+				     struct intel_context *child);
+
+#define for_each_child(parent, ce)\
+	list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
+#define for_each_child_safe(parent, ce, cn)\
+	list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, guc_child_link)
+
 /**
  * intel_context_lock_pinned - Stablises the 'pinned' status of the HW context
  * @ce - the context
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 2df79ba39867..66b22b370a72 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -202,6 +202,18 @@ struct intel_context {
 	/* GuC context blocked fence */
 	struct i915_sw_fence guc_blocked;
 
+	/* Head of children list or link in parent's children list */
+	union {
+		struct list_head guc_child_list;	/* parent */
+		struct list_head guc_child_link;	/* child */
+	};
+
+	/* Pointer to parent */
+	struct intel_context *parent;
+
+	/* Number of children if parent */
+	u8 guc_number_children;
+
 	/*
 	 * GuC priority management
 	 */

From patchwork Tue Aug  3 22:29:13 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417531
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 91096C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:44 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 5BB4260184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:43 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5BB4260184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 097B16E8F1;
	Tue,  3 Aug 2021 22:12:01 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id D2D726E252;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745904"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745904"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512712"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:53 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 16/46] drm/i915/guc: Implement GuC parent-child context pin /
 unpin functions
Date: Tue,  3 Aug 2021 15:29:13 -0700
Message-Id: <20210803222943.27686-17-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Implement GuC parent-child context pin / unpin functions in which in any
contexts in the relationship are pinned all the contexts are pinned. The
parent owns most of the pinning / unpinning process and the children
direct any pins / unpins to the parent.

Patch implements a number of unused functions that will be connected
later in the series.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 187 ++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_context.h       |  43 +---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  25 ++-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  26 +--
 drivers/gpu/drm/i915/gt/intel_lrc.h           |   6 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   5 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |   4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 183 +++++++++++++++--
 9 files changed, 371 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 8cb92b10b547..bb4c14656067 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -158,8 +158,8 @@ static void __ring_retire(struct intel_ring *ring)
 	intel_ring_unpin(ring);
 }
 
-static int intel_context_pre_pin(struct intel_context *ce,
-				 struct i915_gem_ww_ctx *ww)
+static int __intel_context_pre_pin(struct intel_context *ce,
+				   struct i915_gem_ww_ctx *ww)
 {
 	int err;
 
@@ -190,7 +190,7 @@ static int intel_context_pre_pin(struct intel_context *ce,
 	return err;
 }
 
-static void intel_context_post_unpin(struct intel_context *ce)
+static void __intel_context_post_unpin(struct intel_context *ce)
 {
 	if (ce->state)
 		__context_unpin_state(ce->state);
@@ -199,13 +199,85 @@ static void intel_context_post_unpin(struct intel_context *ce)
 	__ring_retire(ce->ring);
 }
 
-int __intel_context_do_pin_ww(struct intel_context *ce,
-			      struct i915_gem_ww_ctx *ww)
+static int intel_context_pre_pin(struct intel_context *ce,
+				 struct i915_gem_ww_ctx *ww)
 {
-	bool handoff = false;
-	void *vaddr;
+	struct intel_context *child;
+	int err, i = 0;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	for_each_child(ce, child) {
+		err = __intel_context_pre_pin(child, ww);
+		if (unlikely(err))
+			goto unwind;
+		++i;
+	}
+
+	err = __intel_context_pre_pin(ce, ww);
+	if (unlikely(err))
+		goto unwind;
+
+	return 0;
+
+unwind:
+	for_each_child(ce, child) {
+		if (!i--)
+			break;
+		__intel_context_post_unpin(ce);
+	}
+
+	return err;
+}
+
+static void intel_context_post_unpin(struct intel_context *ce)
+{
+	struct intel_context *child;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	for_each_child(ce, child)
+		__intel_context_post_unpin(child);
+
+	__intel_context_post_unpin(ce);
+}
+
+static int __do_ww_lock(struct intel_context *ce,
+			struct i915_gem_ww_ctx *ww)
+{
+	int err = i915_gem_object_lock(ce->timeline->hwsp_ggtt->obj, ww);
+
+	if (!err && ce->ring->vma->obj)
+		err = i915_gem_object_lock(ce->ring->vma->obj, ww);
+	if (!err && ce->state)
+		err = i915_gem_object_lock(ce->state->obj, ww);
+
+	return err;
+}
+
+static int do_ww_lock(struct intel_context *ce,
+		      struct i915_gem_ww_ctx *ww)
+{
+	struct intel_context *child;
 	int err = 0;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	for_each_child(ce, child) {
+		err = __do_ww_lock(child, ww);
+		if (unlikely(err))
+			return err;
+	}
+
+	return __do_ww_lock(ce, ww);
+}
+
+static int __intel_context_do_pin_ww(struct intel_context *ce,
+				     struct i915_gem_ww_ctx *ww)
+{
+	bool handoff = false;
+	int err;
+
 	if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
 		err = intel_context_alloc_state(ce);
 		if (err)
@@ -217,14 +289,11 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 	 * refcount for __intel_context_active(), which prevent a lock
 	 * inversion of ce->pin_mutex vs dma_resv_lock().
 	 */
+	err = do_ww_lock(ce, ww);
+	if (err)
+		return err;
 
-	err = i915_gem_object_lock(ce->timeline->hwsp_ggtt->obj, ww);
-	if (!err && ce->ring->vma->obj)
-		err = i915_gem_object_lock(ce->ring->vma->obj, ww);
-	if (!err && ce->state)
-		err = i915_gem_object_lock(ce->state->obj, ww);
-	if (!err)
-		err = intel_context_pre_pin(ce, ww);
+	err = intel_context_pre_pin(ce, ww);
 	if (err)
 		return err;
 
@@ -232,7 +301,7 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 	if (err)
 		goto err_ctx_unpin;
 
-	err = ce->ops->pre_pin(ce, ww, &vaddr);
+	err = ce->ops->pre_pin(ce, ww);
 	if (err)
 		goto err_release;
 
@@ -250,7 +319,7 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 		if (unlikely(err))
 			goto err_unlock;
 
-		err = ce->ops->pin(ce, vaddr);
+		err = ce->ops->pin(ce);
 		if (err) {
 			intel_context_active_release(ce);
 			goto err_unlock;
@@ -290,7 +359,7 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 	return err;
 }
 
-int __intel_context_do_pin(struct intel_context *ce)
+static int __intel_context_do_pin(struct intel_context *ce)
 {
 	struct i915_gem_ww_ctx ww;
 	int err;
@@ -337,7 +406,7 @@ static void __intel_context_retire(struct i915_active *active)
 		 intel_context_get_avg_runtime_ns(ce));
 
 	set_bit(CONTEXT_VALID_BIT, &ce->flags);
-	intel_context_post_unpin(ce);
+	__intel_context_post_unpin(ce);
 	intel_context_put(ce);
 }
 
@@ -562,6 +631,88 @@ void intel_context_bind_parent_child(struct intel_context *parent,
 	child->parent = parent;
 }
 
+static inline int ____intel_context_pin(struct intel_context *ce)
+{
+	if (likely(intel_context_pin_if_active(ce)))
+		return 0;
+
+	return __intel_context_do_pin(ce);
+}
+
+static inline int __intel_context_pin_ww(struct intel_context *ce,
+					 struct i915_gem_ww_ctx *ww)
+{
+	if (likely(intel_context_pin_if_active(ce)))
+		return 0;
+
+	return __intel_context_do_pin_ww(ce, ww);
+}
+
+static inline void __intel_context_unpin(struct intel_context *ce)
+{
+	if (!ce->ops->sched_disable) {
+		__intel_context_do_unpin(ce, 1);
+	} else {
+		/*
+		 * Move ownership of this pin to the scheduling disable which is
+		 * an async operation. When that operation completes the above
+		 * intel_context_sched_disable_unpin is called potentially
+		 * unpinning the context.
+		 */
+		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+				ce->ops->sched_disable(ce);
+				break;
+			}
+		}
+	}
+}
+
+/*
+ * FIXME: This is ugly, these branches are only needed for parallel contexts in
+ * GuC submission. Basically the idea is if any of the contexts, that are
+ * configured for parallel submission, are pinned all the contexts need to be
+ * pinned in order to register these contexts with the GuC. We are adding the
+ * layer here while it should probably be pushed to the backend via a vfunc. But
+ * since we already have ce->pin + a layer atop it is confusing. Definitely
+ * needs a bit of rework how to properly layer / structure this code path. What
+ * is in place works but is not ideal.
+ */
+int intel_context_pin(struct intel_context *ce)
+{
+	if (intel_context_is_child(ce)) {
+		if (!atomic_fetch_add(1, &ce->pin_count))
+			return ____intel_context_pin(ce->parent);
+		else
+			return 0;
+	} else {
+		return ____intel_context_pin(ce);
+	}
+}
+
+int intel_context_pin_ww(struct intel_context *ce,
+			 struct i915_gem_ww_ctx *ww)
+{
+	if (intel_context_is_child(ce)) {
+		if (!atomic_fetch_add(1, &ce->pin_count))
+			return __intel_context_pin_ww(ce->parent, ww);
+		else
+			return 0;
+	} else {
+		return __intel_context_pin_ww(ce, ww);
+	}
+}
+
+void intel_context_unpin(struct intel_context *ce)
+{
+	if (intel_context_is_child(ce)) {
+		if (atomic_fetch_add(-1, &ce->pin_count) == 1)
+			__intel_context_unpin(ce->parent);
+	} else {
+		__intel_context_unpin(ce);
+	}
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index ad6ce5ac4824..c208691fc87d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -110,31 +110,15 @@ static inline void intel_context_unlock_pinned(struct intel_context *ce)
 	mutex_unlock(&ce->pin_mutex);
 }
 
-int __intel_context_do_pin(struct intel_context *ce);
-int __intel_context_do_pin_ww(struct intel_context *ce,
-			      struct i915_gem_ww_ctx *ww);
-
 static inline bool intel_context_pin_if_active(struct intel_context *ce)
 {
 	return atomic_inc_not_zero(&ce->pin_count);
 }
 
-static inline int intel_context_pin(struct intel_context *ce)
-{
-	if (likely(intel_context_pin_if_active(ce)))
-		return 0;
-
-	return __intel_context_do_pin(ce);
-}
-
-static inline int intel_context_pin_ww(struct intel_context *ce,
-				       struct i915_gem_ww_ctx *ww)
-{
-	if (likely(intel_context_pin_if_active(ce)))
-		return 0;
+int intel_context_pin(struct intel_context *ce);
 
-	return __intel_context_do_pin_ww(ce, ww);
-}
+int intel_context_pin_ww(struct intel_context *ce,
+			 struct i915_gem_ww_ctx *ww);
 
 static inline void __intel_context_pin(struct intel_context *ce)
 {
@@ -146,28 +130,11 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub);
 
 static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
 {
+	GEM_BUG_ON(intel_context_is_child(ce));
 	__intel_context_do_unpin(ce, 2);
 }
 
-static inline void intel_context_unpin(struct intel_context *ce)
-{
-	if (!ce->ops->sched_disable) {
-		__intel_context_do_unpin(ce, 1);
-	} else {
-		/*
-		 * Move ownership of this pin to the scheduling disable which is
-		 * an async operation. When that operation completes the above
-		 * intel_context_sched_disable_unpin is called potentially
-		 * unpinning the context.
-		 */
-		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
-			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
-				ce->ops->sched_disable(ce);
-				break;
-			}
-		}
-	}
-}
+void intel_context_unpin(struct intel_context *ce);
 
 void intel_context_enter_engine(struct intel_context *ce);
 void intel_context_exit_engine(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 66b22b370a72..eb82be15b7a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -39,8 +39,8 @@ struct intel_context_ops {
 
 	void (*ban)(struct intel_context *ce, struct i915_request *rq);
 
-	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
-	int (*pin)(struct intel_context *ce, void *vaddr);
+	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww);
+	int (*pin)(struct intel_context *ce);
 	void (*unpin)(struct intel_context *ce);
 	void (*post_unpin)(struct intel_context *ce);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index baa1797af1c8..fc74ca28f245 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2554,16 +2554,17 @@ static void execlists_submit_request(struct i915_request *request)
 static int
 __execlists_context_pre_pin(struct intel_context *ce,
 			    struct intel_engine_cs *engine,
-			    struct i915_gem_ww_ctx *ww, void **vaddr)
+			    struct i915_gem_ww_ctx *ww)
 {
 	int err;
 
-	err = lrc_pre_pin(ce, engine, ww, vaddr);
+	err = lrc_pre_pin(ce, engine, ww);
 	if (err)
 		return err;
 
 	if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) {
-		lrc_init_state(ce, engine, *vaddr);
+		lrc_init_state(ce, engine, ce->lrc_reg_state -
+			       LRC_STATE_OFFSET / sizeof(*ce->lrc_reg_state));
 
 		 __i915_gem_object_flush_map(ce->state->obj, 0, engine->context_size);
 	}
@@ -2572,15 +2573,14 @@ __execlists_context_pre_pin(struct intel_context *ce,
 }
 
 static int execlists_context_pre_pin(struct intel_context *ce,
-				     struct i915_gem_ww_ctx *ww,
-				     void **vaddr)
+				     struct i915_gem_ww_ctx *ww)
 {
-	return __execlists_context_pre_pin(ce, ce->engine, ww, vaddr);
+	return __execlists_context_pre_pin(ce, ce->engine, ww);
 }
 
-static int execlists_context_pin(struct intel_context *ce, void *vaddr)
+static int execlists_context_pin(struct intel_context *ce)
 {
-	return lrc_pin(ce, ce->engine, vaddr);
+	return lrc_pin(ce, ce->engine);
 }
 
 static int execlists_context_alloc(struct intel_context *ce)
@@ -3570,20 +3570,19 @@ static int virtual_context_alloc(struct intel_context *ce)
 }
 
 static int virtual_context_pre_pin(struct intel_context *ce,
-				   struct i915_gem_ww_ctx *ww,
-				   void **vaddr)
+				   struct i915_gem_ww_ctx *ww)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 
 	 /* Note: we must use a real engine class for setting up reg state */
-	return __execlists_context_pre_pin(ce, ve->siblings[0], ww, vaddr);
+	return __execlists_context_pre_pin(ce, ve->siblings[0], ww);
 }
 
-static int virtual_context_pin(struct intel_context *ce, void *vaddr)
+static int virtual_context_pin(struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 
-	return lrc_pin(ce, ve->siblings[0], vaddr);
+	return lrc_pin(ce, ve->siblings[0]);
 }
 
 static void virtual_context_enter(struct intel_context *ce)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index bb4af4977920..c466fc966005 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -947,30 +947,30 @@ void lrc_reset(struct intel_context *ce)
 int
 lrc_pre_pin(struct intel_context *ce,
 	    struct intel_engine_cs *engine,
-	    struct i915_gem_ww_ctx *ww,
-	    void **vaddr)
+	    struct i915_gem_ww_ctx *ww)
 {
+	void *vaddr;
 	GEM_BUG_ON(!ce->state);
 	GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
 
-	*vaddr = i915_gem_object_pin_map(ce->state->obj,
-					 i915_coherent_map_type(ce->engine->i915,
-								ce->state->obj,
-								false) |
-					 I915_MAP_OVERRIDE);
+	vaddr = i915_gem_object_pin_map(ce->state->obj,
+					i915_coherent_map_type(ce->engine->i915,
+							       ce->state->obj,
+							       false) |
+					I915_MAP_OVERRIDE);
 
-	return PTR_ERR_OR_ZERO(*vaddr);
+	ce->lrc_reg_state = vaddr + LRC_STATE_OFFSET;
+
+	return PTR_ERR_OR_ZERO(vaddr);
 }
 
 int
 lrc_pin(struct intel_context *ce,
-	struct intel_engine_cs *engine,
-	void *vaddr)
+	struct intel_engine_cs *engine)
 {
-	ce->lrc_reg_state = vaddr + LRC_STATE_OFFSET;
-
 	if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags))
-		lrc_init_state(ce, engine, vaddr);
+		lrc_init_state(ce, engine,
+			       (void *)ce->lrc_reg_state - LRC_STATE_OFFSET);
 
 	ce->lrc.lrca = lrc_update_regs(ce, engine, ce->ring->tail);
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 7f697845c4cf..837fcf00270d 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -38,12 +38,10 @@ void lrc_destroy(struct kref *kref);
 int
 lrc_pre_pin(struct intel_context *ce,
 	    struct intel_engine_cs *engine,
-	    struct i915_gem_ww_ctx *ww,
-	    void **vaddr);
+	    struct i915_gem_ww_ctx *ww);
 int
 lrc_pin(struct intel_context *ce,
-	struct intel_engine_cs *engine,
-	void *vaddr);
+	struct intel_engine_cs *engine);
 void lrc_unpin(struct intel_context *ce);
 void lrc_post_unpin(struct intel_context *ce);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 2958e2fae380..f4f301bfb9f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -472,8 +472,7 @@ static int ring_context_init_default_state(struct intel_context *ce,
 }
 
 static int ring_context_pre_pin(struct intel_context *ce,
-				struct i915_gem_ww_ctx *ww,
-				void **unused)
+				struct i915_gem_ww_ctx *ww)
 {
 	struct i915_address_space *vm;
 	int err = 0;
@@ -576,7 +575,7 @@ static int ring_context_alloc(struct intel_context *ce)
 	return 0;
 }
 
-static int ring_context_pin(struct intel_context *ce, void *unused)
+static int ring_context_pin(struct intel_context *ce)
 {
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 2c1af030310c..826b5d7a4573 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -167,12 +167,12 @@ static int mock_context_alloc(struct intel_context *ce)
 }
 
 static int mock_context_pre_pin(struct intel_context *ce,
-				struct i915_gem_ww_ctx *ww, void **unused)
+				struct i915_gem_ww_ctx *ww)
 {
 	return 0;
 }
 
-static int mock_context_pin(struct intel_context *ce, void *unused)
+static int mock_context_pin(struct intel_context *ce)
 {
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dec757d319a2..c5c73c42bcf7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1905,6 +1905,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 
 	GEM_BUG_ON(!engine->mask);
 	GEM_BUG_ON(context_guc_id_invalid(ce));
+	GEM_BUG_ON(intel_context_is_child(ce));
 
 	/*
 	 * Ensure LRC + CT vmas are is same region as write barrier is done
@@ -2008,15 +2009,13 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 
 static int __guc_context_pre_pin(struct intel_context *ce,
 				 struct intel_engine_cs *engine,
-				 struct i915_gem_ww_ctx *ww,
-				 void **vaddr)
+				 struct i915_gem_ww_ctx *ww)
 {
-	return lrc_pre_pin(ce, engine, ww, vaddr);
+	return lrc_pre_pin(ce, engine, ww);
 }
 
 static int __guc_context_pin(struct intel_context *ce,
-			     struct intel_engine_cs *engine,
-			     void *vaddr)
+			     struct intel_engine_cs *engine)
 {
 	if (i915_ggtt_offset(ce->state) !=
 	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
@@ -2027,20 +2026,33 @@ static int __guc_context_pin(struct intel_context *ce,
 	 * explaination of why.
 	 */
 
-	return lrc_pin(ce, engine, vaddr);
+	return lrc_pin(ce, engine);
+}
+
+static void __guc_context_unpin(struct intel_context *ce)
+{
+	lrc_unpin(ce);
+}
+
+static void __guc_context_post_unpin(struct intel_context *ce)
+{
+	lrc_post_unpin(ce);
 }
 
 static int guc_context_pre_pin(struct intel_context *ce,
-			       struct i915_gem_ww_ctx *ww,
-			       void **vaddr)
+			       struct i915_gem_ww_ctx *ww)
 {
-	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
+	return __guc_context_pre_pin(ce, ce->engine, ww);
 }
 
-static int guc_context_pin(struct intel_context *ce, void *vaddr)
+static int guc_context_pin(struct intel_context *ce)
 {
-	int ret = __guc_context_pin(ce, ce->engine, vaddr);
+	int ret;
 
+	GEM_BUG_ON(intel_context_is_parent(ce) ||
+		   intel_context_is_child(ce));
+
+	ret = __guc_context_pin(ce, ce->engine);
 	if (likely(!ret && !intel_context_is_barrier(ce)))
 		intel_engine_pm_get(ce->engine);
 
@@ -2054,7 +2066,7 @@ static void guc_context_unpin(struct intel_context *ce)
 	GEM_BUG_ON(context_enabled(ce));
 
 	unpin_guc_id(guc, ce, true);
-	lrc_unpin(ce);
+	__guc_context_unpin(ce);
 
 	if (likely(!intel_context_is_barrier(ce)))
 		intel_engine_pm_put(ce->engine);
@@ -2062,7 +2074,141 @@ static void guc_context_unpin(struct intel_context *ce)
 
 static void guc_context_post_unpin(struct intel_context *ce)
 {
-	lrc_post_unpin(ce);
+	__guc_context_post_unpin(ce);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static int guc_parent_context_pre_pin(struct intel_context *ce,
+				      struct i915_gem_ww_ctx *ww)
+{
+	struct intel_context *child;
+	int err, i = 0, j = 0;
+
+	for_each_child(ce, child) {
+		err = i915_active_acquire(&child->active);
+		if (unlikely(err))
+			goto unwind_active;
+		++i;
+	}
+
+	for_each_child(ce, child) {
+		err = __guc_context_pre_pin(child, child->engine, ww);
+		if (unlikely(err))
+			goto unwind_pre_pin;
+		++j;
+	}
+
+	err = __guc_context_pre_pin(ce, ce->engine, ww);
+	if (unlikely(err))
+		goto unwind_pre_pin;
+
+	return 0;
+
+unwind_pre_pin:
+	for_each_child(ce, child) {
+		if (!j--)
+			break;
+		__guc_context_post_unpin(child);
+	}
+
+unwind_active:
+	for_each_child(ce, child) {
+		if (!i--)
+			break;
+		i915_active_release(&child->active);
+	}
+
+	return err;
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_parent_context_post_unpin(struct intel_context *ce)
+{
+	struct intel_context *child;
+
+	for_each_child(ce, child)
+		__guc_context_post_unpin(child);
+	__guc_context_post_unpin(ce);
+
+	for_each_child(ce, child) {
+		intel_context_get(child);
+		i915_active_release(&child->active);
+		intel_context_put(child);
+	}
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static int guc_parent_context_pin(struct intel_context *ce)
+{
+	int ret, i = 0, j = 0;
+	struct intel_context *child;
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	for_each_child(ce, child) {
+		ret = __guc_context_pin(child, child->engine);
+		if (unlikely(ret))
+			goto unwind_pin;
+		++i;
+	}
+	ret = __guc_context_pin(ce, ce->engine);
+	if (unlikely(ret))
+		goto unwind_pin;
+
+	for_each_child(ce, child)
+		if (test_bit(CONTEXT_LRCA_DIRTY, &child->flags)) {
+			set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+			break;
+		}
+
+	for_each_engine_masked(engine, ce->engine->gt,
+			       ce->engine->mask, tmp)
+		intel_engine_pm_get(engine);
+	for_each_child(ce, child)
+		for_each_engine_masked(engine, child->engine->gt,
+				       child->engine->mask, tmp)
+			intel_engine_pm_get(engine);
+
+	return 0;
+
+unwind_pin:
+	for_each_child(ce, child) {
+		if (++j > i)
+			break;
+		__guc_context_unpin(child);
+	}
+
+	return ret;
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_parent_context_unpin(struct intel_context *ce)
+{
+	struct intel_context *child;
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+	GEM_BUG_ON(context_enabled(ce));
+
+	unpin_guc_id(ce_to_guc(ce), ce, true);
+	for_each_child(ce, child)
+		__guc_context_unpin(child);
+	__guc_context_unpin(ce);
+
+	for_each_engine_masked(engine, ce->engine->gt,
+			       ce->engine->mask, tmp)
+		intel_engine_pm_put(engine);
+	for_each_child(ce, child)
+		for_each_engine_masked(engine, child->engine->gt,
+				       child->engine->mask, tmp)
+			intel_engine_pm_put(engine);
 }
 
 static void __guc_context_sched_enable(struct intel_guc *guc,
@@ -2993,18 +3139,17 @@ static int guc_request_alloc(struct i915_request *rq)
 }
 
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
-				       struct i915_gem_ww_ctx *ww,
-				       void **vaddr)
+				       struct i915_gem_ww_ctx *ww)
 {
 	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
 
-	return __guc_context_pre_pin(ce, engine, ww, vaddr);
+	return __guc_context_pre_pin(ce, engine, ww);
 }
 
-static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
+static int guc_virtual_context_pin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
-	int ret = __guc_context_pin(ce, engine, vaddr);
+	int ret = __guc_context_pin(ce, engine);
 	intel_engine_mask_t tmp, mask = ce->engine->mask;
 
 	if (likely(!ret))
@@ -3024,7 +3169,7 @@ static void guc_virtual_context_unpin(struct intel_context *ce)
 	GEM_BUG_ON(intel_context_is_barrier(ce));
 
 	unpin_guc_id(guc, ce, true);
-	lrc_unpin(ce);
+	__guc_context_unpin(ce);
 
 	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
 		intel_engine_pm_put(engine);

From patchwork Tue Aug  3 22:29:14 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417535
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7B7D2C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:47 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 519A1601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:47 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 519A1601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 13F746E93C;
	Tue,  3 Aug 2021 22:12:03 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id EC8696E2D1;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745905"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745905"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512713"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 17/46] drm/i915/guc: Add multi-lrc context registration
Date: Tue,  3 Aug 2021 15:29:14 -0700
Message-Id: <20210803222943.27686-18-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   5 +
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 117 +++++++++++++++++-
 5 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index eb82be15b7a2..9665cb31bab0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -211,9 +211,15 @@ struct intel_context {
 	/* Pointer to parent */
 	struct intel_context *parent;
 
+	/* GuC workqueue head / tail - only applies to parent */
+	u16 guc_wqi_tail;
+	u16 guc_wqi_head;
+
 	/* Number of children if parent */
 	u8 guc_number_children;
 
+	u8 parent_page; /* if set, page num reserved for parent context */
+
 	/*
 	 * GuC priority management
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index c466fc966005..4b65c3a98331 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 		context_size += PAGE_SIZE;
 	}
 
+	if (intel_context_is_parent(ce)) {
+		ce->parent_page = context_size / PAGE_SIZE;
+		context_size += PAGE_SIZE;
+	}
+
 	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_shmem(engine->i915, context_size);
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index d832c8f11c11..0a496acec213 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
 	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+	INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
 	INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
 	INTEL_GUC_ACTION_LIMIT
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 82534259b7ad..e08fbd40281c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -51,7 +51,7 @@
 
 #define GUC_DOORBELL_INVALID		256
 
-#define GUC_WQ_SIZE			(PAGE_SIZE * 2)
+#define GUC_WQ_SIZE			(PAGE_SIZE / 2)
 
 /* Work queue item header definitions */
 #define WQ_STATUS_ACTIVE		1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c5c73c42bcf7..98c1c0b7b087 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -439,6 +439,39 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
+/*
+ * When using multi-lrc submission an extra page in the context state is
+ * reserved for the process descriptor and work queue.
+ *
+ * The layout of this page is below:
+ * 0						guc_process_desc
+ * ...						unused
+ * PAGE_SIZE / 2				work queue start
+ * ...						work queue
+ * PAGE_SIZE - 1				work queue end
+ */
+#define WQ_OFFSET	(PAGE_SIZE / 2)
+static inline u32 __get_process_desc_offset(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->parent_page);
+
+	return ce->parent_page * PAGE_SIZE;
+}
+
+static inline u32 __get_wq_offset(struct intel_context *ce)
+{
+	return __get_process_desc_offset(ce) + WQ_OFFSET;
+}
+
+static inline struct guc_process_desc *
+__get_process_desc(struct intel_context *ce)
+{
+	return (struct guc_process_desc *)
+		(ce->lrc_reg_state +
+		 ((__get_process_desc_offset(ce) -
+		   LRC_STATE_OFFSET) / sizeof(u32)));
+}
+
 static u32 __get_lrc_desc_offset(struct intel_guc *guc, int index)
 {
 	GEM_BUG_ON(index >= guc->lrcd_reg.max_idx);
@@ -1743,6 +1776,28 @@ static void unpin_guc_id(struct intel_guc *guc,
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 }
 
+static int __guc_action_register_multi_lrc(struct intel_guc *guc,
+					   struct intel_context *ce,
+					   u32 guc_id,
+					   bool loop)
+{
+	struct intel_context *child;
+	u32 action[4 + MAX_ENGINE_INSTANCE];
+	int len = 0;
+
+	GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
+
+	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
+	action[len++] = guc_id;
+	action[len++] = ce->guc_number_children + 1;
+	action[len++] = __get_lrc_desc_offset(guc, ce->guc_lrcd_reg_idx);
+	for_each_child(ce, child)
+		action[len++] =
+			__get_lrc_desc_offset(guc, child->guc_lrcd_reg_idx);
+
+	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
+}
+
 static int __guc_action_register_context(struct intel_guc *guc,
 					 struct intel_context *ce,
 					 u32 guc_id,
@@ -1763,9 +1818,14 @@ static int register_context(struct intel_context *ce, bool loop)
 	struct intel_guc *guc = ce_to_guc(ce);
 	int ret;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	trace_intel_context_register(ce);
 
-	ret = __guc_action_register_context(guc, ce, ce->guc_id, loop);
+	if (intel_context_is_parent(ce))
+		ret =  __guc_action_register_multi_lrc(guc, ce, ce->guc_id,
+						       loop);
+	else
+		ret = __guc_action_register_context(guc, ce, ce->guc_id, loop);
 	if (likely(!ret))
 		set_context_registered(ce);
 
@@ -1790,6 +1850,7 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	trace_intel_context_deregister(ce);
 
 	return __guc_action_deregister_context(guc, guc_id, loop);
@@ -1860,7 +1921,11 @@ static void __free_lrcd_reg_idx(struct intel_guc *guc, struct intel_context *ce)
 
 static void free_lrcd_reg_idx(struct intel_guc *guc, struct intel_context *ce)
 {
+	struct intel_context *child;
+
 	__free_lrcd_reg_idx(guc, ce);
+	for_each_child(ce, child)
+		__free_lrcd_reg_idx(guc, child);
 }
 
 static int guc_lrcd_reg_init(struct intel_guc *guc)
@@ -1901,6 +1966,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
 	bool context_registered;
 	intel_wakeref_t wakeref;
+	struct intel_context *child;
 	int ret = 0;
 
 	GEM_BUG_ON(!engine->mask);
@@ -1921,6 +1987,13 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			return ret;
 		ce->guc_lrcd_reg_idx = ret;
 	}
+	for_each_child(ce, child)
+		if (likely(!child->guc_lrcd_reg_idx)) {
+			ret = alloc_lrcd_reg_idx(guc, !loop);
+			if (unlikely(ret < 0))
+				return ret;
+			child->guc_lrcd_reg_idx = ret;
+		}
 
 	context_registered = lrc_desc_registered(guc, desc_idx);
 
@@ -1944,6 +2017,42 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	guc_context_policy_init(engine, desc);
 	init_sched_state(ce);
 
+	/*
+	 * Context is a parent, we need to register a process descriptor
+	 * describing a work queue and register all child contexts.
+	 */
+	if (intel_context_is_parent(ce)) {
+		struct guc_process_desc *pdesc;
+
+		ce->guc_wqi_tail = 0;
+		ce->guc_wqi_head = 0;
+
+		desc->process_desc = i915_ggtt_offset(ce->state) +
+			__get_process_desc_offset(ce);
+		desc->wq_addr = i915_ggtt_offset(ce->state) +
+			__get_wq_offset(ce);
+		desc->wq_size = GUC_WQ_SIZE;
+
+		pdesc = __get_process_desc(ce);
+		memset(pdesc, 0, sizeof(*(pdesc)));
+		pdesc->stage_id = ce->guc_id;
+		pdesc->wq_base_addr = desc->wq_addr;
+		pdesc->wq_size_bytes = desc->wq_size;
+		pdesc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+		pdesc->wq_status = WQ_STATUS_ACTIVE;
+
+		for_each_child(ce, child) {
+			desc = __get_lrc_desc(guc, child->guc_lrcd_reg_idx);
+
+			desc->engine_class =
+				engine_class_to_guc_class(engine->class);
+			desc->hw_context_desc = child->lrc.lrca;
+			desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
+			guc_context_policy_init(engine, desc);
+		}
+	}
+
 	/*
 	 * The context_lookup xarray is used to determine if the hardware
 	 * context is currently registered. There are two cases in which it
@@ -3653,6 +3762,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 		return NULL;
 	}
 
+	if (unlikely(intel_context_is_child(ce))) {
+		drm_err(&guc_to_gt(guc)->i915->drm,
+			"Context is child, desc_idx %u", desc_idx);
+		return NULL;
+	}
+
 	return ce;
 }
 

From patchwork Tue Aug  3 22:29:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417539
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1DB6BC4320A
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:51 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D7F02601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:50 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D7F02601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id EBDBA6E93B;
	Tue,  3 Aug 2021 22:12:02 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id EE4FC6E40B;
 Tue,  3 Aug 2021 22:11:55 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745906"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745906"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512714"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 18/46] drm/i915/guc: Ensure GuC schedule operations do not
 operate on child contexts
Date: Tue,  3 Aug 2021 15:29:15 -0700
Message-Id: <20210803222943.27686-19-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

In GuC parent-child contexts the parent context controls the scheduling,
ensure only the parent does the scheduling operations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++----
 1 file changed, 41 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 98c1c0b7b087..f23dd716723f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -405,6 +405,18 @@ static inline void decr_context_blocked(struct intel_context *ce)
 	ce->guc_state.sched_state -= SCHED_STATE_BLOCKED;
 }
 
+static inline struct intel_context *
+to_parent(struct intel_context *ce)
+{
+	return intel_context_is_child(ce) ? ce->parent : ce;
+}
+
+static inline struct intel_context *
+request_to_scheduling_context(struct i915_request *rq)
+{
+	return to_parent(rq->context);
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return ce->guc_id == GUC_INVALID_LRC_ID;
@@ -711,7 +723,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
 static int tasklet_register_context(struct guc_submit_engine *gse,
 				    struct i915_request *rq)
 {
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
 	struct intel_guc *guc = gse->sched_engine.private_data;
 	int ret = 0;
 
@@ -720,6 +732,7 @@ static int tasklet_register_context(struct guc_submit_engine *gse,
 	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
 	GEM_BUG_ON(request_has_no_guc_id(rq));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
+	GEM_BUG_ON(intel_context_is_child(ce));
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 
 	/*
@@ -2355,6 +2368,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 #endif
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	trace_intel_context_sched_disable(ce);
 
 	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -2570,6 +2584,8 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	bool enabled;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
 	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
@@ -2971,6 +2987,8 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -3056,14 +3074,26 @@ static bool context_needs_lrc_desc_pin(struct intel_context *ce, bool new_guc_id
 		!submission_disabled(ce_to_guc(ce));
 }
 
+static void clear_lrca_dirty(struct intel_context *ce)
+{
+	struct intel_context *child;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+	for_each_child(ce, child)
+		clear_bit(CONTEXT_LRCA_DIRTY, &child->flags);
+}
+
 static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
 			      struct i915_request *rq)
 {
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
 	int ret = 0;
 
 	lockdep_assert_held(&gse->sched_engine.lock);
 	GEM_BUG_ON(!ce->guc_num_rq_submit_no_id);
+	GEM_BUG_ON(intel_context_is_child(ce));
 
 	if (atomic_add_unless(&ce->guc_id_ref, ce->guc_num_rq_submit_no_id, 0))
 		goto out;
@@ -3091,7 +3121,7 @@ static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
 		gse->submission_stall_reason = STALL_SCHED_DISABLE;
 	}
 
-	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+	clear_lrca_dirty(ce);
 out:
 	gse->total_num_rq_with_no_guc_id -= ce->guc_num_rq_submit_no_id;
 	GEM_BUG_ON(gse->total_num_rq_with_no_guc_id < 0);
@@ -3122,7 +3152,7 @@ static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
 
 static int guc_request_alloc(struct i915_request *rq)
 {
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
 	struct intel_guc *guc = ce_to_guc(ce);
 	struct guc_submit_engine *gse = ce_to_gse(ce);
 	unsigned long flags;
@@ -3173,11 +3203,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * persistent until the generated request is retired. Thus, sealing these
 	 * race conditions.
 	 *
-	 * There is no need for a lock here as the timeline mutex ensures at
-	 * most one context can be executing this code path at once. The
-	 * guc_id_ref is incremented once for every request in flight and
-	 * decremented on each retire. When it is zero, a lock around the
-	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+	 * There is no need for a lock here as the timeline mutex (or
+	 * parallel_submit mutex in the case of multi-lrc) ensures at most one
+	 * context can be executing this code path at once. The guc_id_ref is
+	 * incremented once for every request in flight and decremented on each
+	 * retire. When it is zero, a lock around the increment (in pin_guc_id)
+	 * is needed to seal a race with unpin_guc_id.
 	 */
 	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
 		goto out;
@@ -3215,8 +3246,7 @@ static int guc_request_alloc(struct i915_request *rq)
 		}
 	}
 
-	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
-
+	clear_lrca_dirty(ce);
 out:
 	incr_num_rq_not_ready(ce);
 

From patchwork Tue Aug  3 22:29:16 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417521
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 65D0FC4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:37 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 28B4D60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:37 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 28B4D60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id B7DBC6E924;
	Tue,  3 Aug 2021 22:11:59 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 086CB6E8DD;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745907"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745907"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512715"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 19/46] drm/i915/guc: Assign contexts in parent-child
 relationship consecutive guc_ids
Date: Tue,  3 Aug 2021 15:29:16 -0700
Message-Id: <20210803222943.27686-20-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how GuC mutli-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.h       |   6 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |   3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   7 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 222 ++++++++++++------
 .../i915/gt/uc/intel_guc_submission_types.h   |  10 +
 5 files changed, 179 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index c208691fc87d..7ce3b3d2edb7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -54,6 +54,12 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
 	return !!ce->guc_number_children;
 }
 
+static inline struct intel_context *
+intel_context_to_parent(struct intel_context *ce)
+{
+	return intel_context_is_child(ce) ? ce->parent : ce;
+}
+
 void intel_context_bind_parent_child(struct intel_context *parent,
 				     struct intel_context *child);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index ea763138197f..c3d4baa1b2b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -849,6 +849,7 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 
 static void nop_submit_request(struct i915_request *request)
 {
+	struct intel_context *ce = intel_context_to_parent(request->context);
 	RQ_TRACE(request, "-EIO\n");
 
 	/*
@@ -857,7 +858,7 @@ static void nop_submit_request(struct i915_request *request)
 	 * this for now.
 	 */
 	if (intel_engine_uses_guc(request->engine))
-		intel_guc_decr_num_rq_not_ready(request->context);
+		intel_guc_decr_num_rq_not_ready(ce);
 
 	request = i915_request_mark_eio(request);
 	if (request) {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index c0c60ccabfa4..30a0f364db8f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -24,6 +24,7 @@ struct __guc_ads_blob;
 
 enum {
 	GUC_SUBMIT_ENGINE_SINGLE_LRC,
+	GUC_SUBMIT_ENGINE_MULTI_LRC,
 	GUC_SUBMIT_ENGINE_MAX
 };
 
@@ -59,8 +60,10 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
-	struct list_head guc_id_list_no_ref;
-	struct list_head guc_id_list_unpinned;
+	unsigned long *guc_ids_bitmap;
+#define MAX_GUC_ID_ORDER	(order_base_2(MAX_ENGINE_INSTANCE + 1))
+	struct list_head guc_id_list_no_ref[MAX_GUC_ID_ORDER + 1];
+	struct list_head guc_id_list_unpinned[MAX_GUC_ID_ORDER + 1];
 
 	spinlock_t destroy_lock;	/* protects list / worker */
 	struct list_head destroyed_contexts;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f23dd716723f..afb9b4bb8971 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -169,6 +169,15 @@ static void clr_guc_ids_exhausted(struct guc_submit_engine *gse)
 	clear_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
 }
 
+/*
+ * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
+ * and a different allocation algorithm is used (bitmap vs. ida). We believe the
+ * number of multi-lrc contexts in use should be low and 1/16 should be
+ * sufficient. Minimum of 32 ids for multi-lrc.
+ */
+#define NUMBER_MULTI_LRC_GUC_ID(guc) \
+	((guc)->num_guc_ids / 16 > 32 ? (guc)->num_guc_ids / 16 : 32)
+
 /*
  * Below is a set of functions which control the GuC scheduling state which do
  * not require a lock as all state transitions are mutually exclusive. i.e. It
@@ -405,16 +414,10 @@ static inline void decr_context_blocked(struct intel_context *ce)
 	ce->guc_state.sched_state -= SCHED_STATE_BLOCKED;
 }
 
-static inline struct intel_context *
-to_parent(struct intel_context *ce)
-{
-	return intel_context_is_child(ce) ? ce->parent : ce;
-}
-
 static inline struct intel_context *
 request_to_scheduling_context(struct i915_request *rq)
 {
-	return to_parent(rq->context);
+	return intel_context_to_parent(rq->context);
 }
 
 static inline bool context_guc_id_invalid(struct intel_context *ce)
@@ -1436,7 +1439,7 @@ static void destroy_worker_func(struct work_struct *w);
  */
 int intel_guc_submission_init(struct intel_guc *guc)
 {
-	int ret;
+	int ret, i;
 
 	if (guc_submission_initialized(guc))
 		return 0;
@@ -1448,9 +1451,13 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
 	spin_lock_init(&guc->contexts_lock);
-	INIT_LIST_HEAD(&guc->guc_id_list_no_ref);
-	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
+	for (i = 0; i < MAX_GUC_ID_ORDER + 1; ++i) {
+		INIT_LIST_HEAD(&guc->guc_id_list_no_ref[i]);
+		INIT_LIST_HEAD(&guc->guc_id_list_unpinned[i]);
+	}
 	ida_init(&guc->guc_ids);
+	guc->guc_ids_bitmap =
+		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
 
 	spin_lock_init(&guc->destroy_lock);
 
@@ -1476,6 +1483,8 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 
 		i915_sched_engine_put(sched_engine);
 	}
+
+	bitmap_free(guc->guc_ids_bitmap);
 }
 
 static inline void queue_request(struct i915_sched_engine *sched_engine,
@@ -1499,11 +1508,13 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 static bool too_many_guc_ids_not_ready(struct guc_submit_engine *gse,
 				       struct intel_context *ce)
 {
-	u32 available_guc_ids, guc_ids_consumed;
 	struct intel_guc *guc = gse->sched_engine.private_data;
+	u32 available_guc_ids = intel_context_is_parent(ce) ?
+		NUMBER_MULTI_LRC_GUC_ID(guc) :
+		guc->num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
+	u32 guc_ids_consumed = atomic_read(&gse->num_guc_ids_not_ready);
 
-	available_guc_ids = guc->num_guc_ids;
-	guc_ids_consumed = atomic_read(&gse->num_guc_ids_not_ready);
+	GEM_BUG_ON(intel_context_is_child(ce));
 
 	if (TOO_MANY_GUC_IDS_NOT_READY(available_guc_ids, guc_ids_consumed)) {
 		set_and_update_guc_ids_exhausted(gse);
@@ -1517,17 +1528,26 @@ static void incr_num_rq_not_ready(struct intel_context *ce)
 {
 	struct guc_submit_engine *gse = ce_to_gse(ce);
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+	GEM_BUG_ON(!intel_context_is_parent(ce) &&
+		   ce->guc_number_children);
+
 	if (!atomic_fetch_add(1, &ce->guc_num_rq_not_ready))
-		atomic_inc(&gse->num_guc_ids_not_ready);
+		atomic_add(ce->guc_number_children + 1,
+			   &gse->num_guc_ids_not_ready);
 }
 
 void intel_guc_decr_num_rq_not_ready(struct intel_context *ce)
 {
 	struct guc_submit_engine *gse = ce_to_gse(ce);
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
 	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1) {
 		GEM_BUG_ON(!atomic_read(&gse->num_guc_ids_not_ready));
-		atomic_dec(&gse->num_guc_ids_not_ready);
+
+		atomic_sub(ce->guc_number_children + 1,
+			   &gse->num_guc_ids_not_ready);
 	}
 }
 
@@ -1579,20 +1599,42 @@ static void guc_submit_request(struct i915_request *rq)
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 
-	intel_guc_decr_num_rq_not_ready(rq->context);
+	intel_guc_decr_num_rq_not_ready(request_to_scheduling_context(rq));
 }
 
-static int new_guc_id(struct intel_guc *guc)
+static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
-	return ida_simple_get(&guc->guc_ids, 0,
-			      guc->num_guc_ids, GFP_KERNEL |
-			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+	int ret;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	if (intel_context_is_parent(ce))
+		ret = bitmap_find_free_region(guc->guc_ids_bitmap,
+					      NUMBER_MULTI_LRC_GUC_ID(guc),
+					      order_base_2(ce->guc_number_children
+							   + 1));
+	else
+		ret = ida_simple_get(&guc->guc_ids,
+				     NUMBER_MULTI_LRC_GUC_ID(guc),
+				     guc->num_guc_ids, GFP_KERNEL |
+				     __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+	if (unlikely(ret < 0))
+		return ret;
+
+	ce->guc_id = ret;
+	return 0;
 }
 
 static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
+	GEM_BUG_ON(intel_context_is_child(ce));
 	if (!context_guc_id_invalid(ce)) {
-		ida_simple_remove(&guc->guc_ids, ce->guc_id);
+		if (intel_context_is_parent(ce))
+			bitmap_release_region(guc->guc_ids_bitmap, ce->guc_id,
+					      order_base_2(ce->guc_number_children
+							   + 1));
+		else
+			ida_simple_remove(&guc->guc_ids, ce->guc_id);
 		clr_lrc_desc_registered(guc, ce->guc_id);
 		set_context_guc_id_invalid(ce);
 	}
@@ -1604,6 +1646,8 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
 	unsigned long flags;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
 	spin_lock_irqsave(&guc->contexts_lock, flags);
 	__release_guc_id(guc, ce);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
@@ -1618,54 +1662,93 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
  * schedule disable H2G + a deregister H2G.
  */
 static struct list_head *get_guc_id_list(struct intel_guc *guc,
+					 u8 number_children,
 					 bool unpinned)
 {
+	GEM_BUG_ON(order_base_2(number_children + 1) > MAX_GUC_ID_ORDER);
+
 	if (unpinned)
-		return &guc->guc_id_list_unpinned;
+		return &guc->guc_id_list_unpinned[order_base_2(number_children + 1)];
 	else
-		return &guc->guc_id_list_no_ref;
+		return &guc->guc_id_list_no_ref[order_base_2(number_children + 1)];
 }
 
-static int steal_guc_id(struct intel_guc *guc, bool unpinned)
+static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce,
+			bool unpinned)
 {
-	struct intel_context *ce;
-	int guc_id;
-	struct list_head *guc_id_list = get_guc_id_list(guc, unpinned);
+	struct intel_context *cn;
+	u8 number_children = ce->guc_number_children;
 
 	lockdep_assert_held(&guc->contexts_lock);
+	GEM_BUG_ON(intel_context_is_child(ce));
 
-	if (!list_empty(guc_id_list)) {
-		ce = list_first_entry(guc_id_list,
-				      struct intel_context,
-				      guc_id_link);
+	do {
+		struct list_head *guc_id_list =
+			get_guc_id_list(guc, number_children, unpinned);
 
-		/* Ensure context getting stolen in expected state */
-		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
-		GEM_BUG_ON(context_guc_id_invalid(ce));
-		GEM_BUG_ON(context_guc_id_stolen(ce));
+		if (!list_empty(guc_id_list)) {
+			u8 cn_o2, ce_o2 =
+				order_base_2(ce->guc_number_children + 1);
 
-		list_del_init(&ce->guc_id_link);
-		guc_id = ce->guc_id;
-		clr_context_registered(ce);
+			cn = list_first_entry(guc_id_list,
+					      struct intel_context,
+					      guc_id_link);
+			cn_o2 = order_base_2(cn->guc_number_children + 1);
+
+			/*
+			 * Corner case where a multi-lrc context steals a guc_id
+			 * from another context that has more guc_id that itself.
+			 */
+			if (cn_o2 != ce_o2) {
+				bitmap_release_region(guc->guc_ids_bitmap,
+						      cn->guc_id,
+						      cn_o2);
+				bitmap_allocate_region(guc->guc_ids_bitmap,
+						       ce->guc_id,
+						       ce_o2);
+			}
+
+			/* Ensure context getting stolen in expected state */
+			GEM_BUG_ON(atomic_read(&cn->guc_id_ref));
+			GEM_BUG_ON(context_guc_id_invalid(cn));
+			GEM_BUG_ON(context_guc_id_stolen(cn));
+			GEM_BUG_ON(ce_to_gse(ce) != ce_to_gse(cn));
+
+			list_del_init(&cn->guc_id_link);
+			ce->guc_id = cn->guc_id;
+
+			/*
+			 * If stealing from the pinned list, defer invalidating
+			 * the guc_id until the retire workqueue processes this
+			 * context.
+			 */
+			clr_context_registered(cn);
+			if (!unpinned) {
+				GEM_BUG_ON(ce_to_gse(cn)->stalled_context);
+				ce_to_gse(cn)->stalled_context =
+					intel_context_get(cn);
+				set_context_guc_id_stolen(cn);
+			} else {
+				set_context_guc_id_invalid(cn);
+			}
+
+			return 0;
+		}
 
 		/*
-		 * If stealing from the pinned list, defer invalidating
-		 * the guc_id until the retire workqueue processes this
-		 * context.
+		 * When using multi-lrc we search the guc_id_lists with the
+		 * least amount of guc_ids required first but will consume a
+		 * block larger of guc_ids if necessary. 2x the children always
+		 * moves you two the next list.
 		 */
-		if (!unpinned) {
-			GEM_BUG_ON(ce_to_gse(ce)->stalled_context);
+		if (!number_children ||
+		    order_base_2(number_children + 1) == MAX_GUC_ID_ORDER)
+			break;
 
-			ce_to_gse(ce)->stalled_context = intel_context_get(ce);
-			set_context_guc_id_stolen(ce);
-		} else {
-			set_context_guc_id_invalid(ce);
-		}
+		number_children *= 2;
+	} while (true);
 
-		return guc_id;
-	} else {
-		return -EAGAIN;
-	}
+	return -EAGAIN;
 }
 
 enum {	/* Return values for pin_guc_id / assign_guc_id */
@@ -1674,17 +1757,18 @@ enum {	/* Return values for pin_guc_id / assign_guc_id */
 	NEW_GUC_ID_ENABLED	= 2,
 };
 
-static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
+static int assign_guc_id(struct intel_guc *guc, struct intel_context *ce,
+			 bool tasklet)
 {
 	int ret;
 
 	lockdep_assert_held(&guc->contexts_lock);
+	GEM_BUG_ON(intel_context_is_child(ce));
 
-	ret = new_guc_id(guc);
+	ret = new_guc_id(guc, ce);
 	if (unlikely(ret < 0)) {
-		ret = steal_guc_id(guc, true);
-		if (ret >= 0) {
-			*out = ret;
+		ret = steal_guc_id(guc, ce, true);
+		if (!ret) {
 			ret = NEW_GUC_ID_DISABLED;
 		} else if (ret < 0 && tasklet) {
 			/*
@@ -1692,15 +1776,18 @@ static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
 			 * enabled if guc_ids are exhausted and we are submitting
 			 * from the tasklet.
 			 */
-			ret = steal_guc_id(guc, false);
-			if (ret >= 0) {
-				*out = ret;
+			ret = steal_guc_id(guc, ce, false);
+			if (!ret)
 				ret = NEW_GUC_ID_ENABLED;
-			}
 		}
-	} else {
-		*out = ret;
-		ret = SAME_GUC_ID;
+	}
+
+	if (!(ret < 0) && intel_context_is_parent(ce)) {
+		struct intel_context *child;
+		int i = 1;
+
+		for_each_child(ce, child)
+			child->guc_id = ce->guc_id + i++;
 	}
 
 	return ret;
@@ -1713,6 +1800,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
 	int ret = 0;
 	unsigned long flags, tries = PIN_GUC_ID_TRIES;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
 
 try_again:
@@ -1724,7 +1812,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
 	}
 
 	if (context_guc_id_invalid(ce)) {
-		ret = assign_guc_id(guc, &ce->guc_id, tasklet);
+		ret = assign_guc_id(guc, ce, tasklet);
 		if (unlikely(ret < 0))
 			goto out_unlock;
 	}
@@ -1770,6 +1858,7 @@ static void unpin_guc_id(struct intel_guc *guc,
 	unsigned long flags;
 
 	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
+	GEM_BUG_ON(intel_context_is_child(ce));
 
 	if (unlikely(context_guc_id_invalid(ce)))
 		return;
@@ -1781,7 +1870,8 @@ static void unpin_guc_id(struct intel_guc *guc,
 
 	if (!context_guc_id_invalid(ce) && !context_guc_id_stolen(ce) &&
 	    !atomic_read(&ce->guc_id_ref)) {
-		struct list_head *head = get_guc_id_list(guc, unpinned);
+		struct list_head *head =
+			get_guc_id_list(guc, ce->guc_number_children, unpinned);
 
 		list_add_tail(&ce->guc_id_link, head);
 	}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
index 7069b7248f55..a5933e07bdd2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
@@ -22,6 +22,16 @@ struct guc_virtual_engine {
 /*
  * Object which encapsulates the globally operated on i915_sched_engine +
  * the GuC submission state machine described in intel_guc_submission.c.
+ *
+ * Currently we have two instances of these per GuC. One for single-lrc and one
+ * for multi-lrc submission. We split these into two submission engines as they
+ * can operate in parallel allowing a blocking condition on one not to affect
+ * the other. i.e. guc_ids are statically allocated between these two submission
+ * modes. One mode may have guc_ids exhausted which requires blocking while the
+ * other has plenty of guc_ids and can make forward progres.
+ *
+ * In the future if different submission use cases arise we can simply
+ * instantiate another of these objects and assign it to the context.
  */
 struct guc_submit_engine {
 	struct i915_sched_engine sched_engine;

From patchwork Tue Aug  3 22:29:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417541
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A8B1FC43214
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:52 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 7A1D660184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:52 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7A1D660184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id F20026E947;
	Tue,  3 Aug 2021 22:12:03 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 1DDA96E8E1;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745910"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745910"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512717"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 20/46] drm/i915/guc: Add hang check to GuC submit engine
Date: Tue,  3 Aug 2021 15:29:17 -0700
Message-Id: <20210803222943.27686-21-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

The heartbeat uses a single instance of a GuC submit engine (GSE) to do
the hang check. As such if a different GSE's state machine hangs, the
heartbeat cannot detect this hang. Add timer to each GSE which in turn
can disable all submissions if it is hung.

Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++++++++++++++++++
 .../i915/gt/uc/intel_guc_submission_types.h   |  3 ++
 2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index afb9b4bb8971..2d8296bcc583 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -105,15 +105,21 @@ static bool tasklet_blocked(struct guc_submit_engine *gse)
 	return test_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
+/* 2 seconds seems like a reasonable timeout waiting for a G2H */
+#define MAX_TASKLET_BLOCKED_NS	2000000000
 static void set_tasklet_blocked(struct guc_submit_engine *gse)
 {
 	lockdep_assert_held(&gse->sched_engine.lock);
+	hrtimer_start_range_ns(&gse->hang_timer,
+			       ns_to_ktime(MAX_TASKLET_BLOCKED_NS), 0,
+			       HRTIMER_MODE_REL_PINNED);
 	set_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
 static void __clr_tasklet_blocked(struct guc_submit_engine *gse)
 {
 	lockdep_assert_held(&gse->sched_engine.lock);
+	hrtimer_cancel(&gse->hang_timer);
 	clear_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
 }
 
@@ -1028,6 +1034,7 @@ static void disable_submission(struct intel_guc *guc)
 		if (__tasklet_is_enabled(&sched_engine->tasklet)) {
 			GEM_BUG_ON(!guc->ct.enabled);
 			__tasklet_disable_sync_once(&sched_engine->tasklet);
+			hrtimer_try_to_cancel(&guc->gse[i]->hang_timer);
 			sched_engine->tasklet.callback = NULL;
 		}
 	}
@@ -3750,6 +3757,33 @@ static void guc_sched_engine_destroy(struct kref *kref)
 	kfree(gse);
 }
 
+static enum hrtimer_restart gse_hang(struct hrtimer *hrtimer)
+{
+	struct guc_submit_engine *gse =
+		container_of(hrtimer, struct guc_submit_engine, hang_timer);
+	struct intel_guc *guc = gse->sched_engine.private_data;
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+	if (guc->gse_hang_expected)
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"GSE[%i] hung, disabling submission", gse->id);
+	else
+		drm_err(&guc_to_gt(guc)->i915->drm,
+			"GSE[%i] hung, disabling submission", gse->id);
+#else
+	drm_err(&guc_to_gt(guc)->i915->drm,
+		"GSE[%i] hung, disabling submission", gse->id);
+#endif
+
+	/*
+	 * Tasklet not making forward progress, disable submission which in turn
+	 * will kick in the heartbeat to do a full GPU reset.
+	 */
+	disable_submission(guc);
+
+	return HRTIMER_NORESTART;
+}
+
 static void guc_submit_engine_init(struct intel_guc *guc,
 				   struct guc_submit_engine *gse,
 				   int id)
@@ -3767,6 +3801,8 @@ static void guc_submit_engine_init(struct intel_guc *guc,
 	sched_engine->retire_inflight_request_prio =
 		guc_retire_inflight_request_prio;
 	sched_engine->private_data = guc;
+	hrtimer_init(&gse->hang_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	gse->hang_timer.function = gse_hang;
 	gse->id = id;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
index a5933e07bdd2..eae2e9725ede 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_SUBMISSION_TYPES_H_
 #define _INTEL_GUC_SUBMISSION_TYPES_H_
 
+#include <linux/xarray.h>
+
 #include "gt/intel_engine_types.h"
 #include "gt/intel_context_types.h"
 #include "i915_scheduler_types.h"
@@ -41,6 +43,7 @@ struct guc_submit_engine {
 	unsigned long flags;
 	int total_num_rq_with_no_guc_id;
 	atomic_t num_guc_ids_not_ready;
+	struct hrtimer hang_timer;
 	int id;
 
 	/*

From patchwork Tue Aug  3 22:29:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417561
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1BACBC4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:11 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id DDCAD601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:10 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org DDCAD601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 704546E8E9;
	Tue,  3 Aug 2021 22:12:07 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 1E01D6E8E2;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745912"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745912"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512719"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 21/46] drm/i915/guc: Add guc_child_context_destroy
Date: Tue,  3 Aug 2021 15:29:18 -0700
Message-Id: <20210803222943.27686-22-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Since child contexts do not own the guc_ids or GuC context registration,
child contexts can simply be freed on destroy. Add
guc_child_context_destroy context operation to do this.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2d8296bcc583..850edeff9230 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2828,6 +2828,13 @@ static void destroy_worker_func(struct work_struct *w)
 		intel_gt_pm_unpark_work_add(gt, destroy_worker);
 }
 
+/* Future patches will use this function */
+__maybe_unused
+static void guc_child_context_destroy(struct kref *kref)
+{
+	__guc_context_destroy(container_of(kref, struct intel_context, ref));
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);

From patchwork Tue Aug  3 22:29:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417543
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D8143C4320A
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:53 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id A3A6460EFF
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:53 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A3A6460EFF
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 23A766E147;
	Tue,  3 Aug 2021 22:12:03 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 3CBE46E14B;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745913"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745913"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512720"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 22/46] drm/i915/guc: Implement multi-lrc submission
Date: Tue,  3 Aug 2021 15:29:19 -0700
Message-Id: <20210803222943.27686-23-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 224 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h           |   8 +
 3 files changed, 223 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index e08fbd40281c..6910a0cdb8c8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -64,12 +64,14 @@
 #define   WQ_TYPE_PSEUDO		(0x2 << WQ_TYPE_SHIFT)
 #define   WQ_TYPE_INORDER		(0x3 << WQ_TYPE_SHIFT)
 #define   WQ_TYPE_NOOP			(0x4 << WQ_TYPE_SHIFT)
-#define WQ_TARGET_SHIFT			10
+#define   WQ_TYPE_MULTI_LRC		(0x5 << WQ_TYPE_SHIFT)
+#define WQ_TARGET_SHIFT			8
 #define WQ_LEN_SHIFT			16
 #define WQ_NO_WCFLUSH_WAIT		(1 << 27)
 #define WQ_PRESENT_WORKLOAD		(1 << 28)
 
-#define WQ_RING_TAIL_SHIFT		20
+#define WQ_GUC_ID_SHIFT			0
+#define WQ_RING_TAIL_SHIFT		18
 #define WQ_RING_TAIL_MAX		0x7FF	/* 2^11 QWords */
 #define WQ_RING_TAIL_MASK		(WQ_RING_TAIL_MAX << WQ_RING_TAIL_SHIFT)
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 850edeff9230..d1d4a1e59e8d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -493,6 +493,29 @@ __get_process_desc(struct intel_context *ce)
 		   LRC_STATE_OFFSET) / sizeof(u32)));
 }
 
+static inline u32 *get_wq_pointer(struct guc_process_desc *desc,
+				  struct intel_context *ce,
+				  u32 wqi_size)
+{
+	/*
+	 * Check for space in work queue. Caching a value of head pointer in
+	 * intel_context structure in order reduce the number accesses to shared
+	 * GPU memory which may be across a PCIe bus.
+	 */
+#define AVAILABLE_SPACE	\
+	CIRC_SPACE(ce->guc_wqi_tail, ce->guc_wqi_head, GUC_WQ_SIZE)
+	if (wqi_size > AVAILABLE_SPACE) {
+		ce->guc_wqi_head = READ_ONCE(desc->head);
+
+		if (wqi_size > AVAILABLE_SPACE)
+			return NULL;
+	}
+#undef AVAILABLE_SPACE
+
+	return ((u32 *)__get_process_desc(ce)) +
+		((WQ_OFFSET + ce->guc_wqi_tail) / sizeof(u32));
+}
+
 static u32 __get_lrc_desc_offset(struct intel_guc *guc, int index)
 {
 	GEM_BUG_ON(index >= guc->lrcd_reg.max_idx);
@@ -643,7 +666,7 @@ static inline bool request_has_no_guc_id(struct i915_request *rq)
 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err = 0;
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
@@ -697,6 +720,18 @@ static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		trace_intel_context_sched_enable(ce);
 		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
+
+		/*
+		 * Without multi-lrc KMD does the submission step (moving the
+		 * lrc tail) so enabling scheduling is sufficient to submit the
+		 * context. This isn't the case in multi-lrc submission as the
+		 * GuC needs to move the tails, hence the need for another H2G
+		 * to submit a multi-lrc context after enabling scheduling.
+		 */
+		if (intel_context_is_parent(ce)) {
+			action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
+			err = intel_guc_send_nb(guc, action, len - 1, 0);
+		}
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
 		intel_context_put(ce);
@@ -783,6 +818,132 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static inline bool is_multi_lrc_rq(struct i915_request *rq)
+{
+	return intel_context_is_child(rq->context) ||
+		intel_context_is_parent(rq->context);
+}
+
+/*
+ * Multi-lrc requests are not submitted to the GuC until all requests in
+ * the set are ready. With the exception of the last request in the set,
+ * submitting a multi-lrc request is therefore just a status update on
+ * the driver-side and can be safely merged with other requests. When the
+ * last multi-lrc request in a set is detected, we break out of the
+ * submission loop and submit the whole set, thus we never attempt to
+ * merge that one with othe requests.
+ */
+static inline bool can_merge_rq(struct i915_request *rq,
+				struct i915_request *last)
+{
+	return is_multi_lrc_rq(last) || rq->context == last->context;
+}
+
+static inline u32 wq_space_until_wrap(struct intel_context *ce)
+{
+	return (GUC_WQ_SIZE - ce->guc_wqi_tail);
+}
+
+static inline void write_wqi(struct guc_process_desc *desc,
+			     struct intel_context *ce,
+			     u32 wqi_size)
+{
+	ce->guc_wqi_tail = (ce->guc_wqi_tail + wqi_size) & (GUC_WQ_SIZE - 1);
+	WRITE_ONCE(desc->tail, ce->guc_wqi_tail);
+}
+
+static inline int guc_wq_noop_append(struct intel_context *ce)
+{
+	struct guc_process_desc *desc = __get_process_desc(ce);
+	u32 *wqi = get_wq_pointer(desc, ce, wq_space_until_wrap(ce));
+
+	if (!wqi)
+		return -EBUSY;
+
+	*wqi = WQ_TYPE_NOOP |
+		((wq_space_until_wrap(ce) / sizeof(u32) - 1) << WQ_LEN_SHIFT);
+	ce->guc_wqi_tail = 0;
+
+	return 0;
+}
+
+static int __guc_wq_item_append(struct i915_request *rq)
+{
+	struct intel_context *ce = request_to_scheduling_context(rq);
+	struct intel_context *child;
+	struct guc_process_desc *desc = __get_process_desc(ce);
+	unsigned int wqi_size = (ce->guc_number_children + 4) *
+		sizeof(u32);
+	u32 *wqi;
+	int ret;
+
+	/* Ensure context is in correct state updating work queue */
+	GEM_BUG_ON(ce->guc_num_rq_submit_no_id);
+	GEM_BUG_ON(request_has_no_guc_id(rq));
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+	GEM_BUG_ON(context_pending_disable(ce));
+	GEM_BUG_ON(context_wait_for_deregister_to_register(ce));
+
+	/* Insert NOOP if this work queue item will wrap the tail pointer. */
+	if (wqi_size > wq_space_until_wrap(ce)) {
+		ret = guc_wq_noop_append(ce);
+		if (ret)
+			return ret;
+	}
+
+	wqi = get_wq_pointer(desc, ce, wqi_size);
+	if (!wqi)
+		return -EBUSY;
+
+	*wqi++ = WQ_TYPE_MULTI_LRC |
+		((wqi_size / sizeof(u32) - 1) << WQ_LEN_SHIFT);
+	*wqi++ = ce->lrc.lrca;
+	*wqi++ = (ce->guc_id << WQ_GUC_ID_SHIFT) |
+		 ((ce->ring->tail / sizeof(u64)) << WQ_RING_TAIL_SHIFT);
+	*wqi++ = 0;	/* fence_id */
+	for_each_child(ce, child)
+		*wqi++ = child->ring->tail / sizeof(u64);
+
+	write_wqi(desc, ce, wqi_size);
+
+	return 0;
+}
+
+static int gse_wq_item_append(struct guc_submit_engine *gse,
+			      struct i915_request *rq)
+{
+	struct intel_context *ce = request_to_scheduling_context(rq);
+	int ret = 0;
+
+	if (likely(!intel_context_is_banned(ce))) {
+		ret = __guc_wq_item_append(rq);
+
+		if (unlikely(ret == -EBUSY)) {
+			gse->stalled_rq = rq;
+			gse->submission_stall_reason = STALL_MOVE_LRC_TAIL;
+		}
+	}
+
+	return ret;
+}
+
+static inline bool multi_lrc_submit(struct i915_request *rq)
+{
+	struct intel_context *ce = request_to_scheduling_context(rq);
+
+	intel_ring_set_tail(rq->ring, rq->tail);
+
+	/*
+	 * We expect the front end (execbuf IOCTL) to set this flag on the last
+	 * request generated from a multi-BB submission. This indicates to the
+	 * backend (GuC interface) that we should submit this context thus
+	 * submitting all the requests generated in parallel.
+	 */
+	return test_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL, &rq->fence.flags) ||
+		intel_context_is_banned(ce);
+}
+
 static void kick_retire_wq(struct guc_submit_engine *gse)
 {
 	queue_work(system_unbound_wq, &gse->retire_worker);
@@ -826,7 +987,7 @@ static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 			struct i915_request *rq, *rn;
 
 			priolist_for_each_request_consume(rq, rn, p) {
-				if (last && rq->context != last->context)
+				if (last && !can_merge_rq(rq, last))
 					goto done;
 
 				list_del_init(&rq->sched.link);
@@ -835,7 +996,22 @@ static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 
 				trace_i915_request_in(rq, 0);
 				last = rq;
-				submit = true;
+
+				if (is_multi_lrc_rq(rq)) {
+					/*
+					 * We need to coalesce all multi-lrc
+					 * requests in a relationship into a
+					 * single H2G. We are guaranteed that
+					 * all of these requests will be
+					 * submitted sequentially.
+					 */
+					if (multi_lrc_submit(rq)) {
+						submit = true;
+						goto done;
+					}
+				} else {
+					submit = true;
+				}
 			}
 
 			rb_erase_cached(&p->node, &sched_engine->queue);
@@ -845,7 +1021,7 @@ static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 
 done:
 	if (submit) {
-		struct intel_context *ce = last->context;
+		struct intel_context *ce = request_to_scheduling_context(last);
 
 		if (ce->guc_num_rq_submit_no_id) {
 			ret = tasklet_pin_guc_id(gse, last);
@@ -867,7 +1043,17 @@ static int gse_dequeue_one_context(struct guc_submit_engine *gse)
 		}
 
 move_lrc_tail:
-		guc_set_lrc_tail(last);
+		if (is_multi_lrc_rq(last)) {
+			ret = gse_wq_item_append(gse, last);
+			if (ret == -EBUSY) {
+				goto schedule_tasklet;
+			} else if (ret != 0) {
+				GEM_WARN_ON(ret);	/* Unexpected */
+				goto deadlk;
+			}
+		} else {
+			guc_set_lrc_tail(last);
+		}
 
 add_request:
 		ret = gse_add_request(gse, last);
@@ -1575,14 +1761,22 @@ static bool need_tasklet(struct guc_submit_engine *gse, struct intel_context *ce
 static int gse_bypass_tasklet_submit(struct guc_submit_engine *gse,
 				     struct i915_request *rq)
 {
-	int ret;
+	int ret = 0;
 
 	__i915_request_submit(rq);
 
 	trace_i915_request_in(rq, 0);
 
-	guc_set_lrc_tail(rq);
-	ret = gse_add_request(gse, rq);
+	if (is_multi_lrc_rq(rq)) {
+		if (multi_lrc_submit(rq)) {
+			ret = gse_wq_item_append(gse, rq);
+			if (!ret)
+				ret = gse_add_request(gse, rq);
+		}
+	} else {
+		guc_set_lrc_tail(rq);
+		ret = gse_add_request(gse, rq);
+	}
 
 	if (unlikely(ret == -EPIPE))
 		disable_submission(gse->sched_engine.private_data);
@@ -1599,7 +1793,7 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (need_tasklet(gse, rq->context))
+	if (need_tasklet(gse, request_to_scheduling_context(rq)))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (gse_bypass_tasklet_submit(gse, rq) == -EBUSY)
 		kick_tasklet(gse);
@@ -2988,9 +3182,10 @@ static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
 
 static void add_to_context(struct i915_request *rq)
 {
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
 	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
 
 	spin_lock(&ce->guc_active.lock);
@@ -3026,7 +3221,9 @@ static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
 
 static void remove_from_context(struct i915_request *rq)
 {
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
+
+	GEM_BUG_ON(intel_context_is_child(ce));
 
 	spin_lock_irq(&ce->guc_active.lock);
 
@@ -3231,7 +3428,8 @@ static int tasklet_pin_guc_id(struct guc_submit_engine *gse,
 	GEM_BUG_ON(gse->total_num_rq_with_no_guc_id < 0);
 
 	list_for_each_entry_reverse(rq, &ce->guc_active.requests, sched.link)
-		if (request_has_no_guc_id(rq)) {
+		if (request_has_no_guc_id(rq) &&
+		    request_to_scheduling_context(rq) == ce) {
 			--ce->guc_num_rq_submit_no_id;
 			clear_bit(I915_FENCE_FLAG_GUC_ID_NOT_PINNED,
 				  &rq->fence.flags);
@@ -3551,7 +3749,7 @@ static void guc_bump_inflight_request_prio(struct i915_request *rq,
 
 static void guc_retire_inflight_request_prio(struct i915_request *rq)
 {
-	struct intel_context *ce = rq->context;
+	struct intel_context *ce = request_to_scheduling_context(rq);
 
 	spin_lock(&ce->guc_active.lock);
 	guc_prio_fini(rq, ce);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 807f76750cf4..d6d5bf0a5eb5 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -145,6 +145,14 @@ enum {
 	 * tasklet that the guc_id isn't pinned.
 	 */
 	I915_FENCE_FLAG_GUC_ID_NOT_PINNED,
+
+	/*
+	 * I915_FENCE_FLAG_SUBMIT_PARALLEL - request with a context in a
+	 * parent-child relationship (parallel submission, multi-lrc) should
+	 * trigger a submission to the GuC rather than just moving the context
+	 * tail.
+	 */
+	I915_FENCE_FLAG_SUBMIT_PARALLEL,
 };
 
 /**

From patchwork Tue Aug  3 22:29:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417547
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 470B1C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:58 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 12C4B60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:58 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 12C4B60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 48AAD6E167;
	Tue,  3 Aug 2021 22:12:04 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 420596E8E6;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745914"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745914"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512721"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 23/46] drm/i915/guc: Insert submit fences between requests in
 parent-child relationship
Date: Tue,  3 Aug 2021 15:29:20 -0700
Message-Id: <20210803222943.27686-24-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

The GuC must receive requests in the order submitted for contexts in a
parent-child relationship to function correctly. To ensure this, insert
a submit fence between the current request and last request submitted
for requests / contexts in a parent child relationship. This is
conceptually similar to a single timeline.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h       |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +-
 drivers/gpu/drm/i915/i915_request.c           | 120 ++++++++++++++----
 5 files changed, 105 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bb4c14656067..98ef2d0f7a39 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -487,6 +487,8 @@ void intel_context_fini(struct intel_context *ce)
 {
 	struct intel_context *child, *next;
 
+	if (ce->last_rq)
+		i915_request_put(ce->last_rq);
 	if (ce->timeline)
 		intel_timeline_put(ce->timeline);
 	i915_vm_put(ce->vm);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 7ce3b3d2edb7..a302599e436a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -60,6 +60,11 @@ intel_context_to_parent(struct intel_context *ce)
 	return intel_context_is_child(ce) ? ce->parent : ce;
 }
 
+static inline bool intel_context_is_parallel(struct intel_context *ce)
+{
+	return intel_context_is_child(ce) || intel_context_is_parent(ce);
+}
+
 void intel_context_bind_parent_child(struct intel_context *parent,
 				     struct intel_context *child);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 9665cb31bab0..f4fc81f64921 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -225,6 +225,9 @@ struct intel_context {
 	 */
 	u8 guc_prio;
 	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
+
+	/* Last request submitted on a parent */
+	struct i915_request *last_rq;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d1d4a1e59e8d..1cb382f7d79d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -820,8 +820,7 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static inline bool is_multi_lrc_rq(struct i915_request *rq)
 {
-	return intel_context_is_child(rq->context) ||
-		intel_context_is_parent(rq->context);
+	return intel_context_is_parallel(rq->context);
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ce446716d092..2e51c8999088 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1546,36 +1546,62 @@ i915_request_await_object(struct i915_request *to,
 	return ret;
 }
 
+static inline bool is_parallel_rq(struct i915_request *rq)
+{
+	return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+	return intel_context_to_parent(rq->context);
+}
+
 static struct i915_request *
-__i915_request_add_to_timeline(struct i915_request *rq)
+__i915_request_ensure_parallel_ordering(struct i915_request *rq,
+					struct intel_timeline *timeline)
 {
-	struct intel_timeline *timeline = i915_request_timeline(rq);
 	struct i915_request *prev;
 
-	/*
-	 * Dependency tracking and request ordering along the timeline
-	 * is special cased so that we can eliminate redundant ordering
-	 * operations while building the request (we know that the timeline
-	 * itself is ordered, and here we guarantee it).
-	 *
-	 * As we know we will need to emit tracking along the timeline,
-	 * we embed the hooks into our request struct -- at the cost of
-	 * having to have specialised no-allocation interfaces (which will
-	 * be beneficial elsewhere).
-	 *
-	 * A second benefit to open-coding i915_request_await_request is
-	 * that we can apply a slight variant of the rules specialised
-	 * for timelines that jump between engines (such as virtual engines).
-	 * If we consider the case of virtual engine, we must emit a dma-fence
-	 * to prevent scheduling of the second request until the first is
-	 * complete (to maximise our greedy late load balancing) and this
-	 * precludes optimising to use semaphores serialisation of a single
-	 * timeline across engines.
-	 */
+	GEM_BUG_ON(!is_parallel_rq(rq));
+
+	prev = request_to_parent(rq)->last_rq;
+	if (prev) {
+		if (!__i915_request_is_complete(prev)) {
+			i915_sw_fence_await_sw_fence(&rq->submit,
+						     &prev->submit,
+						     &rq->submitq);
+
+			if (rq->engine->sched_engine->schedule)
+				__i915_sched_node_add_dependency(&rq->sched,
+								 &prev->sched,
+								 &rq->dep,
+								 0);
+		}
+		i915_request_put(prev);
+	}
+
+	request_to_parent(rq)->last_rq = i915_request_get(rq);
+
+	return to_request(__i915_active_fence_set(&timeline->last_request,
+						  &rq->fence));
+}
+
+static struct i915_request *
+__i915_request_ensure_ordering(struct i915_request *rq,
+			       struct intel_timeline *timeline)
+{
+	struct i915_request *prev;
+
+	GEM_BUG_ON(is_parallel_rq(rq));
+
 	prev = to_request(__i915_active_fence_set(&timeline->last_request,
 						  &rq->fence));
+
 	if (prev && !__i915_request_is_complete(prev)) {
 		bool uses_guc = intel_engine_uses_guc(rq->engine);
+		bool pow2 = is_power_of_2(READ_ONCE(prev->engine)->mask |
+					  rq->engine->mask);
+		bool same_context = prev->context == rq->context;
 
 		/*
 		 * The requests are supposed to be kept in order. However,
@@ -1583,13 +1609,11 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 		 * is used as a barrier for external modification to this
 		 * context.
 		 */
-		GEM_BUG_ON(prev->context == rq->context &&
+		GEM_BUG_ON(same_context &&
 			   i915_seqno_passed(prev->fence.seqno,
 					     rq->fence.seqno));
 
-		if ((!uses_guc &&
-		     is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
-		    (uses_guc && prev->context == rq->context))
+		if ((same_context && uses_guc) || (!uses_guc && pow2))
 			i915_sw_fence_await_sw_fence(&rq->submit,
 						     &prev->submit,
 						     &rq->submitq);
@@ -1604,6 +1628,50 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 							 0);
 	}
 
+	return prev;
+}
+
+static struct i915_request *
+__i915_request_add_to_timeline(struct i915_request *rq)
+{
+	struct intel_timeline *timeline = i915_request_timeline(rq);
+	struct i915_request *prev;
+
+	/*
+	 * Dependency tracking and request ordering along the timeline
+	 * is special cased so that we can eliminate redundant ordering
+	 * operations while building the request (we know that the timeline
+	 * itself is ordered, and here we guarantee it).
+	 *
+	 * As we know we will need to emit tracking along the timeline,
+	 * we embed the hooks into our request struct -- at the cost of
+	 * having to have specialised no-allocation interfaces (which will
+	 * be beneficial elsewhere).
+	 *
+	 * A second benefit to open-coding i915_request_await_request is
+	 * that we can apply a slight variant of the rules specialised
+	 * for timelines that jump between engines (such as virtual engines).
+	 * If we consider the case of virtual engine, we must emit a dma-fence
+	 * to prevent scheduling of the second request until the first is
+	 * complete (to maximise our greedy late load balancing) and this
+	 * precludes optimising to use semaphores serialisation of a single
+	 * timeline across engines.
+	 *
+	 * We do not order parallel submission requests on the timeline as each
+	 * parallel submission context has its own timeline and the ordering
+	 * rules for parallel requests are that they must be submitted in the
+	 * order received from the execbuf IOCTL. So rather than using the
+	 * timeline we store a pointer to last request submitted in the
+	 * relationship in the gem context and insert a submission fence
+	 * between that request and request passed into this function or
+	 * alternatively we use completion fence if gem context has a single
+	 * timeline and this is the first submission of an execbuf IOCTL.
+	 */
+	if (likely(!is_parallel_rq(rq)))
+		prev = __i915_request_ensure_ordering(rq, timeline);
+	else
+		prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+
 	/*
 	 * Make sure that no request gazumped us - if it was allocated after
 	 * our i915_request_alloc() and called __i915_request_add() before

From patchwork Tue Aug  3 22:29:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417527
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 85569C43214
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:42 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 4B36B601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:42 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4B36B601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 090226E8F3;
	Tue,  3 Aug 2021 22:12:02 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 420726E8E7;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745915"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745915"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512722"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 24/46] drm/i915/guc: Implement multi-lrc reset
Date: Tue,  3 Aug 2021 15:29:21 -0700
Message-Id: <20210803222943.27686-25-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Update context and full GPU reset to work with multi-lrc. The idea is
parent context tracks all the active requests inflight for itself and
its' children. The parent context owns the reset replaying / canceling
requests as needed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 12 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 57 ++++++++++++++-----
 2 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 98ef2d0f7a39..c327fd1c24c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -595,20 +595,22 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 
 struct i915_request *intel_context_find_active_request(struct intel_context *ce)
 {
+	struct intel_context *parent = intel_context_is_child(ce) ?
+		ce->parent : ce;
 	struct i915_request *rq, *active = NULL;
 	unsigned long flags;
 
 	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
 
-	spin_lock_irqsave(&ce->guc_active.lock, flags);
-	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+	spin_lock_irqsave(&parent->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &parent->guc_active.requests,
 				    sched.link) {
-		if (i915_request_completed(rq))
+		if (i915_request_completed(rq) && rq->context == ce)
 			break;
 
-		active = rq;
+		active = (rq->context == ce) ? rq : active;
 	}
-	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+	spin_unlock_irqrestore(&parent->guc_active.lock, flags);
 
 	return active;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1cb382f7d79d..30df1c8db491 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -818,6 +818,11 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static inline bool is_multi_lrc(struct intel_context *ce)
+{
+	return intel_context_is_parallel(ce);
+}
+
 static inline bool is_multi_lrc_rq(struct i915_request *rq)
 {
 	return intel_context_is_parallel(rq->context);
@@ -1381,6 +1386,8 @@ __unwind_incomplete_requests(struct intel_context *ce)
 		ce->engine->sched_engine;
 	unsigned long flags;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
 	spin_lock_irqsave(&sched_engine->lock, flags);
 	spin_lock(&ce->guc_active.lock);
 	list_for_each_entry_safe(rq, rn,
@@ -1413,8 +1420,12 @@ __unwind_incomplete_requests(struct intel_context *ce)
 
 static void __guc_reset_context(struct intel_context *ce, bool stalled)
 {
+	bool local_stalled;
 	struct i915_request *rq;
 	u32 head;
+	int i, number_children = ce->guc_number_children;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
 
 	intel_context_get(ce);
 
@@ -1426,22 +1437,32 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	 */
 	clr_context_enabled(ce);
 
-	rq = intel_context_find_active_request(ce);
-	if (!rq) {
-		head = ce->ring->tail;
-		stalled = false;
-		goto out_replay;
-	}
+	for (i = 0; i < number_children + 1; ++i) {
+		if (!intel_context_is_pinned(ce))
+			goto next_context;
+
+		local_stalled = false;
+		rq = intel_context_find_active_request(ce);
+		if (!rq) {
+			head = ce->ring->tail;
+			goto out_replay;
+		}
 
-	if (!i915_request_started(rq))
-		stalled = false;
+		GEM_BUG_ON(i915_active_is_idle(&ce->active));
+		head = intel_ring_wrap(ce->ring, rq->head);
 
-	GEM_BUG_ON(i915_active_is_idle(&ce->active));
-	head = intel_ring_wrap(ce->ring, rq->head);
-	__i915_request_reset(rq, stalled);
+		if (i915_request_started(rq))
+			local_stalled = true;
 
+		__i915_request_reset(rq, local_stalled && stalled);
 out_replay:
-	guc_reset_state(ce, head, stalled);
+		guc_reset_state(ce, head, local_stalled && stalled);
+next_context:
+		if (i != number_children)
+			ce = list_next_entry(ce, guc_child_link);
+	}
+	ce = intel_context_to_parent(ce);
+
 	__unwind_incomplete_requests(ce);
 	ce->guc_num_rq_submit_no_id = 0;
 	intel_context_put(ce);
@@ -1458,9 +1479,11 @@ void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
 	}
 
 	xa_for_each(&guc->context_lookup, index, ce)
-		if (intel_context_is_pinned(ce))
+		if (intel_context_is_pinned(ce) &&
+		    !intel_context_is_child(ce))
 			__guc_reset_context(ce, stalled);
 
+	/* GuC is blown away, drop all references to contexts */
 	xa_destroy(&guc->context_lookup);
 }
 
@@ -1513,7 +1536,8 @@ gse_cancel_requests(struct guc_submit_engine *gse)
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
-			struct intel_context *ce = rq->context;
+			struct intel_context *ce =
+				request_to_scheduling_context(rq);
 
 			list_del_init(&rq->sched.link);
 
@@ -1543,7 +1567,8 @@ void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 	int i;
 
 	xa_for_each(&guc->context_lookup, index, ce)
-		if (intel_context_is_pinned(ce))
+		if (intel_context_is_pinned(ce) &&
+		    !intel_context_is_child(ce))
 			guc_cancel_context_requests(ce);
 
 	for (i = 0; i < GUC_SUBMIT_ENGINE_MAX; ++i)
@@ -2823,6 +2848,8 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 	intel_wakeref_t wakeref;
 	unsigned long flags;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
+
 	gse_flush_submissions(ce_to_gse(ce));
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);

From patchwork Tue Aug  3 22:29:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417537
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AF767C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:49 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 77D7F60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:49 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 77D7F60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 292E36E93D;
	Tue,  3 Aug 2021 22:12:03 +0000 (UTC)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 659EB6E8F1;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="235745916"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="235745916"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512723"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 25/46] drm/i915/guc: Update debugfs for GuC multi-lrc
Date: Tue,  3 Aug 2021 15:29:22 -0700
Message-Id: <20210803222943.27686-26-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Display the workqueue status in debugfs for GuC contexts that are in
parent-child relationship.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +++++++++++++------
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 30df1c8db491..44a7582c9aed 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 		gse_log_submission_info(guc->gse[i], p, i);
 }
 
+static inline void guc_log_context(struct drm_printer *p,
+				   struct intel_context *ce)
+{
+	drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
+	drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+	drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+		   ce->ring->head,
+		   ce->lrc_reg_state[CTX_RING_HEAD]);
+	drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+		   ce->ring->tail,
+		   ce->lrc_reg_state[CTX_RING_TAIL]);
+	drm_printf(p, "\t\tContext Pin Count: %u\n",
+		   atomic_read(&ce->pin_count));
+	drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+		   atomic_read(&ce->guc_id_ref));
+	drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
+		   atomic_read(&ce->guc_num_rq_not_ready));
+	drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
+		   ce->guc_state.sched_state,
+		   atomic_read(&ce->guc_sched_state_no_lock));
+}
+
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 					     struct drm_printer *p)
 {
 	struct intel_context *ce;
 	unsigned long index;
 	xa_for_each(&guc->context_lookup, index, ce) {
-		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
-		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
-		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
-			   ce->ring->head,
-			   ce->lrc_reg_state[CTX_RING_HEAD]);
-		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
-			   ce->ring->tail,
-			   ce->lrc_reg_state[CTX_RING_TAIL]);
-		drm_printf(p, "\t\tContext Pin Count: %u\n",
-			   atomic_read(&ce->pin_count));
-		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
-			   atomic_read(&ce->guc_id_ref));
-		drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
-			   atomic_read(&ce->guc_num_rq_not_ready));
-		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
-			   ce->guc_state.sched_state,
-			   atomic_read(&ce->guc_sched_state_no_lock));
+		GEM_BUG_ON(intel_context_is_child(ce));
 
+		guc_log_context(p, ce);
 		guc_log_context_priority(p, ce);
+
+		if (intel_context_is_parent(ce)) {
+			struct guc_process_desc *desc = __get_process_desc(ce);
+			struct intel_context *child;
+
+			drm_printf(p, "\t\tWQI Head: %u\n",
+				   READ_ONCE(desc->head));
+			drm_printf(p, "\t\tWQI Tail: %u\n",
+				   READ_ONCE(desc->tail));
+			drm_printf(p, "\t\tWQI Status: %u\n\n",
+				   READ_ONCE(desc->wq_status));
+
+			for_each_child(ce, child)
+				guc_log_context(p, child);
+		}
 	}
 }
 

From patchwork Tue Aug  3 22:29:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417565
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E0DB4C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:13 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id B1BE160EE8
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:13 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B1BE160EE8
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 2F69A6E40B;
	Tue,  3 Aug 2021 22:12:07 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id D6DBA6E8F5;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393472"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393472"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512724"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:54 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 26/46] drm/i915: Connect UAPI to GuC multi-lrc interface
Date: Tue,  3 Aug 2021 15:29:23 -0700
Message-Id: <20210803222943.27686-27-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 157 +++++++++++++++++-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   8 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |  12 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   6 +-
 .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 111 +++++++++++--
 include/uapi/drm/i915_drm.h                   | 128 ++++++++++++++
 9 files changed, 417 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index cff72679ad7c..2b0dd3ff4db8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -515,9 +515,149 @@ set_proto_ctx_engines_bond(struct i915_user_extension __user *base, void *data)
 	return 0;
 }
 
+static int
+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+				      void *data)
+{
+	struct i915_context_engines_parallel_submit __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_proto_ctx_engines *set = data;
+	struct drm_i915_private *i915 = set->i915;
+	u64 flags;
+	int err = 0, n, i, j;
+	u16 slot, width, num_siblings;
+	struct intel_engine_cs **siblings = NULL;
+	intel_engine_mask_t prev_mask;
+
+	/* Disabling for now */
+	return -ENODEV;
+
+	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+		return -ENODEV;
+
+	if (get_user(slot, &ext->engine_index))
+		return -EFAULT;
+
+	if (get_user(width, &ext->width))
+		return -EFAULT;
+
+	if (get_user(num_siblings, &ext->num_siblings))
+		return -EFAULT;
+
+	if (slot >= set->num_engines) {
+		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+			slot, set->num_engines);
+		return -EINVAL;
+	}
+
+	if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+		drm_dbg(&i915->drm,
+			"Invalid placement[%d], already occupied\n", slot);
+		return -EINVAL;
+	}
+
+	if (get_user(flags, &ext->flags))
+		return -EFAULT;
+
+	if (flags) {
+		drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+		return -EINVAL;
+	}
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (width < 2) {
+		drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+		return -EINVAL;
+	}
+
+	if (num_siblings < 1) {
+		drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+			num_siblings);
+		return -EINVAL;
+	}
+
+	siblings = kmalloc_array(num_siblings * width,
+				 sizeof(*siblings),
+				 GFP_KERNEL);
+	if (!siblings)
+		return -ENOMEM;
+
+	/* Create contexts / engines */
+	for (i = 0; i < width; ++i) {
+		intel_engine_mask_t current_mask = 0;
+		struct i915_engine_class_instance prev_engine;
+
+		for (j = 0; j < num_siblings; ++j) {
+			struct i915_engine_class_instance ci;
+
+			n = i * num_siblings + j;
+			if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+				err = -EFAULT;
+				goto out_err;
+			}
+
+			siblings[n] =
+				intel_engine_lookup_user(i915, ci.engine_class,
+							 ci.engine_instance);
+			if (!siblings[n]) {
+				drm_dbg(&i915->drm,
+					"Invalid sibling[%d]: { class:%d, inst:%d }\n",
+					n, ci.engine_class, ci.engine_instance);
+				err = -EINVAL;
+				goto out_err;
+			}
+
+			if (n) {
+				if (prev_engine.engine_class !=
+				    ci.engine_class) {
+					drm_dbg(&i915->drm,
+						"Mismatched class %d, %d\n",
+						prev_engine.engine_class,
+						ci.engine_class);
+					err = -EINVAL;
+					goto out_err;
+				}
+			}
+
+			prev_engine = ci;
+			current_mask |= siblings[n]->logical_mask;
+		}
+
+		if (i > 0) {
+			if (current_mask != prev_mask << 1) {
+				drm_dbg(&i915->drm,
+					"Non contiguous logical mask 0x%x, 0x%x\n",
+					prev_mask, current_mask);
+				err = -EINVAL;
+				goto out_err;
+			}
+		}
+		prev_mask = current_mask;
+	}
+
+	set->engines[slot].type = I915_GEM_ENGINE_TYPE_PARALLEL;
+	set->engines[slot].num_siblings = num_siblings;
+	set->engines[slot].width = width;
+	set->engines[slot].siblings = siblings;
+
+	return 0;
+
+out_err:
+	kfree(siblings);
+
+	return err;
+}
+
 static const i915_user_extension_fn set_proto_ctx_engines_extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_proto_ctx_engines_balance,
 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_proto_ctx_engines_bond,
+	[I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT] =
+		set_proto_ctx_engines_parallel_submit,
 };
 
 static int set_proto_ctx_engines(struct drm_i915_file_private *fpriv,
@@ -938,7 +1078,7 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
 
 	e = alloc_engines(num_engines);
 	for (n = 0; n < num_engines; n++) {
-		struct intel_context *ce;
+		struct intel_context *ce, *child;
 		int ret;
 
 		switch (pe[n].type) {
@@ -948,7 +1088,13 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
 
 		case I915_GEM_ENGINE_TYPE_BALANCED:
 			ce = intel_engine_create_virtual(pe[n].siblings,
-							 pe[n].num_siblings);
+							 pe[n].num_siblings, 0);
+			break;
+
+		case I915_GEM_ENGINE_TYPE_PARALLEL:
+			ce = intel_engine_create_parallel(pe[n].siblings,
+							  pe[n].num_siblings,
+							  pe[n].width);
 			break;
 
 		case I915_GEM_ENGINE_TYPE_INVALID:
@@ -969,6 +1115,13 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
 			err = ERR_PTR(ret);
 			goto free_engines;
 		}
+		for_each_child(ce, child) {
+			ret = intel_context_set_gem(child, ctx, pe->sseu);
+			if (ret) {
+				err = ERR_PTR(ret);
+				goto free_engines;
+			}
+		}
 	}
 	e->num_engines = num_engines;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 94c03a97cb77..7b096d83bca1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -78,6 +78,9 @@ enum i915_gem_engine_type {
 
 	/** @I915_GEM_ENGINE_TYPE_BALANCED: A load-balanced engine set */
 	I915_GEM_ENGINE_TYPE_BALANCED,
+
+	/** @I915_GEM_ENGINE_TYPE_PARALLEL: A parallel engine set */
+	I915_GEM_ENGINE_TYPE_PARALLEL,
 };
 
 /**
@@ -108,6 +111,9 @@ struct i915_gem_proto_engine {
 	/** @num_siblings: Number of balanced siblings */
 	unsigned int num_siblings;
 
+	/** @width: Width of each sibling */
+	unsigned int width;
+
 	/** @siblings: Balanced siblings */
 	struct intel_engine_cs **siblings;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index f4fc81f64921..9cdbea752014 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -55,9 +55,13 @@ struct intel_context_ops {
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
 
-	/* virtual engine/context interface */
+	/* virtual/parallel engine/context interface */
 	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
-						unsigned int count);
+						unsigned int count,
+						unsigned long flags);
+	struct intel_context *(*create_parallel)(struct intel_engine_cs **engines,
+						 unsigned int num_siblings,
+						 unsigned int width);
 	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
 					       unsigned int sibling);
 };
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 87579affb952..43f16a8347ee 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -279,9 +279,19 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 	return intel_engine_has_preemption(engine);
 }
 
+#define FORCE_VIRTUAL	BIT(0)
 struct intel_context *
 intel_engine_create_virtual(struct intel_engine_cs **siblings,
-			    unsigned int count);
+			    unsigned int count, unsigned long flags);
+
+static inline struct intel_context *
+intel_engine_create_parallel(struct intel_engine_cs **engines,
+			     unsigned int num_engines,
+			     unsigned int width)
+{
+	GEM_BUG_ON(!engines[0]->cops->create_parallel);
+	return engines[0]->cops->create_parallel(engines, num_engines, width);
+}
 
 static inline bool
 intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4d790f9a65dd..f66c75c77584 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1923,16 +1923,16 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
 
 struct intel_context *
 intel_engine_create_virtual(struct intel_engine_cs **siblings,
-			    unsigned int count)
+			    unsigned int count, unsigned long flags)
 {
 	if (count == 0)
 		return ERR_PTR(-EINVAL);
 
-	if (count == 1)
+	if (count == 1 && !(flags & FORCE_VIRTUAL))
 		return intel_context_create(siblings[0]);
 
 	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
-	return siblings[0]->cops->create_virtual(siblings, count);
+	return siblings[0]->cops->create_virtual(siblings, count, flags);
 }
 
 struct i915_request *
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index fc74ca28f245..769480e026bb 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -201,7 +201,8 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
 }
 
 static struct intel_context *
-execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
+			 unsigned long flags);
 
 static struct i915_request *
 __active_request(const struct intel_timeline * const tl,
@@ -3785,7 +3786,8 @@ static void virtual_submit_request(struct i915_request *rq)
 }
 
 static struct intel_context *
-execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
+			 unsigned long flags)
 {
 	struct virtual_engine *ve;
 	unsigned int n;
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index f12ffe797639..e876a9d88a5c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3733,7 +3733,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
 	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
 
 	for (n = 0; n < nctx; n++) {
-		ve[n] = intel_engine_create_virtual(siblings, nsibling);
+		ve[n] = intel_engine_create_virtual(siblings, nsibling, 0);
 		if (IS_ERR(ve[n])) {
 			err = PTR_ERR(ve[n]);
 			nctx = n;
@@ -3929,7 +3929,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
 	 * restrict it to our desired engine within the virtual engine.
 	 */
 
-	ve = intel_engine_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling, 0);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_close;
@@ -4060,7 +4060,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
 		i915_request_add(rq);
 	}
 
-	ce = intel_engine_create_virtual(siblings, nsibling);
+	ce = intel_engine_create_virtual(siblings, nsibling, 0);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out;
@@ -4112,7 +4112,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
 
 	/* XXX We do not handle oversubscription and fairness with normal rq */
 	for (n = 0; n < nsibling; n++) {
-		ce = intel_engine_create_virtual(siblings, nsibling);
+		ce = intel_engine_create_virtual(siblings, nsibling, 0);
 		if (IS_ERR(ce)) {
 			err = PTR_ERR(ce);
 			goto out;
@@ -4214,7 +4214,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
 	if (err)
 		goto out_scratch;
 
-	ve = intel_engine_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling, 0);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_scratch;
@@ -4354,7 +4354,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	ve = intel_engine_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling, 0);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_spin;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 44a7582c9aed..89528624710a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -82,7 +82,8 @@
  */
 
 static struct intel_context *
-guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
+		   unsigned long flags);
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
@@ -2514,8 +2515,6 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	__guc_context_post_unpin(ce);
 }
 
-/* Future patches will use this function */
-__maybe_unused
 static int guc_parent_context_pre_pin(struct intel_context *ce,
 				      struct i915_gem_ww_ctx *ww)
 {
@@ -2559,8 +2558,6 @@ static int guc_parent_context_pre_pin(struct intel_context *ce,
 	return err;
 }
 
-/* Future patches will use this function */
-__maybe_unused
 static void guc_parent_context_post_unpin(struct intel_context *ce)
 {
 	struct intel_context *child;
@@ -2576,8 +2573,6 @@ static void guc_parent_context_post_unpin(struct intel_context *ce)
 	}
 }
 
-/* Future patches will use this function */
-__maybe_unused
 static int guc_parent_context_pin(struct intel_context *ce)
 {
 	int ret, i = 0, j = 0;
@@ -2623,8 +2618,6 @@ static int guc_parent_context_pin(struct intel_context *ce)
 	return ret;
 }
 
-/* Future patches will use this function */
-__maybe_unused
 static void guc_parent_context_unpin(struct intel_context *ce)
 {
 	struct intel_context *child;
@@ -3048,8 +3041,6 @@ static void destroy_worker_func(struct work_struct *w)
 		intel_gt_pm_unpark_work_add(gt, destroy_worker);
 }
 
-/* Future patches will use this function */
-__maybe_unused
 static void guc_child_context_destroy(struct kref *kref)
 {
 	__guc_context_destroy(container_of(kref, struct intel_context, ref));
@@ -3272,6 +3263,11 @@ static void remove_from_context(struct i915_request *rq)
 	i915_request_notify_execute_cb_imm(rq);
 }
 
+static struct intel_context *
+guc_create_parallel(struct intel_engine_cs **engines,
+		    unsigned int num_siblings,
+		    unsigned int width);
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
@@ -3293,6 +3289,7 @@ static const struct intel_context_ops guc_context_ops = {
 	.destroy = guc_context_destroy,
 
 	.create_virtual = guc_create_virtual,
+	.create_parallel = guc_create_parallel,
 };
 
 static void __guc_signal_context_fence(struct intel_context *ce)
@@ -3782,6 +3779,91 @@ static void guc_retire_inflight_request_prio(struct i915_request *rq)
 	spin_unlock(&ce->guc_active.lock);
 }
 
+static const struct intel_context_ops virtual_parent_context_ops = {
+	.alloc = guc_virtual_context_alloc,
+
+	.pre_pin = guc_parent_context_pre_pin,
+	.pin = guc_parent_context_pin,
+	.unpin = guc_parent_context_unpin,
+	.post_unpin = guc_parent_context_post_unpin,
+
+	.ban = guc_context_ban,
+
+	.enter = guc_virtual_context_enter,
+	.exit = guc_virtual_context_exit,
+
+	.sched_disable = guc_context_sched_disable,
+
+	.destroy = guc_context_destroy,
+
+	.get_sibling = guc_virtual_get_sibling,
+};
+
+static const struct intel_context_ops virtual_child_context_ops = {
+	.alloc = guc_virtual_context_alloc,
+
+	.enter = guc_virtual_context_enter,
+	.exit = guc_virtual_context_exit,
+
+	.destroy = guc_child_context_destroy,
+};
+
+static struct intel_context *
+guc_create_parallel(struct intel_engine_cs **engines,
+		    unsigned int num_siblings,
+		    unsigned int width)
+{
+	struct intel_engine_cs **siblings = NULL;
+	struct intel_context *parent = NULL, *ce, *err;
+	int i, j;
+	int ret;
+
+	siblings = kmalloc_array(num_siblings,
+				 sizeof(*siblings),
+				 GFP_KERNEL);
+	if (!siblings)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < width; ++i) {
+		for (j = 0; j < num_siblings; ++j)
+			siblings[j] = engines[i * num_siblings + j];
+
+		ce = intel_engine_create_virtual(siblings, num_siblings,
+						 FORCE_VIRTUAL);
+		if (!ce) {
+			err = ERR_PTR(-ENOMEM);
+			goto unwind;
+		}
+
+		if (i == 0) {
+			parent = ce;
+		} else {
+			intel_context_bind_parent_child(parent, ce);
+			ret = intel_context_alloc_state(ce);
+			if (ret) {
+				err = ERR_PTR(ret);
+				goto unwind;
+			}
+		}
+	}
+
+	parent->ops = &virtual_parent_context_ops;
+	for_each_child(parent, ce)
+		ce->ops = &virtual_child_context_ops;
+
+	kfree(siblings);
+	return parent;
+
+unwind:
+	if (parent) {
+		for_each_child(parent, ce)
+			intel_context_put(ce);
+		intel_context_put(parent);
+	}
+	kfree(siblings);
+	return err;
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -4578,7 +4660,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 }
 
 static struct intel_context *
-guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
+		   unsigned long flags)
 {
 	struct guc_virtual_engine *ve;
 	struct intel_guc *guc;
@@ -4591,7 +4674,9 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		return ERR_PTR(-ENOMEM);
 
 	guc = &siblings[0]->gt->uc.guc;
-	sched_engine = guc_to_sched_engine(guc, GUC_SUBMIT_ENGINE_SINGLE_LRC);
+	sched_engine = guc_to_sched_engine(guc, (flags & FORCE_VIRTUAL) ?
+					   GUC_SUBMIT_ENGINE_MULTI_LRC :
+					   GUC_SUBMIT_ENGINE_SINGLE_LRC);
 
 	ve->base.i915 = siblings[0]->i915;
 	ve->base.gt = siblings[0]->gt;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ef72e07fe08c..a16f0f8908de 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1821,6 +1821,7 @@ struct drm_i915_gem_context_param {
  * Extensions:
  *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
  *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
+ *   i915_context_engines_parallel_submit (I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)
  */
 #define I915_CONTEXT_PARAM_ENGINES	0xa
 
@@ -2046,6 +2047,132 @@ struct i915_context_engines_bond {
 	struct i915_engine_class_instance engines[N__]; \
 } __attribute__((packed)) name__
 
+/**
+ * struct i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of same engine class and these
+ * contexts must be in logically contiguous order. Examples of the placement
+ * behavior described below. Lastly, the default is to not allow BBs to
+ * preempted mid BB rather insert coordinated preemption on all hardware
+ * contexts between each set of BBs. Flags may be added in the future to change
+ * both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block:: none
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by 2-D array
+ *	in the engines the field with bonds placed between each index of the
+ *	virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ *	CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logical contiguous, return -EINVAL
+ */
+struct i915_context_engines_parallel_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero, currently not defined flags
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+
+} __packed;
+
+#define I915_DEFINE_CONTEXT_ENGINES_PARALLEL_SUBMIT(name__, N__) struct { \
+	struct i915_user_extension base; \
+	__u16 engine_index; \
+	__u16 width; \
+	__u16 num_siblings; \
+	__u16 mbz16; \
+	__u64 flags; \
+	__u64 mbz64[3]; \
+	struct i915_engine_class_instance engines[N__]; \
+} __attribute__((packed)) name__
+
 /**
  * DOC: Context Engine Map uAPI
  *
@@ -2105,6 +2232,7 @@ struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see i915_context_engines_load_balance */
 #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
 	struct i915_engine_class_instance engines[0];
 } __attribute__((packed));
 

From patchwork Tue Aug  3 22:29:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417569
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5B23AC4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:20 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 3229660184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:20 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3229660184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 4EC386E918;
	Tue,  3 Aug 2021 22:12:08 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 0AFAC6E147;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393473"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393473"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512725"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 27/46] drm/i915/doc: Update parallel submit doc to point to
 i915_drm.h
Date: Tue,  3 Aug 2021 15:29:24 -0700
Message-Id: <20210803222943.27686-28-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Update parallel submit doc to point to i915_drm.h

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 Documentation/gpu/rfc/i915_parallel_execbuf.h | 122 ------------------
 Documentation/gpu/rfc/i915_scheduler.rst      |   4 +-
 2 files changed, 2 insertions(+), 124 deletions(-)
 delete mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h
deleted file mode 100644
index 8cbe2c4e0172..000000000000
--- a/Documentation/gpu/rfc/i915_parallel_execbuf.h
+++ /dev/null
@@ -1,122 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2021 Intel Corporation
- */
-
-#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
-
-/**
- * struct drm_i915_context_engines_parallel_submit - Configure engine for
- * parallel submission.
- *
- * Setup a slot in the context engine map to allow multiple BBs to be submitted
- * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
- * in parallel. Multiple hardware contexts are created internally in the i915
- * run these BBs. Once a slot is configured for N BBs only N BBs can be
- * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
- * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
- * many BBs there are based on the slot's configuration. The N BBs are the last
- * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
- *
- * The default placement behavior is to create implicit bonds between each
- * context if each context maps to more than 1 physical engine (e.g. context is
- * a virtual engine). Also we only allow contexts of same engine class and these
- * contexts must be in logically contiguous order. Examples of the placement
- * behavior described below. Lastly, the default is to not allow BBs to
- * preempted mid BB rather insert coordinated preemption on all hardware
- * contexts between each set of BBs. Flags may be added in the future to change
- * both of these default behaviors.
- *
- * Returns -EINVAL if hardware context placement configuration is invalid or if
- * the placement configuration isn't supported on the platform / submission
- * interface.
- * Returns -ENODEV if extension isn't supported on the platform / submission
- * interface.
- *
- * .. code-block:: none
- *
- *	Example 1 pseudo code:
- *	CS[X] = generic engine of same class, logical instance X
- *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- *	set_engines(INVALID)
- *	set_parallel(engine_index=0, width=2, num_siblings=1,
- *		     engines=CS[0],CS[1])
- *
- *	Results in the following valid placement:
- *	CS[0], CS[1]
- *
- *	Example 2 pseudo code:
- *	CS[X] = generic engine of same class, logical instance X
- *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- *	set_engines(INVALID)
- *	set_parallel(engine_index=0, width=2, num_siblings=2,
- *		     engines=CS[0],CS[2],CS[1],CS[3])
- *
- *	Results in the following valid placements:
- *	CS[0], CS[1]
- *	CS[2], CS[3]
- *
- *	This can also be thought of as 2 virtual engines described by 2-D array
- *	in the engines the field with bonds placed between each index of the
- *	virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
- *	CS[3].
- *	VE[0] = CS[0], CS[2]
- *	VE[1] = CS[1], CS[3]
- *
- *	Example 3 pseudo code:
- *	CS[X] = generic engine of same class, logical instance X
- *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- *	set_engines(INVALID)
- *	set_parallel(engine_index=0, width=2, num_siblings=2,
- *		     engines=CS[0],CS[1],CS[1],CS[3])
- *
- *	Results in the following valid and invalid placements:
- *	CS[0], CS[1]
- *	CS[1], CS[3] - Not logical contiguous, return -EINVAL
- */
-struct drm_i915_context_engines_parallel_submit {
-	/**
-	 * @base: base user extension.
-	 */
-	struct i915_user_extension base;
-
-	/**
-	 * @engine_index: slot for parallel engine
-	 */
-	__u16 engine_index;
-
-	/**
-	 * @width: number of contexts per parallel engine
-	 */
-	__u16 width;
-
-	/**
-	 * @num_siblings: number of siblings per context
-	 */
-	__u16 num_siblings;
-
-	/**
-	 * @mbz16: reserved for future use; must be zero
-	 */
-	__u16 mbz16;
-
-	/**
-	 * @flags: all undefined flags must be zero, currently not defined flags
-	 */
-	__u64 flags;
-
-	/**
-	 * @mbz64: reserved for future use; must be zero
-	 */
-	__u64 mbz64[3];
-
-	/**
-	 * @engines: 2-d array of engine instances to configure parallel engine
-	 *
-	 * length = width (i) * num_siblings (j)
-	 * index = j + i * num_siblings
-	 */
-	struct i915_engine_class_instance engines[0];
-
-} __packed;
-
diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst
index cbda75065dad..d630f15ab795 100644
--- a/Documentation/gpu/rfc/i915_scheduler.rst
+++ b/Documentation/gpu/rfc/i915_scheduler.rst
@@ -135,8 +135,8 @@ Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
 drm_i915_context_engines_parallel_submit to the uAPI to implement this
 extension.
 
-.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h
-        :functions: drm_i915_context_engines_parallel_submit
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+        :functions: i915_context_engines_parallel_submit
 
 Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
 -------------------------------------------------------------------

From patchwork Tue Aug  3 22:29:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417557
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E2E96C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:07 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id A98F661037
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:07 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A98F661037
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 7C3456E94F;
	Tue,  3 Aug 2021 22:12:06 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 4A2EC6E8F8;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393474"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393474"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512726"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 28/46] drm/i915/guc: Add basic GuC multi-lrc selftest
Date: Tue,  3 Aug 2021 15:29:25 -0700
Message-Id: <20210803222943.27686-29-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Add very basic (single submission) multi-lrc selftest.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   1 +
 .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 168 ++++++++++++++++++
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 3 files changed, 170 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 89528624710a..bc6cb9adca92 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4772,5 +4772,6 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftest_guc_multi_lrc.c"
 #include "selftest_guc_flow_control.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
new file mode 100644
index 000000000000..82eb666bba51
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright �� 2019 Intel Corporation
+ */
+
+#include "selftests/igt_spinner.h"
+#include "selftests/igt_reset.h"
+#include "selftests/intel_scheduler_helpers.h"
+#include "gt/intel_engine_heartbeat.h"
+#include "gem/selftests/mock_context.h"
+
+static void logical_sort(struct intel_engine_cs **engines, int num_engines)
+{
+	struct intel_engine_cs *sorted[MAX_ENGINE_INSTANCE + 1];
+	int i, j;
+
+	for (i = 0; i < num_engines; ++i)
+		for (j = 0; j < MAX_ENGINE_INSTANCE + 1; ++j) {
+			if (engines[j]->logical_mask & BIT(i)) {
+				sorted[i] = engines[j];
+				break;
+			}
+		}
+
+	memcpy(*engines, *sorted,
+	       sizeof(struct intel_engine_cs *) * num_engines);
+}
+
+static struct intel_context *
+multi_lrc_create_parent(struct intel_gt *gt, u8 class,
+			unsigned long flags)
+{
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int i = 0;
+
+	for_each_engine(engine, gt, id) {
+		if (engine->class != class)
+			continue;
+
+		siblings[i++] = engine;
+	}
+
+	if (i <= 1)
+		return ERR_PTR(0);
+
+	logical_sort(siblings, i);
+
+	return intel_engine_create_parallel(siblings, 1, i);
+}
+
+static void multi_lrc_context_put(struct intel_context *ce)
+{
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	/*
+	 * Only the parent gets the creation ref put in the uAPI, the parent
+	 * itself is responsible for creation ref put on the children.
+	 */
+	intel_context_put(ce);
+}
+
+static struct i915_request *
+multi_lrc_nop_request(struct intel_context *ce)
+{
+	struct intel_context *child;
+	struct i915_request *rq, *child_rq;
+	int i = 0;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	rq = intel_context_create_request(ce);
+	if (IS_ERR(rq))
+		return rq;
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	for_each_child(ce, child) {
+		child_rq = intel_context_create_request(child);
+		if (IS_ERR(child_rq))
+			goto child_error;
+
+		if (++i == ce->guc_number_children)
+			set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+				&child_rq->fence.flags);
+		i915_request_add(child_rq);
+	}
+
+	return rq;
+
+child_error:
+	i915_request_put(rq);
+
+	return ERR_PTR(-ENOMEM);
+}
+
+static int __intel_guc_multi_lrc_basic(struct intel_gt *gt, unsigned int class)
+{
+	struct intel_context *parent;
+	struct i915_request *rq;
+	int ret;
+
+	parent = multi_lrc_create_parent(gt, class, 0);
+	if (IS_ERR(parent)) {
+		pr_err("Failed creating contexts: %ld", PTR_ERR(parent));
+		return PTR_ERR(parent);
+	} else if (!parent) {
+		pr_debug("Not enough engines in class: %d",
+			 VIDEO_DECODE_CLASS);
+		return 0;
+	}
+
+	rq = multi_lrc_nop_request(parent);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		pr_err("Failed creating requests: %d", ret);
+		goto out;
+	}
+
+	ret = intel_selftest_wait_for_rq(rq);
+	if (ret)
+		pr_err("Failed waiting on request: %d", ret);
+
+	i915_request_put(rq);
+
+	if (ret >= 0) {
+		ret = intel_gt_wait_for_idle(gt, HZ * 5);
+		if (ret < 0)
+			pr_err("GT failed to idle: %d\n", ret);
+	}
+
+out:
+	multi_lrc_context_put(parent);
+	return ret;
+}
+
+static int intel_guc_multi_lrc_basic(void *arg)
+{
+	struct intel_gt *gt = arg;
+	unsigned int class;
+	int ret;
+
+	for (class = 0; class < MAX_ENGINE_CLASS + 1; ++class) {
+		ret = __intel_guc_multi_lrc_basic(gt, class);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int intel_guc_multi_lrc(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(intel_guc_multi_lrc_basic),
+	};
+	struct intel_gt *gt = &i915->gt;
+
+	if (intel_gt_is_wedged(gt))
+		return 0;
+
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return 0;
+
+	return intel_gt_live_subtests(tests, gt);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index d9bd732b741a..2ddb72bbab69 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -48,5 +48,6 @@ selftest(execlists, intel_execlists_live_selftests)
 selftest(ring_submission, intel_ring_submission_live_selftests)
 selftest(perf, i915_perf_live_selftests)
 selftest(guc_flow_control, intel_guc_flow_control)
+selftest(guc_multi_lrc, intel_guc_multi_lrc)
 /* Here be dragons: keep last to run last! */
 selftest(late_gt_pm, intel_gt_pm_late_selftests)

From patchwork Tue Aug  3 22:29:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417551
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F0F2AC432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:02 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id B15CC60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:02 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B15CC60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 3F37C6E923;
	Tue,  3 Aug 2021 22:12:06 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id ADA3B6E17E;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393475"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393475"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512727"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 29/46] drm/i915/guc: Extend GuC flow control selftest for
 multi-lrc
Date: Tue,  3 Aug 2021 15:29:26 -0700
Message-Id: <20210803222943.27686-30-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Prove multi-lrc and single-lrc are independent.
Prove multi-lrc guc_ids flow control works.
Prove multi-lrc hanging the tastlet can recover from a GPU reset.

Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../i915/gt/uc/selftest_guc_flow_control.c    | 299 ++++++++++++++++++
 .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   |  15 +-
 2 files changed, 312 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
index f31ab2674b2b..9cfecf9d368e 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
@@ -110,6 +110,65 @@ static int nop_request_wait(struct intel_engine_cs *engine, bool kernel,
 	return ret;
 }
 
+static int multi_lrc_not_blocked(struct intel_gt *gt, bool flow_control)
+{
+	struct intel_guc *guc = &gt->uc.guc;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	struct guc_submit_engine *gse = guc->gse[GUC_SUBMIT_ENGINE_MULTI_LRC];
+	unsigned int reset_count = i915_reset_count(global);
+	u64 tasklets_submit_count = gse->tasklets_submit_count;
+	struct intel_context *parent;
+	struct i915_request *rq;
+	int ret;
+
+	parent = multi_lrc_create_parent(gt, VIDEO_DECODE_CLASS, 0);
+	if (IS_ERR(parent)) {
+		pr_err("Failed creating multi-lrc contexts: %ld",
+		       PTR_ERR(parent));
+		return PTR_ERR(parent);
+	} else if (!parent) {
+		pr_debug("Not enough engines in class: %d",
+			 VIDEO_DECODE_CLASS);
+		return 0;
+	}
+
+	rq = multi_lrc_nop_request(parent, NULL);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		pr_err("Failed creating multi-lrc requests: %d", ret);
+		goto out;
+	}
+
+	ret = intel_selftest_wait_for_rq(rq);
+	if (ret)
+		pr_err("Failed waiting on multi-lrc request: %d", ret);
+
+	i915_request_put(rq);
+	if (ret)
+		goto out;
+
+	if (!flow_control &&
+	    gse->tasklets_submit_count != tasklets_submit_count) {
+		pr_err("Flow control for multi-lrc unexpectedly kicked in\n");
+		ret = -EINVAL;
+	}
+
+	if (flow_control &&
+	    gse->tasklets_submit_count == tasklets_submit_count) {
+		pr_err("Flow control for multi-lrc did not kick in\n");
+		ret = -EINVAL;
+	}
+
+	if (i915_reset_count(global) != reset_count) {
+		pr_err("Unexpected GPU reset during multi-lrc submit\n");
+		ret = -EINVAL;
+	}
+
+out:
+	multi_lrc_context_put(parent);
+	return ret;
+}
+
 #define NUM_GUC_ID		256
 #define NUM_CONTEXT		1024
 #define NUM_RQ_PER_CONTEXT	2
@@ -240,6 +299,13 @@ static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang
 		goto err_spin_rq;
 	}
 
+	/* Ensure Multi-LRC not blocked */
+	ret = multi_lrc_not_blocked(gt, !limit_guc_ids);
+	if (ret < 0) {
+		pr_err("Multi-lrc can't make progress: %d\n", ret);
+		goto err_spin_rq;
+	}
+
 	/* Inject hang in flow control state machine */
 	if (hang) {
 		guc->gse_hang_expected = true;
@@ -559,6 +625,237 @@ static int intel_guc_flow_control_bad_desc_h2g(void *arg)
 	return __intel_guc_flow_control_deadlock_h2g(arg, true);
 }
 
+#define NUM_CONTEXT_MULTI_LRC	256
+
+static int
+__intel_guc_flow_control_multi_lrc_guc(void *arg, bool limit_guc_ids, bool hang)
+{
+	struct intel_gt *gt = arg;
+	struct intel_guc *guc = &gt->uc.guc;
+	struct guc_submit_engine *gse = guc->gse[GUC_SUBMIT_ENGINE_MULTI_LRC];
+	struct intel_context **contexts;
+	int ret = 0;
+	int i, j, k;
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *spin_rq, *last = NULL;
+	intel_wakeref_t wakeref;
+	struct intel_engine_cs *engine;
+	struct i915_gpu_error *global = &gt->i915->gpu_error;
+	unsigned int reset_count;
+	u64 tasklets_submit_count = gse->tasklets_submit_count;
+	u32 old_beat;
+
+	if (limit_guc_ids)
+		guc->num_guc_ids = NUM_GUC_ID;
+
+	contexts = kmalloc(sizeof(*contexts) * NUM_CONTEXT, GFP_KERNEL);
+	if (!contexts) {
+		pr_err("Context array allocation failed\n");
+		return -ENOMEM;
+	}
+
+	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
+
+	ce = intel_context_create(intel_selftest_find_any_engine(gt));
+	if (IS_ERR(ce)) {
+		ret = PTR_ERR(ce);
+		pr_err("Failed to create context: %d\n", ret);
+		goto err;
+	}
+
+	reset_count = i915_reset_count(global);
+	engine = ce->engine;
+
+	old_beat = engine->props.heartbeat_interval_ms;
+	if (hang) {
+		ret = intel_engine_set_heartbeat(engine, HEARTBEAT_INTERVAL);
+		if (ret) {
+			pr_err("Failed to boost heatbeat interval: %d\n", ret);
+			intel_context_put(ce);
+			goto err;
+		}
+	}
+
+	/* Create spinner to block requests in below loop */
+	ret = igt_spinner_init(&spin, engine->gt);
+	if (ret) {
+		pr_err("Failed to create spinner: %d\n", ret);
+		intel_context_put(ce);
+		goto err_heartbeat;
+	}
+	spin_rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	intel_context_put(ce);
+	if (IS_ERR(spin_rq)) {
+		ret = PTR_ERR(spin_rq);
+		pr_err("Failed to create spinner request: %d\n", ret);
+		goto err_spin_rq;
+	}
+	ret = __request_add_spin(spin_rq, &spin);
+	if (ret) {
+		pr_err("Failed to add Spinner request: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	for (i = 0; i < NUM_RQ_PER_CONTEXT; ++i) {
+		for (j = 0; j < NUM_CONTEXT_MULTI_LRC; ++j) {
+			for (k = 0; k < NUM_RQ_PER_CONTEXT; ++k) {
+				bool first_pass = !i && !k;
+
+				if (last)
+					i915_request_put(last);
+				last = NULL;
+				if (first_pass)
+					contexts[j] = multi_lrc_create_parent(gt, VIDEO_DECODE_CLASS, 0);
+				ce = contexts[j];
+
+				if (IS_ERR(ce)) {
+					ret = PTR_ERR(ce);
+					pr_err("Failed to create context: %d\n", ret);
+					goto err_spin_rq;
+				} else if (!ce) {
+					ret = 0;
+					goto err_spin_rq;
+				}
+
+				last = multi_lrc_nop_request(ce, spin_rq);
+				if (first_pass)
+					multi_lrc_context_put(ce);
+				if (IS_ERR(last)) {
+					ret = PTR_ERR(last);
+					pr_err("Failed to create request: %d\n", ret);
+					goto err_spin_rq;
+				}
+			}
+		}
+	}
+
+	/* Verify GuC submit engine state */
+	if (limit_guc_ids && !guc_ids_exhausted(gse)) {
+		pr_err("guc_ids not exhausted\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+	if (!limit_guc_ids && guc_ids_exhausted(gse)) {
+		pr_err("guc_ids exhausted\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+
+	/* Ensure no DoS from unready requests */
+	ret = multi_lrc_not_blocked(gt, true);
+	if (ret < 0) {
+		pr_err("Multi-lrc DoS: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Ensure Single-LRC not blocked, not in flow control */
+	ret = nop_request_wait(engine, false, !limit_guc_ids);
+	if (ret < 0) {
+		pr_err("User NOP request DoS: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Inject hang in flow control state machine */
+	if (hang) {
+		guc->gse_hang_expected = true;
+		guc->inject_bad_sched_disable = true;
+	}
+
+	/* Release blocked requests */
+	igt_spinner_end(&spin);
+	ret = intel_selftest_wait_for_rq(spin_rq);
+	if (ret) {
+		pr_err("Spin request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	i915_request_put(spin_rq);
+	igt_spinner_fini(&spin);
+	spin_rq = NULL;
+
+	/* Wait for last request / GT to idle */
+	ret = i915_request_wait(last, 0, hang ? HZ * 30 : HZ * 5);
+	if (ret < 0) {
+		pr_err("Last request failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	i915_request_put(last);
+	last = NULL;
+	ret = intel_gt_wait_for_idle(gt, HZ * 5);
+	if (ret < 0) {
+		pr_err("GT failed to idle: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+	/* Check state after idle */
+	if (guc_ids_exhausted(gse)) {
+		pr_err("guc_ids exhausted after last request signaled\n");
+		ret = -EINVAL;
+		goto err_spin_rq;
+	}
+	if (hang) {
+		if (i915_reset_count(global) == reset_count) {
+			pr_err("Failed to record a GPU reset\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+	} else {
+		if (i915_reset_count(global) != reset_count) {
+			pr_err("Unexpected GPU reset\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+		if (gse->tasklets_submit_count == tasklets_submit_count) {
+			pr_err("Flow control failed to kick in\n");
+			ret = -EINVAL;
+			goto err_spin_rq;
+		}
+	}
+
+	/* Verify requests can be submitted after flow control */
+	ret = nop_request_wait(engine, true, false);
+	if (ret < 0) {
+		pr_err("Kernel NOP failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+	ret = nop_request_wait(engine, false, false);
+	if (ret < 0) {
+		pr_err("User NOP failed to complete: %d\n", ret);
+		goto err_spin_rq;
+	}
+
+err_spin_rq:
+	if (spin_rq) {
+		igt_spinner_end(&spin);
+		intel_selftest_wait_for_rq(spin_rq);
+		i915_request_put(spin_rq);
+		igt_spinner_fini(&spin);
+		intel_gt_wait_for_idle(gt, HZ * 5);
+	}
+err_heartbeat:
+	if (last)
+		i915_request_put(last);
+	intel_engine_set_heartbeat(engine, old_beat);
+err:
+	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
+	guc->num_guc_ids = guc->max_guc_ids;
+	guc->gse_hang_expected = false;
+	guc->inject_bad_sched_disable = false;
+	kfree(contexts);
+
+	return ret;
+}
+
+static int intel_guc_flow_control_multi_lrc_guc_ids(void *arg)
+{
+	return __intel_guc_flow_control_multi_lrc_guc(arg, true, false);
+}
+
+static int intel_guc_flow_control_multi_lrc_hang(void *arg)
+{
+	return __intel_guc_flow_control_multi_lrc_guc(arg, true, true);
+}
+
 int intel_guc_flow_control(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -566,6 +863,8 @@ int intel_guc_flow_control(struct drm_i915_private *i915)
 		SUBTEST(intel_guc_flow_control_guc_ids),
 		SUBTEST(intel_guc_flow_control_lrcd_reg),
 		SUBTEST(intel_guc_flow_control_hang_state_machine),
+		SUBTEST(intel_guc_flow_control_multi_lrc_guc_ids),
+		SUBTEST(intel_guc_flow_control_multi_lrc_hang),
 		SUBTEST(intel_guc_flow_control_deadlock_h2g),
 		SUBTEST(intel_guc_flow_control_bad_desc_h2g),
 	};
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
index 82eb666bba51..21b4a79778ef 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
@@ -62,11 +62,12 @@ static void multi_lrc_context_put(struct intel_context *ce)
 }
 
 static struct i915_request *
-multi_lrc_nop_request(struct intel_context *ce)
+multi_lrc_nop_request(struct intel_context *ce, struct i915_request *from)
 {
 	struct intel_context *child;
 	struct i915_request *rq, *child_rq;
 	int i = 0;
+	int ret;
 
 	GEM_BUG_ON(!intel_context_is_parent(ce));
 
@@ -74,6 +75,16 @@ multi_lrc_nop_request(struct intel_context *ce)
 	if (IS_ERR(rq))
 		return rq;
 
+	if (from) {
+		ret = i915_sw_fence_await_dma_fence(&rq->submit,
+						    &from->fence, 0,
+						    I915_FENCE_GFP);
+		if (ret < 0) {
+			i915_request_put(rq);
+			return ERR_PTR(ret);
+		}
+	}
+
 	i915_request_get(rq);
 	i915_request_add(rq);
 
@@ -112,7 +123,7 @@ static int __intel_guc_multi_lrc_basic(struct intel_gt *gt, unsigned int class)
 		return 0;
 	}
 
-	rq = multi_lrc_nop_request(parent);
+	rq = multi_lrc_nop_request(parent, NULL);
 	if (IS_ERR(rq)) {
 		ret = PTR_ERR(rq);
 		pr_err("Failed creating requests: %d", ret);

From patchwork Tue Aug  3 22:29:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417563
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 93423C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:12 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 60F9860EE8
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:12 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 60F9860EE8
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id C0D2D6E922;
	Tue,  3 Aug 2021 22:12:06 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 67D566E8FA;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393477"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393477"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512728"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 30/46] drm/i915/guc: Implement no mid batch preemption for
 multi-lrc
Date: Tue,  3 Aug 2021 15:29:27 -0700
Message-Id: <20210803222943.27686-31-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between to parent and child is needed. This is implemented via custom
emit_bb_start & emit_fini_breadcrumb functions and enabled via by
default if a context is configured by set parallel extension.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 283 +++++++++++++++++-
 4 files changed, 287 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index c327fd1c24c2..f396993374da 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -629,7 +629,7 @@ void intel_context_bind_parent_child(struct intel_context *parent,
 	GEM_BUG_ON(intel_context_is_child(child));
 	GEM_BUG_ON(intel_context_is_parent(child));
 
-	parent->guc_number_children++;
+	child->guc_child_index = parent->guc_number_children++;
 	list_add_tail(&child->guc_child_link,
 		      &parent->guc_child_list);
 	child->parent = parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 9cdbea752014..fdc4890335b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -222,6 +222,9 @@ struct intel_context {
 	/* Number of children if parent */
 	u8 guc_number_children;
 
+	/* Child index if child */
+	u8 guc_child_index;
+
 	u8 parent_page; /* if set, page num reserved for parent context */
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 6910a0cdb8c8..a2fa0e9b9559 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -181,7 +181,7 @@ struct guc_process_desc {
 	u32 wq_status;
 	u32 engine_presence;
 	u32 priority;
-	u32 reserved[30];
+	u32 reserved[36];
 } __packed;
 
 #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index bc6cb9adca92..d61c45d1ac2c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_gpu_commands.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
@@ -463,10 +464,14 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 
 /*
  * When using multi-lrc submission an extra page in the context state is
- * reserved for the process descriptor and work queue.
+ * reserved for the process descriptor, work queue, and preempt BB boundary
+ * handshake between the parent + childlren contexts.
  *
  * The layout of this page is below:
  * 0						guc_process_desc
+ * + sizeof(struct guc_process_desc)		child go
+ * + CACHELINE_BYTES				child join ...
+ * + CACHELINE_BYTES ...
  * ...						unused
  * PAGE_SIZE / 2				work queue start
  * ...						work queue
@@ -2185,6 +2190,30 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
+static inline void clear_children_join_go_memory(struct intel_context *ce)
+{
+	u32 *mem = (u32 *)(__get_process_desc(ce) + 1);
+	u8 i;
+
+	for (i = 0; i < ce->guc_number_children + 1; ++i)
+		mem[i * (CACHELINE_BYTES / sizeof(u32))] = 0;
+}
+
+static inline u32 get_children_go_value(struct intel_context *ce)
+{
+	u32 *mem = (u32 *)(__get_process_desc(ce) + 1);
+
+	return mem[0];
+}
+
+static inline u32 get_children_join_value(struct intel_context *ce,
+					  u8 child_index)
+{
+	u32 *mem = (u32 *)(__get_process_desc(ce) + 1);
+
+	return mem[(child_index + 1) * (CACHELINE_BYTES / sizeof(u32))];
+}
+
 static void guc_context_policy_init(struct intel_engine_cs *engine,
 				    struct guc_lrc_desc *desc)
 {
@@ -2380,6 +2409,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
 			guc_context_policy_init(engine, desc);
 		}
+
+		clear_children_join_go_memory(ce);
 	}
 
 	/*
@@ -3808,6 +3839,31 @@ static const struct intel_context_ops virtual_child_context_ops = {
 	.destroy = guc_child_context_destroy,
 };
 
+/*
+ * The below override of the breadcrumbs is enabled when the user configures a
+ * context for parallel submission (multi-lrc, parent-child).
+ *
+ * The overridden breadcrumbs implements an algorithm which allows the GuC to
+ * safely preempt all the hw contexts configured for parallel submission
+ * between each BB. The contract between the i915 and GuC is if the parent
+ * context can be preempted, all the children can be preempted, and the GuC will
+ * always try to preempt the parent before the children. A handshake between the
+ * parent / children breadcrumbs ensures the i915 holds up its end of the deal
+ * creating a window to preempt between each set of BBs.
+ */
+static int emit_bb_start_parent_no_preempt_mid_batch(struct i915_request *rq,
+						     u64 offset, u32 len,
+						     const unsigned int flags);
+static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
+						    u64 offset, u32 len,
+						    const unsigned int flags);
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+						 u32 *cs);
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+						u32 *cs);
+
 static struct intel_context *
 guc_create_parallel(struct intel_engine_cs **engines,
 		    unsigned int num_siblings,
@@ -3851,6 +3907,20 @@ guc_create_parallel(struct intel_engine_cs **engines,
 	for_each_child(parent, ce)
 		ce->ops = &virtual_child_context_ops;
 
+	parent->engine->emit_bb_start =
+		emit_bb_start_parent_no_preempt_mid_batch;
+	parent->engine->emit_fini_breadcrumb =
+		emit_fini_breadcrumb_parent_no_preempt_mid_batch;
+	parent->engine->emit_fini_breadcrumb_dw =
+		12 + 4 * parent->guc_number_children;
+	for_each_child(parent, ce) {
+		ce->engine->emit_bb_start =
+			emit_bb_start_child_no_preempt_mid_batch;
+		ce->engine->emit_fini_breadcrumb =
+			emit_fini_breadcrumb_child_no_preempt_mid_batch;
+		ce->engine->emit_fini_breadcrumb_dw = 16;
+	}
+
 	kfree(siblings);
 	return parent;
 
@@ -4212,6 +4282,204 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
+static inline u32 get_children_go_addr(struct intel_context *ce)
+{
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	return i915_ggtt_offset(ce->state) +
+		__get_process_desc_offset(ce) +
+		sizeof(struct guc_process_desc);
+}
+
+static inline u32 get_children_join_addr(struct intel_context *ce,
+					 u8 child_index)
+{
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	return get_children_go_addr(ce) + (child_index + 1) * CACHELINE_BYTES;
+}
+
+#define PARENT_GO_BB			1
+#define PARENT_GO_FINI_BREADCRUMB	0
+#define CHILD_GO_BB			1
+#define CHILD_GO_FINI_BREADCRUMB	0
+static int emit_bb_start_parent_no_preempt_mid_batch(struct i915_request *rq,
+						     u64 offset, u32 len,
+						     const unsigned int flags)
+{
+	struct intel_context *ce = rq->context;
+	u32 *cs;
+	u8 i;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	cs = intel_ring_begin(rq, 10 + 4 * ce->guc_number_children);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/* Wait on chidlren */
+	for (i = 0; i < ce->guc_number_children; ++i) {
+		*cs++ = (MI_SEMAPHORE_WAIT |
+			 MI_SEMAPHORE_GLOBAL_GTT |
+			 MI_SEMAPHORE_POLL |
+			 MI_SEMAPHORE_SAD_EQ_SDD);
+		*cs++ = PARENT_GO_BB;
+		*cs++ = get_children_join_addr(ce, i);
+		*cs++ = 0;
+	}
+
+	/* Turn off preemption */
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+	*cs++ = MI_NOOP;
+
+	/* Tell children go */
+	cs = gen8_emit_ggtt_write(cs,
+				  CHILD_GO_BB,
+				  get_children_go_addr(ce),
+				  0);
+
+	/* Jump to batch */
+	*cs++ = MI_BATCH_BUFFER_START_GEN8 |
+		(flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = MI_NOOP;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
+						    u64 offset, u32 len,
+						    const unsigned int flags)
+{
+	struct intel_context *ce = rq->context;
+	u32 *cs;
+
+	GEM_BUG_ON(!intel_context_is_child(ce));
+
+	cs = intel_ring_begin(rq, 12);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/* Signal parent */
+	cs = gen8_emit_ggtt_write(cs,
+				  PARENT_GO_BB,
+				  get_children_join_addr(ce->parent,
+							 ce->guc_child_index),
+				  0);
+
+	/* Wait parent on for go */
+	*cs++ = (MI_SEMAPHORE_WAIT |
+		 MI_SEMAPHORE_GLOBAL_GTT |
+		 MI_SEMAPHORE_POLL |
+		 MI_SEMAPHORE_SAD_EQ_SDD);
+	*cs++ = CHILD_GO_BB;
+	*cs++ = get_children_go_addr(ce->parent);
+	*cs++ = 0;
+
+	/* Turn off preemption */
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
+	/* Jump to batch */
+	*cs++ = MI_BATCH_BUFFER_START_GEN8 |
+		(flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+						 u32 *cs)
+{
+	struct intel_context *ce = rq->context;
+	u8 i;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	/* Wait on children */
+	for (i = 0; i < ce->guc_number_children; ++i) {
+		*cs++ = (MI_SEMAPHORE_WAIT |
+			 MI_SEMAPHORE_GLOBAL_GTT |
+			 MI_SEMAPHORE_POLL |
+			 MI_SEMAPHORE_SAD_EQ_SDD);
+		*cs++ = PARENT_GO_FINI_BREADCRUMB;
+		*cs++ = get_children_join_addr(ce, i);
+		*cs++ = 0;
+	}
+
+	/* Turn on preemption */
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	*cs++ = MI_NOOP;
+
+	/* Tell children go */
+	cs = gen8_emit_ggtt_write(cs,
+				  CHILD_GO_FINI_BREADCRUMB,
+				  get_children_go_addr(ce),
+				  0);
+
+	/* Emit fini breadcrumb */
+	cs = gen8_emit_ggtt_write(cs,
+				  rq->fence.seqno,
+				  i915_request_active_timeline(rq)->hwsp_offset,
+				  0);
+
+	/* User interrupt */
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+
+	return cs;
+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs)
+{
+	struct intel_context *ce = rq->context;
+
+	GEM_BUG_ON(!intel_context_is_child(ce));
+
+	/* Turn on preemption */
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	*cs++ = MI_NOOP;
+
+	/* Signal parent */
+	cs = gen8_emit_ggtt_write(cs,
+				  PARENT_GO_FINI_BREADCRUMB,
+				  get_children_join_addr(ce->parent,
+							 ce->guc_child_index),
+				  0);
+
+	/* Wait parent on for go */
+	*cs++ = (MI_SEMAPHORE_WAIT |
+		 MI_SEMAPHORE_GLOBAL_GTT |
+		 MI_SEMAPHORE_POLL |
+		 MI_SEMAPHORE_SAD_EQ_SDD);
+	*cs++ = CHILD_GO_FINI_BREADCRUMB;
+	*cs++ = get_children_go_addr(ce->parent);
+	*cs++ = 0;
+
+	/* Emit fini breadcrumb */
+	cs = gen8_emit_ggtt_write(cs,
+				  rq->fence.seqno,
+				  i915_request_active_timeline(rq)->hwsp_offset,
+				  0);
+
+	/* User interrupt */
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+
+	return cs;
+}
+
 static inline struct intel_context *
 g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 {
@@ -4653,6 +4921,19 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 			drm_printf(p, "\t\tWQI Status: %u\n\n",
 				   READ_ONCE(desc->wq_status));
 
+			drm_printf(p, "\t\tNumber Children: %u\n\n",
+				   ce->guc_number_children);
+			if (ce->engine->emit_bb_start ==
+			    emit_bb_start_parent_no_preempt_mid_batch) {
+				u8 i;
+
+				drm_printf(p, "\t\tChildren Go: %u\n\n",
+					   get_children_go_value(ce));
+				for (i = 0; i < ce->guc_number_children; ++i)
+					drm_printf(p, "\t\tChildren Join: %u\n",
+						   get_children_join_value(ce, i));
+			}
+
 			for_each_child(ce, child)
 				guc_log_context(p, child);
 		}

From patchwork Tue Aug  3 22:29:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417577
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 32B51C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:28 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id F402760184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:27 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org F402760184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 6A3466E958;
	Tue,  3 Aug 2021 22:12:09 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 525A26E8E8;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393478"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393478"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512729"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 31/46] drm/i915: Move secure execbuf check to execbuf2
Date: Tue,  3 Aug 2021 15:29:28 -0700
Message-Id: <20210803222943.27686-32-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Goal is to remove all input sanity checks from the core submission.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 35 +++++++++++--------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1ed7475de454..70d352fc543f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3184,19 +3184,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.num_fences = 0;
 
 	eb.batch_flags = 0;
-	if (args->flags & I915_EXEC_SECURE) {
-		if (GRAPHICS_VER(i915) >= 11)
-			return -ENODEV;
-
-		/* Return -EPERM to trigger fallback code on old binaries. */
-		if (!HAS_SECURE_BATCHES(i915))
-			return -EPERM;
-
-		if (!drm_is_current_master(file) || !capable(CAP_SYS_ADMIN))
-			return -EPERM;
-
+	if (args->flags & I915_EXEC_SECURE)
 		eb.batch_flags |= I915_DISPATCH_SECURE;
-	}
 	if (args->flags & I915_EXEC_IS_PINNED)
 		eb.batch_flags |= I915_DISPATCH_PINNED;
 
@@ -3414,6 +3403,18 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	if (args->flags & I915_EXEC_SECURE) {
+		if (GRAPHICS_VER(i915) >= 11)
+			return -ENODEV;
+
+		/* Return -EPERM to trigger fallback code on old binaries. */
+		if (!HAS_SECURE_BATCHES(i915))
+			return -EPERM;
+
+		if (!drm_is_current_master(file) || !capable(CAP_SYS_ADMIN))
+			return -EPERM;
+	}
+
 	err = i915_gem_check_execbuffer(args);
 	if (err)
 		return err;
@@ -3430,8 +3431,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			   u64_to_user_ptr(args->buffers_ptr),
 			   sizeof(*exec2_list) * count)) {
 		drm_dbg(&i915->drm, "copy %zd exec entries failed\n", count);
-		kvfree(exec2_list);
-		return -EFAULT;
+		err = -EFAULT;
+		goto err_copy;
 	}
 
 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list);
@@ -3476,6 +3477,12 @@ end:;
 
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
 	kvfree(exec2_list);
+
+	return err;
+
+err_copy:
+	kvfree(exec2_list);
+
 	return err;
 }
 

From patchwork Tue Aug  3 22:29:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417525
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0950AC43216
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:41 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D097760184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:40 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D097760184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 0E0F26E934;
	Tue,  3 Aug 2021 22:12:02 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 770326E8F3;
 Tue,  3 Aug 2021 22:11:56 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393479"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393479"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512731"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 32/46] drm/i915: Move input/exec fence handling to
 i915_gem_execbuffer2
Date: Tue,  3 Aug 2021 15:29:29 -0700
Message-Id: <20210803222943.27686-33-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Move the job of creating an input/exec fences (from a file descriptor)
out of i915_gem_do_execbuffer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 75 +++++++++++--------
 1 file changed, 43 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 70d352fc543f..0416bcb551b0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3146,11 +3146,12 @@ static int
 i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
 		       struct drm_i915_gem_execbuffer2 *args,
-		       struct drm_i915_gem_exec_object2 *exec)
+		       struct drm_i915_gem_exec_object2 *exec,
+		       struct dma_fence *in_fence,
+		       struct dma_fence *exec_fence)
 {
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct i915_execbuffer eb;
-	struct dma_fence *in_fence = NULL;
 	struct sync_file *out_fence = NULL;
 	struct i915_vma *batch;
 	int out_fence_fd = -1;
@@ -3197,25 +3198,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_ext;
 
-#define IN_FENCES (I915_EXEC_FENCE_IN | I915_EXEC_FENCE_SUBMIT)
-	if (args->flags & IN_FENCES) {
-		if ((args->flags & IN_FENCES) == IN_FENCES)
-			return -EINVAL;
-
-		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
-		if (!in_fence) {
-			err = -EINVAL;
-			goto err_ext;
-		}
-	}
-#undef IN_FENCES
-
 	if (args->flags & I915_EXEC_FENCE_OUT) {
 		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
-		if (out_fence_fd < 0) {
-			err = out_fence_fd;
-			goto err_in_fence;
-		}
+		if (out_fence_fd < 0)
+			goto err_ext;
 	}
 
 	err = eb_create(&eb);
@@ -3277,13 +3263,16 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_ext;
 	}
 
+	if (exec_fence) {
+		err = i915_request_await_execution(eb.request,
+						   exec_fence);
+		if (err < 0)
+			goto err_request;
+	}
+
 	if (in_fence) {
-		if (args->flags & I915_EXEC_FENCE_SUBMIT)
-			err = i915_request_await_execution(eb.request,
-							   in_fence);
-		else
-			err = i915_request_await_dma_fence(eb.request,
-							   in_fence);
+		err = i915_request_await_dma_fence(eb.request,
+						   in_fence);
 		if (err < 0)
 			goto err_request;
 	}
@@ -3363,8 +3352,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_out_fence:
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
-err_in_fence:
-	dma_fence_put(in_fence);
 err_ext:
 	put_fence_array(eb.fences, eb.num_fences);
 	return err;
@@ -3395,6 +3382,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_execbuffer2 *args = data;
 	struct drm_i915_gem_exec_object2 *exec2_list;
+	struct dma_fence *in_fence = NULL;
+	struct dma_fence *exec_fence = NULL;
 	const size_t count = args->buffer_count;
 	int err;
 
@@ -3419,13 +3408,33 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		return err;
 
+	if (args->flags & I915_EXEC_FENCE_IN) {
+		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!in_fence)
+			return -EINVAL;
+	}
+
+	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
+		if (in_fence) {
+			err = -EINVAL;
+			goto err_exec_fence;
+		}
+
+		exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!exec_fence) {
+			err = -EINVAL;
+			goto err_exec_fence;
+		}
+	}
+
 	/* Allocate extra slots for use by the command parser */
 	exec2_list = kvmalloc_array(count + 2, eb_element_size(),
 				    __GFP_NOWARN | GFP_KERNEL);
 	if (exec2_list == NULL) {
 		drm_dbg(&i915->drm, "Failed to allocate exec list for %zd buffers\n",
 			count);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto err_alloc;
 	}
 	if (copy_from_user(exec2_list,
 			   u64_to_user_ptr(args->buffers_ptr),
@@ -3435,7 +3444,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		goto err_copy;
 	}
 
-	err = i915_gem_do_execbuffer(dev, file, args, exec2_list);
+	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, in_fence,
+				     exec_fence);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -3476,12 +3486,13 @@ end:;
 	}
 
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
-	kvfree(exec2_list);
-
-	return err;
 
 err_copy:
 	kvfree(exec2_list);
+err_alloc:
+	dma_fence_put(exec_fence);
+err_exec_fence:
+	dma_fence_put(in_fence);
 
 	return err;
 }

From patchwork Tue Aug  3 22:29:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417581
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D83D0C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:30 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id AD28E60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:30 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org AD28E60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id ECB496E96C;
	Tue,  3 Aug 2021 22:12:10 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 350E96E167;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393480"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393480"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512732"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 33/46] drm/i915: Move output fence handling to
 i915_gem_execbuffer2
Date: Tue,  3 Aug 2021 15:29:30 -0700
Message-Id: <20210803222943.27686-34-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Move the job of creating a new file descriptor and passing it back to
userspace to i915_gem_execbuffer2.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 45 ++++++++++---------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0416bcb551b0..66f1819fcebc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3148,13 +3148,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_i915_gem_execbuffer2 *args,
 		       struct drm_i915_gem_exec_object2 *exec,
 		       struct dma_fence *in_fence,
-		       struct dma_fence *exec_fence)
+		       struct dma_fence *exec_fence,
+		       int out_fence_fd)
 {
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct i915_execbuffer eb;
 	struct sync_file *out_fence = NULL;
 	struct i915_vma *batch;
-	int out_fence_fd = -1;
 	int err;
 
 	BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
@@ -3198,15 +3198,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_ext;
 
-	if (args->flags & I915_EXEC_FENCE_OUT) {
-		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
-		if (out_fence_fd < 0)
-			goto err_ext;
-	}
-
 	err = eb_create(&eb);
 	if (err)
-		goto err_out_fence;
+		goto err_ext;
 
 	GEM_BUG_ON(!eb.lut_size);
 
@@ -3283,7 +3277,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_request;
 	}
 
-	if (out_fence_fd != -1) {
+	if (out_fence_fd >= 0) {
 		out_fence = sync_file_create(&eb.request->fence);
 		if (!out_fence) {
 			err = -ENOMEM;
@@ -3313,14 +3307,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		signal_fence_array(&eb);
 
 	if (out_fence) {
-		if (err == 0) {
+		if (err == 0)
 			fd_install(out_fence_fd, out_fence->file);
-			args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
-			args->rsvd2 |= (u64)out_fence_fd << 32;
-			out_fence_fd = -1;
-		} else {
+		else
 			fput(out_fence->file);
-		}
 	}
 
 	if (unlikely(eb.gem_context->syncobj)) {
@@ -3349,9 +3339,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	i915_gem_context_put(eb.gem_context);
 err_destroy:
 	eb_destroy(&eb);
-err_out_fence:
-	if (out_fence_fd != -1)
-		put_unused_fd(out_fence_fd);
 err_ext:
 	put_fence_array(eb.fences, eb.num_fences);
 	return err;
@@ -3384,6 +3371,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_exec_object2 *exec2_list;
 	struct dma_fence *in_fence = NULL;
 	struct dma_fence *exec_fence = NULL;
+	int out_fence_fd = -1;
 	const size_t count = args->buffer_count;
 	int err;
 
@@ -3427,6 +3415,14 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
+	if (args->flags & I915_EXEC_FENCE_OUT) {
+		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
+		if (out_fence_fd < 0) {
+			err = out_fence_fd;
+			goto err_out_fence;
+		}
+	}
+
 	/* Allocate extra slots for use by the command parser */
 	exec2_list = kvmalloc_array(count + 2, eb_element_size(),
 				    __GFP_NOWARN | GFP_KERNEL);
@@ -3445,7 +3441,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	}
 
 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, in_fence,
-				     exec_fence);
+				     exec_fence, out_fence_fd);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -3485,11 +3481,20 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 end:;
 	}
 
+	if (!err && out_fence_fd >= 0) {
+		args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
+		args->rsvd2 |= (u64)out_fence_fd << 32;
+		out_fence_fd = -1;
+	}
+
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
 
 err_copy:
 	kvfree(exec2_list);
 err_alloc:
+	if (out_fence_fd >= 0)
+		put_unused_fd(out_fence_fd);
+err_out_fence:
 	dma_fence_put(exec_fence);
 err_exec_fence:
 	dma_fence_put(in_fence);

From patchwork Tue Aug  3 22:29:31 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417545
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 707EBC4320E
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:56 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 416DD60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:12:56 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 416DD60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 593206E8FA;
	Tue,  3 Aug 2021 22:12:05 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 753C26E8FD;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393481"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393481"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512733"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 34/46] drm/i915: Return output fence from
 i915_gem_do_execbuffer
Date: Tue,  3 Aug 2021 15:29:31 -0700
Message-Id: <20210803222943.27686-35-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Move the job of creating a new sync fence and installing it onto a file
descriptor to i915_gem_execbuffer2.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 39 +++++++++----------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 66f1819fcebc..40311583f03d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3149,11 +3149,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_i915_gem_exec_object2 *exec,
 		       struct dma_fence *in_fence,
 		       struct dma_fence *exec_fence,
-		       int out_fence_fd)
+		       struct dma_fence **out_fence)
 {
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct i915_execbuffer eb;
-	struct sync_file *out_fence = NULL;
 	struct i915_vma *batch;
 	int err;
 
@@ -3277,14 +3276,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_request;
 	}
 
-	if (out_fence_fd >= 0) {
-		out_fence = sync_file_create(&eb.request->fence);
-		if (!out_fence) {
-			err = -ENOMEM;
-			goto err_request;
-		}
-	}
-
 	/*
 	 * Whilst this request exists, batch_obj will be on the
 	 * active_list, and so will hold the active reference. Only when this
@@ -3306,12 +3297,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (eb.fences)
 		signal_fence_array(&eb);
 
-	if (out_fence) {
-		if (err == 0)
-			fd_install(out_fence_fd, out_fence->file);
-		else
-			fput(out_fence->file);
-	}
+	if (!err && out_fence)
+		*out_fence = dma_fence_get(&eb.request->fence);
 
 	if (unlikely(eb.gem_context->syncobj)) {
 		drm_syncobj_replace_fence(eb.gem_context->syncobj,
@@ -3369,6 +3356,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_execbuffer2 *args = data;
 	struct drm_i915_gem_exec_object2 *exec2_list;
+	struct dma_fence **out_fence_p = NULL;
+	struct dma_fence *out_fence = NULL;
 	struct dma_fence *in_fence = NULL;
 	struct dma_fence *exec_fence = NULL;
 	int out_fence_fd = -1;
@@ -3421,6 +3410,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			err = out_fence_fd;
 			goto err_out_fence;
 		}
+		out_fence_p = &out_fence;
 	}
 
 	/* Allocate extra slots for use by the command parser */
@@ -3441,7 +3431,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	}
 
 	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, in_fence,
-				     exec_fence, out_fence_fd);
+				     exec_fence, out_fence_p);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -3482,9 +3472,18 @@ end:;
 	}
 
 	if (!err && out_fence_fd >= 0) {
-		args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
-		args->rsvd2 |= (u64)out_fence_fd << 32;
-		out_fence_fd = -1;
+		struct sync_file *sync_fence;
+
+		sync_fence = sync_file_create(out_fence);
+		if (sync_fence) {
+			fd_install(out_fence_fd, sync_fence->file);
+			args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
+			args->rsvd2 |= (u64)out_fence_fd << 32;
+			out_fence_fd = -1;
+		}
+		dma_fence_put(out_fence);
+	} else if (out_fence) {
+		dma_fence_put(out_fence);
 	}
 
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;

From patchwork Tue Aug  3 22:29:32 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417579
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7F0F3C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:29 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 514EE60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:29 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 514EE60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 913E66E96A;
	Tue,  3 Aug 2021 22:12:10 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 951796E195;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393482"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393482"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512735"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 35/46] drm/i915: Store batch index in struct i915_execbuffer
Date: Tue,  3 Aug 2021 15:29:32 -0700
Message-Id: <20210803222943.27686-36-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

This will help with upcoming extensions where more than 1 batch can be
submitted in a single execbuf IOCTL.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 40311583f03d..1f1f477e46b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -252,6 +252,9 @@ struct i915_execbuffer {
 	struct eb_vma *batch; /** identity of the batch obj/vma */
 	struct i915_vma *trampoline; /** trampoline used for chaining */
 
+	/* batch_index in vma list */
+	unsigned int batch_index;
+
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
 
@@ -361,6 +364,11 @@ static int eb_create(struct i915_execbuffer *eb)
 		eb->lut_size = -eb->buffer_count;
 	}
 
+	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
+		eb->batch_index = 0;
+	else
+		eb->batch_index = eb->args->buffer_count - 1;
+
 	return 0;
 }
 
@@ -735,14 +743,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
 	} while (1);
 }
 
-static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
-{
-	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
-		return 0;
-	else
-		return eb->buffer_count - 1;
-}
-
 static int eb_select_context(struct i915_execbuffer *eb)
 {
 	struct i915_gem_context *ctx;
@@ -852,7 +852,6 @@ static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct drm_i915_private *i915 = eb->i915;
-	unsigned int batch = eb_batch_index(eb);
 	unsigned int i;
 	int err = 0;
 
@@ -873,7 +872,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 			goto err;
 		}
 
-		eb_add_vma(eb, i, batch, vma);
+		eb_add_vma(eb, i, eb->batch_index, vma);
 
 		if (i915_gem_object_is_userptr(vma->obj)) {
 			err = i915_gem_object_userptr_submit_init(vma->obj);

From patchwork Tue Aug  3 22:29:33 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417549
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A0819C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:00 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 755FE60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:00 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 755FE60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 660076E94D;
	Tue,  3 Aug 2021 22:12:05 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 9AF556E23F;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393483"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393483"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512736"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:55 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 36/46] drm/i915: Allow callers of i915_gem_do_execbuffer to
 override the batch index
Date: Tue,  3 Aug 2021 15:29:33 -0700
Message-Id: <20210803222943.27686-37-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Allow specifying the batch directly over what is inferred from passed in
execbuf flags.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1f1f477e46b4..707e12725f74 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3146,6 +3146,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
 		       struct drm_i915_gem_execbuffer2 *args,
 		       struct drm_i915_gem_exec_object2 *exec,
+		       int batch_index,
 		       struct dma_fence *in_fence,
 		       struct dma_fence *exec_fence,
 		       struct dma_fence **out_fence)
@@ -3202,6 +3203,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 	GEM_BUG_ON(!eb.lut_size);
 
+	if (batch_index >= 0)
+		eb.batch_index = batch_index;
+
 	err = eb_select_context(&eb);
 	if (unlikely(err))
 		goto err_destroy;
@@ -3429,7 +3433,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		goto err_copy;
 	}
 
-	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, in_fence,
+	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, -1, in_fence,
 				     exec_fence, out_fence_p);
 
 	/*

From patchwork Tue Aug  3 22:29:34 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417567
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C5497C43216
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:18 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 9D88560EE8
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:18 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 9D88560EE8
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id D66166E91D;
	Tue,  3 Aug 2021 22:12:08 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id BC4036E8FF;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393484"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393484"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512737"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 37/46] drm/i915: Teach execbuf there can be more than one
 batch in the objects list
Date: Tue,  3 Aug 2021 15:29:34 -0700
Message-Id: <20210803222943.27686-38-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

In case of multiple batches all batches will be at the beginning in the
exex objects array or at the end based on the existing execbuffer2 flag.

Batches not executed in the current execbuf call will not be processed
for relocations or but will be pinned in same manner as the current
batch.

This will enable multiple do_execbuf calls with a single exec object
array in a later patch.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 31 +++++++++++++------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 707e12725f74..2835ef8734e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -255,6 +255,9 @@ struct i915_execbuffer {
 	/* batch_index in vma list */
 	unsigned int batch_index;
 
+	/* number of batches in execbuf IOCTL */
+	unsigned int num_batches;
+
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
 
@@ -511,6 +514,14 @@ static bool platform_has_relocs_enabled(const struct i915_execbuffer *eb)
 	return false;
 }
 
+static inline bool
+is_batch_buffer(struct i915_execbuffer *eb, unsigned int buffer_idx)
+{
+	return eb->args->flags & I915_EXEC_BATCH_FIRST ?
+		buffer_idx <= eb->num_batches :
+		buffer_idx >= eb->args->buffer_count - eb->num_batches;
+}
+
 static int
 eb_validate_vma(struct i915_execbuffer *eb,
 		struct drm_i915_gem_exec_object2 *entry,
@@ -562,11 +573,10 @@ eb_validate_vma(struct i915_execbuffer *eb,
 
 static void
 eb_add_vma(struct i915_execbuffer *eb,
-	   unsigned int i, unsigned batch_idx,
-	   struct i915_vma *vma)
+	   unsigned int buffer_idx, struct i915_vma *vma)
 {
-	struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
-	struct eb_vma *ev = &eb->vma[i];
+	struct drm_i915_gem_exec_object2 *entry = &eb->exec[buffer_idx];
+	struct eb_vma *ev = &eb->vma[buffer_idx];
 
 	ev->vma = vma;
 	ev->exec = entry;
@@ -591,14 +601,15 @@ eb_add_vma(struct i915_execbuffer *eb,
 	 * Note that actual hangs have only been observed on gen7, but for
 	 * paranoia do it everywhere.
 	 */
-	if (i == batch_idx) {
+	if (is_batch_buffer(eb, buffer_idx)) {
 		if (entry->relocation_count &&
 		    !(ev->flags & EXEC_OBJECT_PINNED))
 			ev->flags |= __EXEC_OBJECT_NEEDS_BIAS;
 		if (eb->reloc_cache.has_fence)
 			ev->flags |= EXEC_OBJECT_NEEDS_FENCE;
 
-		eb->batch = ev;
+		if (buffer_idx == eb->batch_index)
+			eb->batch = ev;
 	}
 }
 
@@ -872,7 +883,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 			goto err;
 		}
 
-		eb_add_vma(eb, i, eb->batch_index, vma);
+		eb_add_vma(eb, i, vma);
 
 		if (i915_gem_object_is_userptr(vma->obj)) {
 			err = i915_gem_object_userptr_submit_init(vma->obj);
@@ -3147,6 +3158,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_i915_gem_execbuffer2 *args,
 		       struct drm_i915_gem_exec_object2 *exec,
 		       int batch_index,
+		       unsigned int num_batches,
 		       struct dma_fence *in_fence,
 		       struct dma_fence *exec_fence,
 		       struct dma_fence **out_fence)
@@ -3203,6 +3215,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 	GEM_BUG_ON(!eb.lut_size);
 
+	eb.num_batches = num_batches;
 	if (batch_index >= 0)
 		eb.batch_index = batch_index;
 
@@ -3433,8 +3446,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		goto err_copy;
 	}
 
-	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, -1, in_fence,
-				     exec_fence, out_fence_p);
+	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, -1, 1,
+				     in_fence, exec_fence, out_fence_p);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore

From patchwork Tue Aug  3 22:29:35 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417585
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 74FCBC4320A
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:36 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 43CD460184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:36 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 43CD460184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id CEAC66E17E;
	Tue,  3 Aug 2021 22:12:11 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id D374D6E904;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393485"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393485"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512738"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 38/46] drm/i915: Only track object dependencies on first
 request
Date: Tue,  3 Aug 2021 15:29:35 -0700
Message-Id: <20210803222943.27686-39-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Only track object dependencies on the first request generated from the
execbuf, this help with the upcoming multi-bb execbuf extension.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 2835ef8734e5..b224b28530d1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2239,7 +2239,7 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
 	return err;
 }
 
-static int eb_move_to_gpu(struct i915_execbuffer *eb)
+static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first)
 {
 	const unsigned int count = eb->buffer_count;
 	unsigned int i = count;
@@ -2281,7 +2281,7 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 				flags &= ~EXEC_OBJECT_ASYNC;
 		}
 
-		if (err == 0 && !(flags & EXEC_OBJECT_ASYNC)) {
+		if (err == 0 && first && !(flags & EXEC_OBJECT_ASYNC)) {
 			err = i915_request_await_object
 				(eb->request, obj, flags & EXEC_OBJECT_WRITE);
 		}
@@ -2525,14 +2525,15 @@ static int eb_parse(struct i915_execbuffer *eb)
 	return err;
 }
 
-static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
+static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch,
+		     bool first)
 {
 	int err;
 
 	if (intel_context_nopreempt(eb->context))
 		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &eb->request->fence.flags);
 
-	err = eb_move_to_gpu(eb);
+	err = eb_move_to_gpu(eb, first);
 	if (err)
 		return err;
 
@@ -3304,7 +3305,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		intel_gt_buffer_pool_mark_active(eb.batch_pool, eb.request);
 
 	trace_i915_request_queue(eb.request, eb.batch_flags);
-	err = eb_submit(&eb, batch);
+	err = eb_submit(&eb, batch, true);
 
 err_request:
 	i915_request_get(eb.request);

From patchwork Tue Aug  3 22:29:36 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417553
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E6091C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:05 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id A2ECE60184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:05 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A2ECE60184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 25F3C6E909;
	Tue,  3 Aug 2021 22:12:06 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 003F46E909;
 Tue,  3 Aug 2021 22:11:57 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393486"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393486"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512739"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 39/46] drm/i915: Force parallel contexts to use copy engine
 for reloc
Date: Tue,  3 Aug 2021 15:29:36 -0700
Message-Id: <20210803222943.27686-40-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Submitting to a subset of hardware contexts is not allowed, so use the
copy engine for GPU relocations when using a parallel context.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b224b28530d1..b6143973ac67 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1386,7 +1386,8 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (err)
 		goto err_unmap;
 
-	if (engine == eb->context->engine) {
+	if (engine == eb->context->engine &&
+	    !intel_context_is_parallel(eb->context)) {
 		rq = i915_request_create(eb->context);
 	} else {
 		struct intel_context *ce = eb->reloc_context;
@@ -1483,7 +1484,8 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		if (eb_use_cmdparser(eb))
 			return ERR_PTR(-EWOULDBLOCK);
 
-		if (!reloc_can_use_engine(engine)) {
+		if (!reloc_can_use_engine(engine) ||
+		    intel_context_is_parallel(eb->context)) {
 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
 			if (!engine)
 				return ERR_PTR(-ENODEV);

From patchwork Tue Aug  3 22:29:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417583
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 09346C4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:33 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id CBEC2601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:32 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org CBEC2601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 5A5F36E8F5;
	Tue,  3 Aug 2021 22:12:11 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 1A3D86E90B;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393487"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393487"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512742"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 40/46] drm/i915: Multi-batch execbuffer2
Date: Tue,  3 Aug 2021 15:29:37 -0700
Message-Id: <20210803222943.27686-41-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

For contexts with width set to two or more, we add a mode to execbuf2
which implies there are N batch buffers in the buffer list, each of
which will be sent to one of the engines from the engine map array
(I915_CONTEXT_PARAM_ENGINES, I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT).

Those N batches can either be first N, or last N objects in the list as
controlled by the existing execbuffer2 flag.

The N batches will be submitted to consecutive engines from the previously
configured allowed engine array starting at index 0.

Input and output fences are fully supported, with the latter getting
signalled when all batch buffers have completed.

Last, it isn't safe for subsequent batches to touch any objects written
to by a multi-BB submission until all the batches in that submission
complete. As such all batches in a multi-BB submission must be combined
into a single composite fence and put into the dma reseveration excl
fence slot.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 262 +++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/i915_vma.c               |  13 +-
 drivers/gpu/drm/i915/i915_vma.h               |  16 +-
 5 files changed, 266 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b6143973ac67..ecdb583cc2eb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -252,6 +252,9 @@ struct i915_execbuffer {
 	struct eb_vma *batch; /** identity of the batch obj/vma */
 	struct i915_vma *trampoline; /** trampoline used for chaining */
 
+	/** used for excl fence in dma_resv objects when > 1 BB submitted */
+	struct dma_fence *composite_fence;
+
 	/* batch_index in vma list */
 	unsigned int batch_index;
 
@@ -367,11 +370,6 @@ static int eb_create(struct i915_execbuffer *eb)
 		eb->lut_size = -eb->buffer_count;
 	}
 
-	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
-		eb->batch_index = 0;
-	else
-		eb->batch_index = eb->args->buffer_count - 1;
-
 	return 0;
 }
 
@@ -2241,7 +2239,7 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
 	return err;
 }
 
-static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first)
+static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first, bool last)
 {
 	const unsigned int count = eb->buffer_count;
 	unsigned int i = count;
@@ -2289,8 +2287,16 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first)
 		}
 
 		if (err == 0)
-			err = i915_vma_move_to_active(vma, eb->request,
-						      flags | __EXEC_OBJECT_NO_RESERVE);
+			err = _i915_vma_move_to_active(vma, eb->request,
+						       flags | __EXEC_OBJECT_NO_RESERVE,
+						       !last ?
+						       NULL :
+						       eb->composite_fence ?
+						       eb->composite_fence :
+						       &eb->request->fence,
+						       eb->composite_fence ?
+						       eb->composite_fence :
+						       &eb->request->fence);
 	}
 
 #ifdef CONFIG_MMU_NOTIFIER
@@ -2528,14 +2534,14 @@ static int eb_parse(struct i915_execbuffer *eb)
 }
 
 static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch,
-		     bool first)
+		     bool first, bool last)
 {
 	int err;
 
 	if (intel_context_nopreempt(eb->context))
 		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &eb->request->fence.flags);
 
-	err = eb_move_to_gpu(eb, first);
+	err = eb_move_to_gpu(eb, first, last);
 	if (err)
 		return err;
 
@@ -2748,7 +2754,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
 }
 
 static int
-eb_select_engine(struct i915_execbuffer *eb)
+eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
 {
 	struct intel_context *ce;
 	unsigned int idx;
@@ -2763,6 +2769,18 @@ eb_select_engine(struct i915_execbuffer *eb)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
+	if (batch_number > 0) {
+		struct intel_context *parent = ce;
+
+		GEM_BUG_ON(!intel_context_is_parent(parent));
+
+		for_each_child(parent, ce)
+			if (!--batch_number)
+				break;
+		intel_context_put(parent);
+		intel_context_get(ce);
+	}
+
 	intel_gt_pm_get(ce->engine->gt);
 
 	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
@@ -3155,13 +3173,49 @@ parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
 				    eb);
 }
 
+static int setup_composite_fence(struct i915_execbuffer *eb,
+				 struct dma_fence **out_fence,
+				 unsigned int num_batches)
+{
+	struct dma_fence_array *fence_array;
+	struct dma_fence **fences = kmalloc(num_batches * sizeof(*fences),
+					    GFP_KERNEL);
+	struct intel_context *parent = intel_context_to_parent(eb->context);
+	int i;
+
+	if (!fences)
+		return -ENOMEM;
+
+	for (i = 0; i < num_batches; ++i)
+		fences[i] = out_fence[i];
+
+	fence_array = dma_fence_array_create(num_batches,
+					     fences,
+					     parent->fence_context,
+					     ++parent->seqno,
+					     false);
+	if (!fence_array) {
+		kfree(fences);
+		return -ENOMEM;
+	}
+
+	/* Move ownership to the dma_fence_array created above */
+	for (i = 0; i < num_batches; ++i)
+		dma_fence_get(fences[i]);
+
+	eb->composite_fence = &fence_array->base;
+
+	return 0;
+}
+
 static int
 i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
 		       struct drm_i915_gem_execbuffer2 *args,
 		       struct drm_i915_gem_exec_object2 *exec,
-		       int batch_index,
+		       unsigned int batch_index,
 		       unsigned int num_batches,
+		       unsigned int batch_number,
 		       struct dma_fence *in_fence,
 		       struct dma_fence *exec_fence,
 		       struct dma_fence **out_fence)
@@ -3170,6 +3224,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	struct i915_execbuffer eb;
 	struct i915_vma *batch;
 	int err;
+	bool first = batch_number == 0;
+	bool last = batch_number + 1 == num_batches;
 
 	BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
 	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS &
@@ -3194,6 +3250,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.batch_start_offset = args->batch_start_offset;
 	eb.batch_len = args->batch_len;
 	eb.trampoline = NULL;
+	eb.composite_fence = NULL;
 
 	eb.fences = NULL;
 	eb.num_fences = 0;
@@ -3219,14 +3276,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	GEM_BUG_ON(!eb.lut_size);
 
 	eb.num_batches = num_batches;
-	if (batch_index >= 0)
-		eb.batch_index = batch_index;
+	eb.batch_index = batch_index;
 
 	err = eb_select_context(&eb);
 	if (unlikely(err))
 		goto err_destroy;
 
-	err = eb_select_engine(&eb);
+	err = eb_select_engine(&eb, batch_number);
 	if (unlikely(err))
 		goto err_context;
 
@@ -3275,6 +3331,23 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_ext;
 	}
 
+	if (out_fence) {
+		/* Move ownership to caller (i915_gem_execbuffer2_ioctl) */
+		out_fence[batch_number] = dma_fence_get(&eb.request->fence);
+
+		/*
+		 * Need to create a composite fence (dma_fence_array,
+		 * eb.composite_fence) for the excl fence of the dma_resv
+		 * objects as each BB can write to the object. Since we create
+		 */
+		if (num_batches > 1 && last) {
+			err = setup_composite_fence(&eb, out_fence,
+						    num_batches);
+			if (err < 0)
+				goto err_request;
+		}
+	}
+
 	if (exec_fence) {
 		err = i915_request_await_execution(eb.request,
 						   exec_fence);
@@ -3307,17 +3380,27 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		intel_gt_buffer_pool_mark_active(eb.batch_pool, eb.request);
 
 	trace_i915_request_queue(eb.request, eb.batch_flags);
-	err = eb_submit(&eb, batch, true);
+	err = eb_submit(&eb, batch, first, last);
 
 err_request:
+	if (last)
+		set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+			&eb.request->fence.flags);
+
 	i915_request_get(eb.request);
 	err = eb_request_add(&eb, err);
 
 	if (eb.fences)
 		signal_fence_array(&eb);
 
-	if (!err && out_fence)
-		*out_fence = dma_fence_get(&eb.request->fence);
+	/*
+	 * Ownership of the composite fence (dma_fence_array,
+	 * eb.composite_fence) has been moved to the dma_resv objects these BB
+	 * write to in i915_vma_move_to_active. It is ok to release the creation
+	 * reference of this fence now.
+	 */
+	if (eb.composite_fence)
+		dma_fence_put(eb.composite_fence);
 
 	if (unlikely(eb.gem_context->syncobj)) {
 		drm_syncobj_replace_fence(eb.gem_context->syncobj,
@@ -3368,6 +3451,17 @@ static bool check_buffer_count(size_t count)
 	return !(count < 1 || count > INT_MAX || count > SIZE_MAX / sz - 1);
 }
 
+/* Release fences from the dma_fence_get in i915_gem_do_execbuffer. */
+static inline void put_out_fences(struct dma_fence **out_fences,
+				  unsigned int num_batches)
+{
+	int i;
+
+	for (i = 0; i < num_batches; ++i)
+		if (out_fences[i])
+			dma_fence_put(out_fences[i]);
+}
+
 int
 i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			   struct drm_file *file)
@@ -3375,13 +3469,16 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_execbuffer2 *args = data;
 	struct drm_i915_gem_exec_object2 *exec2_list;
-	struct dma_fence **out_fence_p = NULL;
-	struct dma_fence *out_fence = NULL;
+	struct dma_fence **out_fences = NULL;
 	struct dma_fence *in_fence = NULL;
 	struct dma_fence *exec_fence = NULL;
 	int out_fence_fd = -1;
 	const size_t count = args->buffer_count;
 	int err;
+	struct i915_gem_context *ctx;
+	struct intel_context *parent = NULL;
+	unsigned int num_batches = 1, i;
+	bool is_parallel = false;
 
 	if (!check_buffer_count(count)) {
 		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
@@ -3404,10 +3501,39 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	if (err)
 		return err;
 
+	ctx = i915_gem_context_lookup(file->driver_priv, args->rsvd1);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	if (i915_gem_context_user_engines(ctx)) {
+		parent = i915_gem_context_get_engine(ctx, args->flags &
+						     I915_EXEC_RING_MASK);
+		if (IS_ERR(parent)) {
+			err = PTR_ERR(parent);
+			goto err_context;
+		}
+
+		if (intel_context_is_parent(parent)) {
+			if (args->batch_len) {
+				err = -EINVAL;
+				goto err_context;
+			}
+
+			num_batches = parent->guc_number_children + 1;
+			if (num_batches > count) {
+				i915_gem_context_put(ctx);
+				goto err_parent;
+			}
+			is_parallel = true;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
-		if (!in_fence)
-			return -EINVAL;
+		if (!in_fence) {
+			err = -EINVAL;
+			goto err_parent;
+		}
 	}
 
 	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
@@ -3423,13 +3549,25 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
-	if (args->flags & I915_EXEC_FENCE_OUT) {
-		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
-		if (out_fence_fd < 0) {
-			err = out_fence_fd;
+	/*
+	 * We always allocate out fences when doing multi-BB submission as
+	 * this is required to create an excl fence for any dma buf objects
+	 * these BBs touch.
+	 */
+	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
+		out_fences = kcalloc(num_batches, sizeof(*out_fences),
+				     GFP_KERNEL);
+		if (!out_fences) {
+			err = -ENOMEM;
 			goto err_out_fence;
 		}
-		out_fence_p = &out_fence;
+		if (args->flags & I915_EXEC_FENCE_OUT) {
+			out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
+			if (out_fence_fd < 0) {
+				err = out_fence_fd;
+				goto err_out_fence;
+			}
+		}
 	}
 
 	/* Allocate extra slots for use by the command parser */
@@ -3449,8 +3587,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 		goto err_copy;
 	}
 
-	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, -1, 1,
-				     in_fence, exec_fence, out_fence_p);
+	/*
+	 * Downstream submission code expects all parallel submissions to occur
+	 * in intel_context sequence, thus only 1 submission can happen at a
+	 * time.
+	 */
+	if (is_parallel)
+		mutex_lock(&parent->parallel_submit);
+
+	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
+				     args->flags & I915_EXEC_BATCH_FIRST ?
+				     0 : count - num_batches,
+				     num_batches,
+				     0,
+				     in_fence,
+				     exec_fence,
+				     out_fences);
+
+	for (i = 1; err == 0 && i < num_batches; i++)
+		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
+					     args->flags & I915_EXEC_BATCH_FIRST ?
+					     i : count - num_batches + i,
+					     num_batches,
+					     i,
+					     NULL,
+					     NULL,
+					     out_fences);
+
+	if (is_parallel)
+		mutex_unlock(&parent->parallel_submit);
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -3491,8 +3656,31 @@ end:;
 	}
 
 	if (!err && out_fence_fd >= 0) {
+		struct dma_fence *out_fence = NULL;
 		struct sync_file *sync_fence;
 
+		if (is_parallel) {
+			struct dma_fence_array *fence_array;
+
+			/*
+			 * The dma_fence_array now owns out_fences (from
+			 * dma_fence_get in i915_gem_do_execbuffer) assuming
+			 * successful creation of dma_fence_array.
+			 */
+			fence_array = dma_fence_array_create(num_batches,
+							     out_fences,
+							     parent->fence_context,
+							     ++parent->seqno,
+							     false);
+			if (!fence_array)
+				goto put_out_fences;
+
+			out_fence = &fence_array->base;
+			out_fences = NULL;
+		} else {
+			out_fence = out_fences[0];
+		}
+
 		sync_fence = sync_file_create(out_fence);
 		if (sync_fence) {
 			fd_install(out_fence_fd, sync_fence->file);
@@ -3500,9 +3688,15 @@ end:;
 			args->rsvd2 |= (u64)out_fence_fd << 32;
 			out_fence_fd = -1;
 		}
+
+		/*
+		 * The sync_file now owns out_fence, drop the creation
+		 * reference.
+		 */
 		dma_fence_put(out_fence);
-	} else if (out_fence) {
-		dma_fence_put(out_fence);
+	} else if (out_fences) {
+put_out_fences:
+		put_out_fences(out_fences, num_batches);
 	}
 
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
@@ -3513,9 +3707,15 @@ end:;
 	if (out_fence_fd >= 0)
 		put_unused_fd(out_fence_fd);
 err_out_fence:
+	kfree(out_fences);
 	dma_fence_put(exec_fence);
 err_exec_fence:
 	dma_fence_put(in_fence);
+err_parent:
+	if (parent)
+		intel_context_put(parent);
+err_context:
+	i915_gem_context_put(ctx);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index f396993374da..2c07f5f22c94 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -472,6 +472,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
+	mutex_init(&ce->parallel_submit);
+	ce->fence_context = dma_fence_context_alloc(1);
+
 	/*
 	 * Initialize fence to be complete as this is expected to be complete
 	 * unless there is a pending schedule disable outstanding.
@@ -498,6 +501,8 @@ void intel_context_fini(struct intel_context *ce)
 		for_each_child_safe(ce, child, next)
 			intel_context_put(child);
 
+	mutex_destroy(&ce->parallel_submit);
+
 	mutex_destroy(&ce->pin_mutex);
 	i915_active_fini(&ce->active);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index fdc4890335b7..8af9ace4c052 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -235,6 +235,15 @@ struct intel_context {
 
 	/* Last request submitted on a parent */
 	struct i915_request *last_rq;
+
+	/* Parallel submission mutex */
+	struct mutex parallel_submit;
+
+	/* Fence context for parallel submission */
+	u64 fence_context;
+
+	/* Seqno for parallel submission */
+	u32 seqno;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 4b7fc4647e46..ed4e790276a9 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1234,9 +1234,11 @@ int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq)
 	return i915_active_add_request(&vma->active, rq);
 }
 
-int i915_vma_move_to_active(struct i915_vma *vma,
-			    struct i915_request *rq,
-			    unsigned int flags)
+int _i915_vma_move_to_active(struct i915_vma *vma,
+			     struct i915_request *rq,
+			     unsigned int flags,
+			     struct dma_fence *shared_fence,
+			     struct dma_fence *excl_fence)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 	int err;
@@ -1257,7 +1259,7 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 			intel_frontbuffer_put(front);
 		}
 
-		dma_resv_add_excl_fence(vma->resv, &rq->fence);
+		dma_resv_add_excl_fence(vma->resv, excl_fence);
 		obj->write_domain = I915_GEM_DOMAIN_RENDER;
 		obj->read_domains = 0;
 	} else {
@@ -1267,7 +1269,8 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 				return err;
 		}
 
-		dma_resv_add_shared_fence(vma->resv, &rq->fence);
+		if (shared_fence)
+			dma_resv_add_shared_fence(vma->resv, shared_fence);
 		obj->write_domain = 0;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index ed69f66c7ab0..a36da651dbff 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -57,9 +57,19 @@ static inline bool i915_vma_is_active(const struct i915_vma *vma)
 
 int __must_check __i915_vma_move_to_active(struct i915_vma *vma,
 					   struct i915_request *rq);
-int __must_check i915_vma_move_to_active(struct i915_vma *vma,
-					 struct i915_request *rq,
-					 unsigned int flags);
+
+int __must_check _i915_vma_move_to_active(struct i915_vma *vma,
+					  struct i915_request *rq,
+					  unsigned int flags,
+					  struct dma_fence *shared_fence,
+					  struct dma_fence *excl_fence);
+static inline int __must_check
+i915_vma_move_to_active(struct i915_vma *vma,
+			struct i915_request *rq,
+			unsigned int flags)
+{
+	return _i915_vma_move_to_active(vma, rq, flags, &rq->fence, &rq->fence);
+}
 
 #define __i915_vma_flags(v) ((unsigned long *)&(v)->flags.counter)
 

From patchwork Tue Aug  3 22:29:38 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417587
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7E44CC432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:38 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 4DBE560184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:38 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4DBE560184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id D924D6E95E;
	Tue,  3 Aug 2021 22:12:12 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 3D8E46E252;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393488"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393488"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512743"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 41/46] drm/i915: Eliminate unnecessary VMA calls for multi-BB
 submission
Date: Tue,  3 Aug 2021 15:29:38 -0700
Message-Id: <20210803222943.27686-42-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Certain VMA functions in the execbuf IOCTL only need to be called on
first or last BB of a multi-BB submission. eb_relocate() on the first
and eb_release_vmas() on the last. Doing so will save CPU / GPU cycles.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 127 +++++++++++-------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  14 +-
 2 files changed, 83 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index ecdb583cc2eb..70784779872a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -270,7 +270,7 @@ struct i915_execbuffer {
 	/** list of vma that have execobj.relocation_count */
 	struct list_head relocs;
 
-	struct i915_gem_ww_ctx ww;
+	struct i915_gem_ww_ctx *ww;
 
 	/**
 	 * Track the most recently used object for relocations, as we
@@ -448,7 +448,7 @@ eb_pin_vma(struct i915_execbuffer *eb,
 		pin_flags |= PIN_GLOBAL;
 
 	/* Attempt to reuse the current location if available */
-	err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
+	err = i915_vma_pin_ww(vma, eb->ww, 0, 0, pin_flags);
 	if (err == -EDEADLK)
 		return err;
 
@@ -457,11 +457,11 @@ eb_pin_vma(struct i915_execbuffer *eb,
 			return err;
 
 		/* Failing that pick any _free_ space if suitable */
-		err = i915_vma_pin_ww(vma, &eb->ww,
-					     entry->pad_to_size,
-					     entry->alignment,
-					     eb_pin_flags(entry, ev->flags) |
-					     PIN_USER | PIN_NOEVICT);
+		err = i915_vma_pin_ww(vma, eb->ww,
+				      entry->pad_to_size,
+				      entry->alignment,
+				      eb_pin_flags(entry, ev->flags) |
+				      PIN_USER | PIN_NOEVICT);
 		if (unlikely(err))
 			return err;
 	}
@@ -643,9 +643,9 @@ static int eb_reserve_vma(struct i915_execbuffer *eb,
 			return err;
 	}
 
-	err = i915_vma_pin_ww(vma, &eb->ww,
-			   entry->pad_to_size, entry->alignment,
-			   eb_pin_flags(entry, ev->flags) | pin_flags);
+	err = i915_vma_pin_ww(vma, eb->ww,
+			      entry->pad_to_size, entry->alignment,
+			      eb_pin_flags(entry, ev->flags) | pin_flags);
 	if (err)
 		return err;
 
@@ -940,7 +940,7 @@ static int eb_lock_vmas(struct i915_execbuffer *eb)
 		struct eb_vma *ev = &eb->vma[i];
 		struct i915_vma *vma = ev->vma;
 
-		err = i915_gem_object_lock(vma->obj, &eb->ww);
+		err = i915_gem_object_lock(vma->obj, eb->ww);
 		if (err)
 			return err;
 	}
@@ -1020,12 +1020,13 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 	}
 }
 
-static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
+static void eb_release_vmas(struct i915_execbuffer *eb, bool final,
+			    bool unreserve)
 {
 	const unsigned int count = eb->buffer_count;
 	unsigned int i;
 
-	for (i = 0; i < count; i++) {
+	for (i = 0; unreserve && i < count; i++) {
 		struct eb_vma *ev = &eb->vma[i];
 		struct i915_vma *vma = ev->vma;
 
@@ -1237,7 +1238,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 		if (err)
 			return ERR_PTR(err);
 
-		vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
+		vma = i915_gem_object_ggtt_pin_ww(obj, eb->ww, NULL, 0, 0,
 						  PIN_MAPPABLE |
 						  PIN_NONBLOCK /* NOWARN */ |
 						  PIN_NOEVICT);
@@ -1361,7 +1362,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	}
 	eb->reloc_pool = NULL;
 
-	err = i915_gem_object_lock(pool->obj, &eb->ww);
+	err = i915_gem_object_lock(pool->obj, eb->ww);
 	if (err)
 		goto err_pool;
 
@@ -1380,7 +1381,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto err_unmap;
 	}
 
-	err = i915_vma_pin_ww(batch, &eb->ww, 0, 0, PIN_USER | PIN_NONBLOCK);
+	err = i915_vma_pin_ww(batch, eb->ww, 0, 0, PIN_USER | PIN_NONBLOCK);
 	if (err)
 		goto err_unmap;
 
@@ -1402,7 +1403,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 			eb->reloc_context = ce;
 		}
 
-		err = intel_context_pin_ww(ce, &eb->ww);
+		err = intel_context_pin_ww(ce, eb->ww);
 		if (err)
 			goto err_unpin;
 
@@ -2017,8 +2018,8 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb,
 	}
 
 	/* We may process another execbuffer during the unlock... */
-	eb_release_vmas(eb, false);
-	i915_gem_ww_ctx_fini(&eb->ww);
+	eb_release_vmas(eb, false, true);
+	i915_gem_ww_ctx_fini(eb->ww);
 
 	if (rq) {
 		/* nonblocking is always false */
@@ -2062,7 +2063,7 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb,
 		err = eb_reinit_userptr(eb);
 
 err_relock:
-	i915_gem_ww_ctx_init(&eb->ww, true);
+	i915_gem_ww_ctx_init(eb->ww, true);
 	if (err)
 		goto out;
 
@@ -2119,8 +2120,8 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb,
 
 err:
 	if (err == -EDEADLK) {
-		eb_release_vmas(eb, false);
-		err = i915_gem_ww_ctx_backoff(&eb->ww);
+		eb_release_vmas(eb, false, true);
+		err = i915_gem_ww_ctx_backoff(eb->ww);
 		if (!err)
 			goto repeat_validate;
 	}
@@ -2152,7 +2153,7 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb,
 	return err;
 }
 
-static int eb_relocate_parse(struct i915_execbuffer *eb)
+static int eb_relocate_parse(struct i915_execbuffer *eb, bool first)
 {
 	int err;
 	struct i915_request *rq = NULL;
@@ -2189,14 +2190,16 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
 	/* only throttle once, even if we didn't need to throttle */
 	throttle = false;
 
-	err = eb_validate_vmas(eb);
-	if (err == -EAGAIN)
-		goto slow;
-	else if (err)
-		goto err;
+	if (first) {
+		err = eb_validate_vmas(eb);
+		if (err == -EAGAIN)
+			goto slow;
+		else if (err)
+			goto err;
+	}
 
 	/* The objects are in their final locations, apply the relocations. */
-	if (eb->args->flags & __EXEC_HAS_RELOC) {
+	if (eb->args->flags & __EXEC_HAS_RELOC && first) {
 		struct eb_vma *ev;
 
 		list_for_each_entry(ev, &eb->relocs, reloc_link) {
@@ -2211,13 +2214,13 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
 			goto slow;
 	}
 
-	if (!err)
+	if (!err && first)
 		err = eb_parse(eb);
 
 err:
 	if (err == -EDEADLK) {
-		eb_release_vmas(eb, false);
-		err = i915_gem_ww_ctx_backoff(&eb->ww);
+		eb_release_vmas(eb, false, true);
+		err = i915_gem_ww_ctx_backoff(eb->ww);
 		if (!err)
 			goto retry;
 	}
@@ -2398,7 +2401,7 @@ shadow_batch_pin(struct i915_execbuffer *eb,
 	if (IS_ERR(vma))
 		return vma;
 
-	err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, flags);
+	err = i915_vma_pin_ww(vma, eb->ww, 0, 0, flags);
 	if (err)
 		return ERR_PTR(err);
 
@@ -2412,7 +2415,7 @@ static struct i915_vma *eb_dispatch_secure(struct i915_execbuffer *eb, struct i9
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (eb->batch_flags & I915_DISPATCH_SECURE)
-		return i915_gem_object_ggtt_pin_ww(vma->obj, &eb->ww, NULL, 0, 0, 0);
+		return i915_gem_object_ggtt_pin_ww(vma->obj, eb->ww, NULL, 0, 0, 0);
 
 	return NULL;
 }
@@ -2458,7 +2461,7 @@ static int eb_parse(struct i915_execbuffer *eb)
 		eb->batch_pool = pool;
 	}
 
-	err = i915_gem_object_lock(pool->obj, &eb->ww);
+	err = i915_gem_object_lock(pool->obj, eb->ww);
 	if (err)
 		goto err;
 
@@ -2666,7 +2669,7 @@ static struct i915_request *eb_pin_engine(struct i915_execbuffer *eb, bool throt
 	 * GGTT space, so do this first before we reserve a seqno for
 	 * ourselves.
 	 */
-	err = intel_context_pin_ww(ce, &eb->ww);
+	err = intel_context_pin_ww(ce, eb->ww);
 	if (err)
 		return ERR_PTR(err);
 
@@ -3218,7 +3221,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		       unsigned int batch_number,
 		       struct dma_fence *in_fence,
 		       struct dma_fence *exec_fence,
-		       struct dma_fence **out_fence)
+		       struct dma_fence **out_fence,
+		       struct i915_gem_ww_ctx *ww)
 {
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct i915_execbuffer eb;
@@ -3239,7 +3243,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 	eb.exec = exec;
 	eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1);
-	eb.vma[0].vma = NULL;
+	if (first)
+		eb.vma[0].vma = NULL;
 	eb.reloc_pool = eb.batch_pool = NULL;
 	eb.reloc_context = NULL;
 
@@ -3251,6 +3256,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.batch_len = args->batch_len;
 	eb.trampoline = NULL;
 	eb.composite_fence = NULL;
+	eb.ww = ww;
 
 	eb.fences = NULL;
 	eb.num_fences = 0;
@@ -3269,9 +3275,14 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_ext;
 
-	err = eb_create(&eb);
-	if (err)
-		goto err_ext;
+	if (first) {
+		err = eb_create(&eb);
+		if (err)
+			goto err_ext;
+	} else {
+		eb.lut_size = -eb.buffer_count;
+	}
+
 
 	GEM_BUG_ON(!eb.lut_size);
 
@@ -3286,15 +3297,22 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_context;
 
-	err = eb_lookup_vmas(&eb);
-	if (err) {
-		eb_release_vmas(&eb, true);
-		goto err_engine;
+	if (first) {
+		err = eb_lookup_vmas(&eb);
+		if (err) {
+			eb_release_vmas(&eb, true, true);
+			goto err_engine;
+		}
+
+	} else {
+		eb.batch = &eb.vma[eb.batch_index];
 	}
 
-	i915_gem_ww_ctx_init(&eb.ww, true);
 
-	err = eb_relocate_parse(&eb);
+	if (first)
+		i915_gem_ww_ctx_init(eb.ww, true);
+
+	err = eb_relocate_parse(&eb, first);
 	if (err) {
 		/*
 		 * If the user expects the execobject.offset and
@@ -3307,7 +3325,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_vma;
 	}
 
-	ww_acquire_done(&eb.ww.ctx);
+	if (first)
+		ww_acquire_done(&eb.ww->ctx);
 
 	batch = eb.batch->vma;
 
@@ -3410,11 +3429,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	i915_request_put(eb.request);
 
 err_vma:
-	eb_release_vmas(&eb, true);
+	eb_release_vmas(&eb, true, err || last);
 	if (eb.trampoline)
 		i915_vma_unpin(eb.trampoline);
 	WARN_ON(err == -EDEADLK);
-	i915_gem_ww_ctx_fini(&eb.ww);
+	if (err || last)
+		i915_gem_ww_ctx_fini(eb.ww);
 
 	if (eb.batch_pool)
 		intel_gt_buffer_pool_put(eb.batch_pool);
@@ -3476,6 +3496,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	const size_t count = args->buffer_count;
 	int err;
 	struct i915_gem_context *ctx;
+	struct i915_gem_ww_ctx ww;
 	struct intel_context *parent = NULL;
 	unsigned int num_batches = 1, i;
 	bool is_parallel = false;
@@ -3602,7 +3623,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 				     0,
 				     in_fence,
 				     exec_fence,
-				     out_fences);
+				     out_fences,
+				     &ww);
 
 	for (i = 1; err == 0 && i < num_batches; i++)
 		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
@@ -3612,7 +3634,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 					     i,
 					     NULL,
 					     NULL,
-					     out_fences);
+					     out_fences,
+					     &ww);
 
 	if (is_parallel)
 		mutex_unlock(&parent->parallel_submit);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 16162fc2782d..710d2700e5b4 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -32,11 +32,11 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
-	err = i915_gem_object_lock(obj, &eb->ww);
+	err = i915_gem_object_lock(obj, eb->ww);
 	if (err)
 		return err;
 
-	err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
+	err = i915_vma_pin_ww(vma, eb->ww, 0, 0, PIN_USER | PIN_HIGH);
 	if (err)
 		return err;
 
@@ -106,10 +106,12 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 static int igt_gpu_reloc(void *arg)
 {
 	struct i915_execbuffer eb;
+	struct i915_gem_ww_ctx ww;
 	struct drm_i915_gem_object *scratch;
 	int err = 0;
 	u32 *map;
 
+	eb.ww = &ww;
 	eb.i915 = arg;
 
 	scratch = i915_gem_object_create_internal(eb.i915, 4096);
@@ -141,20 +143,20 @@ static int igt_gpu_reloc(void *arg)
 		eb.reloc_pool = NULL;
 		eb.reloc_context = NULL;
 
-		i915_gem_ww_ctx_init(&eb.ww, false);
+		i915_gem_ww_ctx_init(eb.ww, false);
 retry:
-		err = intel_context_pin_ww(eb.context, &eb.ww);
+		err = intel_context_pin_ww(eb.context, eb.ww);
 		if (!err) {
 			err = __igt_gpu_reloc(&eb, scratch);
 
 			intel_context_unpin(eb.context);
 		}
 		if (err == -EDEADLK) {
-			err = i915_gem_ww_ctx_backoff(&eb.ww);
+			err = i915_gem_ww_ctx_backoff(eb.ww);
 			if (!err)
 				goto retry;
 		}
-		i915_gem_ww_ctx_fini(&eb.ww);
+		i915_gem_ww_ctx_fini(eb.ww);
 
 		if (eb.reloc_pool)
 			intel_gt_buffer_pool_put(eb.reloc_pool);

From patchwork Tue Aug  3 22:29:39 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417589
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1C7E7C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:41 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D5F2460184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:40 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D5F2460184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id F3CB06E961;
	Tue,  3 Aug 2021 22:12:12 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 4A8FF6E8E9;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393489"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393489"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512744"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 42/46] drm/i915: Hold all parallel requests until last
 request,
 properly handle error
Date: Tue,  3 Aug 2021 15:29:39 -0700
Message-Id: <20210803222943.27686-43-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Hold all parallel requests, via a submit fence, until the last request
is generated. If an error occurs in the middle of generating the
requests, skip the requests signal the backend of the error via a
request flag.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 40 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h           |  9 +++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 70784779872a..64af5c704ca7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3351,7 +3351,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	}
 
 	if (out_fence) {
-		/* Move ownership to caller (i915_gem_execbuffer2_ioctl) */
+		/*
+		 * Move ownership to caller (i915_gem_execbuffer2_ioctl), this
+		 * must be done before anything in this function can jump to the
+		 * 'err_request' label so the caller can safely cleanup any
+		 * errors.
+		 */
 		out_fence[batch_number] = dma_fence_get(&eb.request->fence);
 
 		/*
@@ -3402,10 +3407,21 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	err = eb_submit(&eb, batch, first, last);
 
 err_request:
-	if (last)
+	if (last || err)
 		set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
 			&eb.request->fence.flags);
 
+	/*
+	 * If the execbuf IOCTL is generating more than 1 request, we hold all
+	 * the requests until the last request has been generated in case any of
+	 * the requests hit an error. If an error is hit the caller is
+	 * responsible for flaging all the requests generated with an error. The
+	 * caller is always responsible for releasing the fence on the first
+	 * request.
+	 */
+	if (intel_context_is_parallel(eb.context) && first)
+		i915_sw_fence_await(&eb.request->submit);
+
 	i915_request_get(eb.request);
 	err = eb_request_add(&eb, err);
 
@@ -3498,7 +3514,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 	struct i915_gem_context *ctx;
 	struct i915_gem_ww_ctx ww;
 	struct intel_context *parent = NULL;
-	unsigned int num_batches = 1, i;
+	unsigned int num_batches = 1, i = 0, j;
 	bool is_parallel = false;
 
 	if (!check_buffer_count(count)) {
@@ -3637,8 +3653,24 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 					     out_fences,
 					     &ww);
 
-	if (is_parallel)
+	if (is_parallel) {
+		/*
+		 * Mark all requests generated with an error if any of the
+		 * requests encountered an error.
+		 */
+		for (j = 0; err && j < i; ++j)
+			if (out_fences[j]) {
+				__i915_request_skip(to_request(out_fences[j]));
+				set_bit(I915_FENCE_FLAG_SKIP_PARALLEL,
+					&out_fences[j]->flags);
+			}
+
+		/* Release fence on first request generated */
+		if (out_fences[0])
+			i915_sw_fence_complete(&to_request(out_fences[0])->submit);
+
 		mutex_unlock(&parent->parallel_submit);
+	}
 
 	/*
 	 * Now that we have begun execution of the batchbuffer, we ignore
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index d6d5bf0a5eb5..7f3f66ddf21b 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -153,6 +153,15 @@ enum {
 	 * tail.
 	 */
 	I915_FENCE_FLAG_SUBMIT_PARALLEL,
+
+	/*
+	 * I915_FENCE_FLAG_SKIP_PARALLEL - request with a context in a
+	 * parent-child relationship (parallel submission, multi-lrc) that
+	 * hit an error while generating requests in the execbuf IOCTL.
+	 * Indicates this request should be skipped as another request in
+	 * submission / relationship encoutered an error.
+	 */
+	I915_FENCE_FLAG_SKIP_PARALLEL,
 };
 
 /**

From patchwork Tue Aug  3 22:29:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417573
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1EB4DC4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:25 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id DE01560184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:24 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org DE01560184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 978106E94E;
	Tue,  3 Aug 2021 22:12:09 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 625136E90D;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393490"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393490"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512748"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 43/46] drm/i915/guc: Handle errors in multi-lrc requests
Date: Tue,  3 Aug 2021 15:29:40 -0700
Message-Id: <20210803222943.27686-44-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issues arrises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forwad progress. If all the requests are not present this handshake
doesn't work. To work around this, if multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++++++++++++++++++-
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d61c45d1ac2c..cd1893edf43a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4394,8 +4394,8 @@ static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
 }
 
 static u32 *
-emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
-						 u32 *cs)
+__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+						   u32 *cs)
 {
 	struct intel_context *ce = rq->context;
 	u8 i;
@@ -4423,6 +4423,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
 				  get_children_go_addr(ce),
 				  0);
 
+	return cs;
+}
+
+/*
+ * If this true, a submission of multi-lrc requests had an error and the
+ * requests need to be skipped. The front end (execuf IOCTL) should've called
+ * i915_request_skip which squashes the BB but we still need to emit the fini
+ * breadrcrumbs seqno write. At this point we don't know how many of the
+ * requests in the multi-lrc submission were generated so we can't do the
+ * handshake between the parent and children (e.g. if 4 requests should be
+ * generated but 2nd hit an error only 1 would be seen by the GuC backend).
+ * Simply skip the handshake, but still emit the breadcrumbd seqno, if an error
+ * has occurred on any of the requests in submission / relationship.
+ */
+static inline bool skip_handshake(struct i915_request *rq)
+{
+	return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+						 u32 *cs)
+{
+	struct intel_context *ce = rq->context;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	if (unlikely(skip_handshake(rq))) {
+		memset(cs, 0, sizeof(u32) *
+		       (ce->engine->emit_fini_breadcrumb_dw - 6));
+		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+	} else {
+		cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
+	}
+
 	/* Emit fini breadcrumb */
 	cs = gen8_emit_ggtt_write(cs,
 				  rq->fence.seqno,
@@ -4439,7 +4474,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
 }
 
 static u32 *
-emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs)
+__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+						  u32 *cs)
 {
 	struct intel_context *ce = rq->context;
 
@@ -4465,6 +4501,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs
 	*cs++ = get_children_go_addr(ce->parent);
 	*cs++ = 0;
 
+	return cs;
+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+						u32 *cs)
+{
+	struct intel_context *ce = rq->context;
+
+	GEM_BUG_ON(!intel_context_is_child(ce));
+
+	if (unlikely(skip_handshake(rq))) {
+		memset(cs, 0, sizeof(u32) *
+		       (ce->engine->emit_fini_breadcrumb_dw - 6));
+		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+	} else {
+		cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
+	}
+
 	/* Emit fini breadcrumb */
 	cs = gen8_emit_ggtt_write(cs,
 				  rq->fence.seqno,

From patchwork Tue Aug  3 22:29:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417575
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id BD68CC432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:26 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 903B360184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:26 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 903B360184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 1950C6E955;
	Tue,  3 Aug 2021 22:12:10 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 6E13F6E910;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393492"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393492"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:57 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512750"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:56 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 44/46] drm/i915: Enable multi-bb execbuf
Date: Tue,  3 Aug 2021 15:29:41 -0700
Message-Id: <20210803222943.27686-45-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Enable multi-bb execbuf by enabling the set_parallel extension.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 2b0dd3ff4db8..ac886b82796d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -529,9 +529,6 @@ set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
 	struct intel_engine_cs **siblings = NULL;
 	intel_engine_mask_t prev_mask;
 
-	/* Disabling for now */
-	return -ENODEV;
-
 	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
 		return -ENODEV;
 

From patchwork Tue Aug  3 22:29:42 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417571
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id ACCF6C432BE
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:23 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 7D5B5601FC
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:23 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7D5B5601FC
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 49AE16E956;
	Tue,  3 Aug 2021 22:12:09 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 7B0766E918;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393493"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393493"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:57 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512752"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:57 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 45/46] drm/i915/execlists: Weak parallel submission support
 for execlists
Date: Tue,  3 Aug 2021 15:29:42 -0700
Message-Id: <20210803222943.27686-46-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Basically doing as little as possible to support this
interface for execlists - basically just passing submit fences between
each request generated and virtual engines are not allowed. This is on
par with what is there for the existing (hopefully soon deprecated)
bonding interface.

As with the GuC implementation the pinning interface laying is broken.
This will get cleaned up once the pre_pin / post_unpin laying violations
get fixed. Rather than try to fix it here, just do what the GuC is doing
in the meantime.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   9 +-
 drivers/gpu/drm/i915/gt/intel_context.c       |   1 -
 .../drm/i915/gt/intel_execlists_submission.c  | 201 +++++++++++++++++-
 3 files changed, 205 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ac886b82796d..b199d59bd2c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -529,9 +529,6 @@ set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
 	struct intel_engine_cs **siblings = NULL;
 	intel_engine_mask_t prev_mask;
 
-	if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
-		return -ENODEV;
-
 	if (get_user(slot, &ext->engine_index))
 		return -EFAULT;
 
@@ -541,6 +538,12 @@ set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
 	if (get_user(num_siblings, &ext->num_siblings))
 		return -EFAULT;
 
+	if (!intel_uc_uses_guc_submission(&i915->gt.uc) && num_siblings != 1) {
+		drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC mode\n",
+			num_siblings);
+		return -EINVAL;
+	}
+
 	if (slot >= set->num_engines) {
 		drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
 			slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2c07f5f22c94..8e90a4a0b7b0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -627,7 +627,6 @@ void intel_context_bind_parent_child(struct intel_context *parent,
 	 * Callers responsibility to validate that this function is used
 	 * correctly but we use GEM_BUG_ON here ensure that they do.
 	 */
-	GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
 	GEM_BUG_ON(intel_context_is_pinned(parent));
 	GEM_BUG_ON(intel_context_is_child(parent));
 	GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 769480e026bb..5e0f4983de75 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -927,8 +927,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 
 static bool ctx_single_port_submission(const struct intel_context *ce)
 {
-	return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
-		intel_context_force_single_submission(ce));
+	return intel_context_force_single_submission(ce);
 }
 
 static bool can_merge_ctx(const struct intel_context *prev,
@@ -2602,6 +2601,203 @@ static void execlists_context_cancel_request(struct intel_context *ce,
 				      current->comm);
 }
 
+static int execlists_parent_context_pre_pin(struct intel_context *ce,
+					    struct i915_gem_ww_ctx *ww)
+{
+	struct intel_context *child;
+	int err, i = 0, j = 0;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	for_each_child(ce, child) {
+		err = i915_active_acquire(&child->active);
+		if (unlikely(err))
+			goto unwind_active;
+		++i;
+	}
+
+	for_each_child(ce, child) {
+		err = __execlists_context_pre_pin(child, child->engine, ww);
+		if (unlikely(err))
+			goto unwind_pre_pin;
+		++j;
+	}
+
+	err = __execlists_context_pre_pin(ce, ce->engine, ww);
+	if (unlikely(err))
+		goto unwind_pre_pin;
+
+	return 0;
+
+unwind_pre_pin:
+	for_each_child(ce, child) {
+		if (!j--)
+			break;
+		lrc_post_unpin(child);
+	}
+
+unwind_active:
+	for_each_child(ce, child) {
+		if (!i--)
+			break;
+		i915_active_release(&child->active);
+	}
+
+	return err;
+}
+
+static void execlists_parent_context_post_unpin(struct intel_context *ce)
+{
+	struct intel_context *child;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	for_each_child(ce, child)
+		lrc_post_unpin(child);
+	lrc_post_unpin(ce);
+
+	for_each_child(ce, child) {
+		intel_context_get(child);
+		i915_active_release(&child->active);
+		intel_context_put(child);
+	}
+}
+
+static int execlists_parent_context_pin(struct intel_context *ce)
+{
+	int ret, i = 0, j = 0;
+	struct intel_context *child;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	for_each_child(ce, child) {
+		ret = lrc_pin(child, child->engine);
+		if (unlikely(ret))
+			goto unwind_pin;
+		++i;
+	}
+	ret = lrc_pin(ce, ce->engine);
+	if (unlikely(ret))
+		goto unwind_pin;
+
+	return 0;
+
+unwind_pin:
+	for_each_child(ce, child) {
+		if (++j > i)
+			break;
+		lrc_unpin(child);
+	}
+
+	return ret;
+}
+
+static void execlists_parent_context_unpin(struct intel_context *ce)
+{
+	struct intel_context *child;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	for_each_child(ce, child)
+		lrc_unpin(child);
+	lrc_unpin(ce);
+}
+
+static const struct intel_context_ops parent_context_ops = {
+	.flags = COPS_HAS_INFLIGHT,
+
+	.alloc = execlists_context_alloc,
+
+	.cancel_request = execlists_context_cancel_request,
+
+	.pre_pin = execlists_parent_context_pre_pin,
+	.pin = execlists_parent_context_pin,
+	.unpin = execlists_parent_context_unpin,
+	.post_unpin = execlists_parent_context_post_unpin,
+
+	.enter = intel_context_enter_engine,
+	.exit = intel_context_exit_engine,
+
+	.destroy = lrc_destroy,
+};
+
+static const struct intel_context_ops child_context_ops = {
+	.flags = COPS_HAS_INFLIGHT,
+
+	.alloc = execlists_context_alloc,
+
+	.cancel_request = execlists_context_cancel_request,
+
+	.enter = intel_context_enter_engine,
+	.exit = intel_context_exit_engine,
+
+	.destroy = lrc_destroy,
+};
+
+static struct intel_context *
+execlists_create_parallel(struct intel_engine_cs **engines,
+			  unsigned int num_siblings,
+			  unsigned int width)
+{
+	struct intel_engine_cs **siblings = NULL;
+	struct intel_context *parent = NULL, *ce, *err;
+	int i, j;
+	int ret;
+
+	GEM_BUG_ON(num_siblings != 1);
+
+	siblings = kmalloc_array(num_siblings,
+				 sizeof(*siblings),
+				 GFP_KERNEL);
+	if (!siblings)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < width; ++i) {
+		for (j = 0; j < num_siblings; ++j)
+			siblings[j] = engines[i * num_siblings + j];
+
+		ce = intel_context_create(siblings[0]);
+		if (!ce) {
+			err = ERR_PTR(-ENOMEM);
+			goto unwind;
+		}
+
+		if (i == 0) {
+			parent = ce;
+		} else {
+			intel_context_bind_parent_child(parent, ce);
+			ret = intel_context_alloc_state(ce);
+			if (ret) {
+				err = ERR_PTR(ret);
+				goto unwind;
+			}
+		}
+	}
+
+	intel_context_set_nopreempt(parent);
+	intel_context_set_single_submission(parent);
+	for_each_child(parent, ce) {
+		intel_context_set_nopreempt(ce);
+		intel_context_set_single_submission(ce);
+	}
+
+	parent->ops = &parent_context_ops;
+	for_each_child(parent, ce)
+		ce->ops = &child_context_ops;
+
+	kfree(siblings);
+	return parent;
+
+unwind:
+	if (parent) {
+		for_each_child(parent, ce)
+			intel_context_put(ce);
+		intel_context_put(parent);
+	}
+	kfree(siblings);
+	return err;
+}
+
 static const struct intel_context_ops execlists_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
@@ -2620,6 +2816,7 @@ static const struct intel_context_ops execlists_context_ops = {
 	.reset = lrc_reset,
 	.destroy = lrc_destroy,
 
+	.create_parallel = execlists_create_parallel,
 	.create_virtual = execlists_create_virtual,
 };
 

From patchwork Tue Aug  3 22:29:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Brost <matthew.brost@intel.com>
X-Patchwork-Id: 12417591
Return-Path: <SRS0=2S93=M2=lists.freedesktop.org=dri-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 68D1DC4338F
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:42 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 34C2560184
	for <dri-devel@archiver.kernel.org>; Tue,  3 Aug 2021 22:13:42 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 34C2560184
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 115056E96E;
	Tue,  3 Aug 2021 22:12:14 +0000 (UTC)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 92AED6E91B;
 Tue,  3 Aug 2021 22:11:58 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="193393494"
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="193393494"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:57 -0700
X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="511512755"
Received: from dhiatt-server.jf.intel.com ([10.54.81.3])
 by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Aug 2021 15:11:57 -0700
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Subject: [PATCH 46/46] drm/i915/guc: Add delay before disabling scheduling on
 contexts
Date: Tue,  3 Aug 2021 15:29:43 -0700
Message-Id: <20210803222943.27686-47-matthew.brost@intel.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210803222943.27686-1-matthew.brost@intel.com>
References: <20210803222943.27686-1-matthew.brost@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Some workloads use lots of contexts that continually pin / unpin
contexts. With GuC submission an unpin translates to a schedule disable
H2G which puts pressure on both the i915 and GuC. A schedule disable can
also block future requests from being submitted until the operation
completes. None of this is ideal.

Add a configurable, via debugfs, delay period before the schedule
disable is issued. Default delay period is 1 second. The delay period is
skipped if more than 3/4 of the guc_ids are in use.

This patch also updates the selftests to turn off this delay period as
this extra time would likely cause many selftests to fail. Follow up
patches will fix all the selftests and enable the delay period.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 .../i915/gem/selftests/i915_gem_coherency.c   |   2 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |   2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   2 +-
 .../drm/i915/gem/selftests/i915_gem_object.c  |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h       |   9 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   8 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   7 +
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |  28 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 322 +++++++++++++++++-
 .../i915/gt/uc/selftest_guc_flow_control.c    |  19 +-
 drivers/gpu/drm/i915/i915_selftest.h          |   2 +
 drivers/gpu/drm/i915/i915_trace.h             |  10 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
 drivers/gpu/drm/i915/selftests/i915_perf.c    |   2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |   2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   2 +-
 18 files changed, 405 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index b199d59bd2c4..1553287e5491 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1298,7 +1298,7 @@ static void engines_idle_release(struct i915_gem_context *ctx,
 		int err;
 
 		/* serialises with execbuf */
-		set_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+		intel_context_close(ce);
 		if (!intel_context_pin_if_active(ce))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
index 13b088cc787e..a666d7e610f5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
@@ -434,5 +434,5 @@ int i915_gem_coherency_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_gem_coherency),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index ffae7df5e4d7..2c92afa9d608 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -474,5 +474,5 @@ int i915_gem_dmabuf_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_dmabuf_import_same_driver_lmem_smem),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index b20f5621f62b..4745c78a48de 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1414,5 +1414,5 @@ int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_mmap_gpu),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
index 740ee8086a27..ae1361c7c4cf 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
@@ -95,5 +95,5 @@ int i915_gem_object_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_gem_huge),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 8e90a4a0b7b0..96643040defd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -472,6 +472,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
+	INIT_LIST_HEAD(&ce->guc_sched_disable_link);
+
 	mutex_init(&ce->parallel_submit);
 	ce->fence_context = dma_fence_context_alloc(1);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index a302599e436a..f4c9036f7f03 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -215,6 +215,15 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce)
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
 }
 
+static inline void intel_context_close(struct intel_context *ce)
+{
+	set_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+
+	trace_intel_context_close(ce);
+	if (ce->ops->close)
+		ce->ops->close(ce);
+}
+
 static inline bool intel_context_is_closed(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 8af9ace4c052..53f00657a45c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -11,6 +11,7 @@
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/types.h>
+#include <linux/ktime.h>
 
 #include "i915_active_types.h"
 #include "i915_sw_fence.h"
@@ -38,6 +39,7 @@ struct intel_context_ops {
 	int (*alloc)(struct intel_context *ce);
 
 	void (*ban)(struct intel_context *ce, struct i915_request *rq);
+	void (*close)(struct intel_context *ce);
 
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww);
 	int (*pin)(struct intel_context *ce);
@@ -203,6 +205,12 @@ struct intel_context {
 	 */
 	struct list_head guc_id_link;
 
+	/*
+	 * GuC schedule disable link / time
+	 */
+	struct list_head guc_sched_disable_link;
+	ktime_t guc_sched_disable_time;
+
 	/* GuC context blocked fence */
 	struct i915_sw_fence guc_blocked;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 30a0f364db8f..90b5b657d411 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -60,6 +60,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	u32 num_guc_ids;
 	u32 max_guc_ids;
+	u32 guc_ids_in_use[GUC_SUBMIT_ENGINE_MAX];
 	unsigned long *guc_ids_bitmap;
 #define MAX_GUC_ID_ORDER	(order_base_2(MAX_ENGINE_INSTANCE + 1))
 	struct list_head guc_id_list_no_ref[MAX_GUC_ID_ORDER + 1];
@@ -69,6 +70,12 @@ struct intel_guc {
 	struct list_head destroyed_contexts;
 	struct intel_gt_pm_unpark_work destroy_worker;
 
+	spinlock_t sched_disable_lock;	/* protects schedule disable list */
+	struct list_head sched_disable_list;
+	struct hrtimer sched_disable_timer;
+#define SCHED_DISABLE_DELAY_NS	1000000000
+	u64 sched_disable_delay_ns;
+
 	bool submission_supported;
 	bool submission_selected;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 7c479c5e7b3a..53a6f3da6cce 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -80,12 +80,40 @@ static int guc_num_id_set(void *data, u64 val)
 }
 DEFINE_SIMPLE_ATTRIBUTE(guc_num_id_fops, guc_num_id_get, guc_num_id_set, "%lld\n");
 
+static int guc_sched_disable_delay_ns_get(void *data, u64 *val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	*val = guc->sched_disable_delay_ns;
+
+	return 0;
+}
+
+static int guc_sched_disable_delay_ns_set(void *data, u64 val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	guc->sched_disable_delay_ns = val;
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ns_fops,
+			guc_sched_disable_delay_ns_get,
+			guc_sched_disable_delay_ns_set, "%lld\n");
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
 		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
 		{ "guc_num_id", &guc_num_id_fops, NULL },
+		{ "guc_sched_disable_delay_ns", &guc_sched_disable_delay_ns_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index cd1893edf43a..dc0d6a099bee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -654,11 +654,15 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 	return (timeout < 0) ? timeout : 0;
 }
 
+static void sched_disable_contexts_flush(struct intel_guc *guc);
+
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 {
 	if (!intel_uc_uses_guc_submission(&guc_to_gt(guc)->uc))
 		return 0;
 
+	sched_disable_contexts_flush(guc);
+
 	return intel_guc_wait_for_pending_msg(guc,
 					      &guc->outstanding_submission_g2h,
 					      true, timeout);
@@ -1135,6 +1139,7 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
 static void guc_signal_context_fence(struct intel_context *ce);
 static void guc_cancel_context_requests(struct intel_context *ce);
 static void guc_blocked_fence_complete(struct intel_context *ce);
+static void sched_disable_context_delete(struct intel_context *ce);
 
 static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
@@ -1160,6 +1165,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 		deregister = context_wait_for_deregister_to_register(ce);
 		banned = context_banned(ce);
 		init_sched_state(ce);
+		sched_disable_context_delete(ce);
 
 		if (pending_enable || destroyed || deregister) {
 			atomic_dec(&guc->outstanding_submission_g2h);
@@ -1299,6 +1305,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 
 	intel_gt_park_heartbeats(guc_to_gt(guc));
 	disable_submission(guc);
+	hrtimer_cancel(&guc->sched_disable_timer);
 	guc->interrupts.disable(guc);
 
 	/* Flush IRQ handler */
@@ -1656,6 +1663,8 @@ static void guc_lrcd_reg_fini(struct intel_guc *guc);
 
 static void destroy_worker_func(struct work_struct *w);
 
+static enum hrtimer_restart sched_disable_timer_func(struct hrtimer *hrtimer);
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1687,6 +1696,13 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	INIT_LIST_HEAD(&guc->destroyed_contexts);
 	intel_gt_pm_unpark_work_init(&guc->destroy_worker, destroy_worker_func);
 
+	spin_lock_init(&guc->sched_disable_lock);
+	INIT_LIST_HEAD(&guc->sched_disable_list);
+	hrtimer_init(&guc->sched_disable_timer, CLOCK_MONOTONIC,
+		     HRTIMER_MODE_REL);
+	guc->sched_disable_timer.function = sched_disable_timer_func;
+	guc->sched_disable_delay_ns = SCHED_DISABLE_DELAY_NS;
+
 	return 0;
 }
 
@@ -1852,6 +1868,12 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	if (unlikely(ret < 0))
 		return ret;
 
+	if (intel_context_is_parent(ce))
+		guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_MULTI_LRC] +=
+			order_base_2(ce->guc_number_children + 1);
+	else
+		guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_SINGLE_LRC]++;
+
 	ce->guc_id = ret;
 	return 0;
 }
@@ -1860,13 +1882,18 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
 	GEM_BUG_ON(intel_context_is_child(ce));
 	if (!context_guc_id_invalid(ce)) {
-		if (intel_context_is_parent(ce))
+		if (intel_context_is_parent(ce)) {
+			guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_MULTI_LRC] -=
+				order_base_2(ce->guc_number_children + 1);
 			bitmap_release_region(guc->guc_ids_bitmap, ce->guc_id,
 					      order_base_2(ce->guc_number_children
 							   + 1));
-		else
+		} else {
+			guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_SINGLE_LRC]--;
 			ida_simple_remove(&guc->guc_ids, ce->guc_id);
+		}
 		clr_lrc_desc_registered(guc, ce->guc_id);
+
 		set_context_guc_id_invalid(ce);
 	}
 	if (!list_empty(&ce->guc_id_link))
@@ -1931,9 +1958,13 @@ static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce,
 			 * from another context that has more guc_id that itself.
 			 */
 			if (cn_o2 != ce_o2) {
+				guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_MULTI_LRC] -=
+					order_base_2(cn->guc_number_children + 1);
 				bitmap_release_region(guc->guc_ids_bitmap,
 						      cn->guc_id,
 						      cn_o2);
+				guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_MULTI_LRC] +=
+					order_base_2(ce->guc_number_children + 1);
 				bitmap_allocate_region(guc->guc_ids_bitmap,
 						       ce->guc_id,
 						       ce_o2);
@@ -2538,7 +2569,7 @@ static void guc_context_unpin(struct intel_context *ce)
 	__guc_context_unpin(ce);
 
 	if (likely(!intel_context_is_barrier(ce)))
-		intel_engine_pm_put(ce->engine);
+		intel_engine_pm_put_async(ce->engine);
 }
 
 static void guc_context_post_unpin(struct intel_context *ce)
@@ -2665,11 +2696,11 @@ static void guc_parent_context_unpin(struct intel_context *ce)
 
 	for_each_engine_masked(engine, ce->engine->gt,
 			       ce->engine->mask, tmp)
-		intel_engine_pm_put(engine);
+		intel_engine_pm_put_async(engine);
 	for_each_child(ce, child)
 		for_each_engine_masked(engine, child->engine->gt,
 				       child->engine->mask, tmp)
-			intel_engine_pm_put(engine);
+			intel_engine_pm_put_async(engine);
 }
 
 static void __guc_context_sched_enable(struct intel_guc *guc,
@@ -2788,6 +2819,8 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
 
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
+	sched_disable_context_delete(ce);
+
 	with_intel_runtime_pm(runtime_pm, wakeref)
 		__guc_context_sched_disable(guc, ce, guc_id);
 
@@ -2914,8 +2947,202 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 								     1);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
+
+	sched_disable_context_delete(ce);
+}
+
+#define next_sched_disable_time(guc, now, ce) \
+	(guc->sched_disable_delay_ns - \
+	 (ktime_to_ns(now) - ktime_to_ns(ce->guc_sched_disable_time)))
+static void ____sched_disable_context_delete(struct intel_guc *guc,
+					     struct intel_context *ce)
+{
+	bool is_first;
+
+	lockdep_assert_held(&guc->sched_disable_lock);
+	GEM_BUG_ON(intel_context_is_child(ce));
+	GEM_BUG_ON(list_empty(&ce->guc_sched_disable_link));
+
+	is_first = list_is_first(&ce->guc_sched_disable_link,
+				 &guc->sched_disable_list);
+	list_del_init(&ce->guc_sched_disable_link);
+	if (list_empty(&guc->sched_disable_list)) {
+		hrtimer_try_to_cancel(&guc->sched_disable_timer);
+	} else if (is_first) {
+		struct intel_context *first =
+			list_first_entry(&guc->sched_disable_list,
+					 typeof(*first),
+					 guc_sched_disable_link);
+		u64 next_time = next_sched_disable_time(guc, ktime_get(),
+							first);
+
+		hrtimer_start(&guc->sched_disable_timer,
+			      ns_to_ktime(next_time),
+			      HRTIMER_MODE_REL_PINNED);
+	}
+}
+
+static void __sched_disable_context_delete(struct intel_guc *guc,
+					   struct intel_context *ce)
+{
+	lockdep_assert_held(&guc->sched_disable_lock);
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	if (!list_empty(&ce->guc_sched_disable_link)) {
+		intel_context_sched_disable_unpin(ce);
+		____sched_disable_context_delete(guc, ce);
+	}
+}
+
+static void sched_disable_context_delete(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	if (!list_empty(&ce->guc_sched_disable_link)) {
+		spin_lock_irqsave(&guc->sched_disable_lock, flags);
+		__sched_disable_context_delete(guc, ce);
+		spin_unlock_irqrestore(&guc->sched_disable_lock, flags);
+	}
+}
+
+static void sched_disable_context_add(struct intel_guc *guc,
+				      struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(intel_context_is_child(ce));
+	GEM_BUG_ON(!list_empty(&ce->guc_sched_disable_link));
+
+	ce->guc_sched_disable_time = ktime_get();
+
+	spin_lock_irqsave(&guc->sched_disable_lock, flags);
+	if (list_empty(&guc->sched_disable_list))
+		hrtimer_start(&guc->sched_disable_timer,
+			      ns_to_ktime(guc->sched_disable_delay_ns),
+			      HRTIMER_MODE_REL_PINNED);
+	list_add_tail(&ce->guc_sched_disable_link, &guc->sched_disable_list);
+	spin_unlock_irqrestore(&guc->sched_disable_lock, flags);
+}
+
+static void sched_disable_contexts_flush(struct intel_guc *guc)
+{
+	struct intel_runtime_pm *runtime_pm = &guc_to_gt(guc)->i915->runtime_pm;
+	struct intel_context *ce, *cn;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_disable_lock, flags);
+
+	list_for_each_entry_safe_reverse(ce, cn, &guc->sched_disable_list,
+					 guc_sched_disable_link) {
+		intel_wakeref_t wakeref;
+		bool enabled;
+		u16 guc_id;
+
+		list_del_init(&ce->guc_sched_disable_link);
+
+		spin_lock(&ce->guc_state.lock);
+		enabled = context_enabled(ce);
+		if (unlikely(!enabled || submission_disabled(guc))) {
+			if (enabled)
+				clr_context_enabled(ce);
+			spin_unlock(&ce->guc_state.lock);
+			intel_context_sched_disable_unpin(ce);
+			continue;
+		}
+		if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+			spin_unlock(&ce->guc_state.lock);
+			continue;
+		}
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock(&ce->guc_state.lock);
+
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			__guc_context_sched_disable(guc, ce, guc_id);
+	}
+
+	hrtimer_try_to_cancel(&guc->sched_disable_timer);
+
+	spin_unlock_irqrestore(&guc->sched_disable_lock, flags);
 }
 
+#define should_sched_be_disabled(guc, now, ce) \
+	((ktime_to_ns(now) - ktime_to_ns(ce->guc_sched_disable_time)) > \
+	(guc->sched_disable_delay_ns / 4) * 3)
+static enum hrtimer_restart sched_disable_timer_func(struct hrtimer *hrtimer)
+{
+	struct intel_guc *guc = container_of(hrtimer, struct intel_guc,
+					     sched_disable_timer);
+	struct intel_runtime_pm *runtime_pm = &guc_to_gt(guc)->i915->runtime_pm;
+	struct intel_context *ce, *cn;
+	unsigned long flags;
+	ktime_t now;
+
+	if (list_empty(&guc->sched_disable_list))
+		return HRTIMER_NORESTART;
+
+	now = ktime_get();
+
+	spin_lock_irqsave(&guc->sched_disable_lock, flags);
+
+	list_for_each_entry_safe_reverse(ce, cn, &guc->sched_disable_list,
+					 guc_sched_disable_link) {
+		intel_wakeref_t wakeref;
+		bool enabled;
+		u16 guc_id;
+
+		/*
+		 * If a context has been waiting for 3/4 of its delay or more,
+		 * issue the schedule disable. Using this heuristic allows more
+		 * than 1 context to have its scheduling disabled when this
+		 * timer is run.
+		 */
+		if (!should_sched_be_disabled(guc, now, ce))
+			break;
+
+		list_del_init(&ce->guc_sched_disable_link);
+
+		spin_lock(&ce->guc_state.lock);
+		enabled = context_enabled(ce);
+		if (unlikely(!enabled || submission_disabled(guc))) {
+			if (enabled)
+				clr_context_enabled(ce);
+			spin_unlock(&ce->guc_state.lock);
+			intel_context_sched_disable_unpin(ce);
+			continue;
+		}
+		if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+			spin_unlock(&ce->guc_state.lock);
+			continue;
+		}
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock(&ce->guc_state.lock);
+
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			__guc_context_sched_disable(guc, ce, guc_id);
+	}
+
+	if (!list_empty(&guc->sched_disable_list)) {
+		struct intel_context *first =
+			list_first_entry(&guc->sched_disable_list,
+					 typeof(*first),
+					 guc_sched_disable_link);
+		u64 next_time = next_sched_disable_time(guc, now, first);
+
+		hrtimer_forward(hrtimer, now, ns_to_ktime(next_time));
+		spin_unlock_irqrestore(&guc->sched_disable_lock, flags);
+
+		return HRTIMER_RESTART;
+	} else {
+		spin_unlock_irqrestore(&guc->sched_disable_lock, flags);
+
+		return HRTIMER_NORESTART;
+	}
+}
+
+#define guc_id_pressure(max, in_use)	(in_use > (max / 4) * 3)
 static void guc_context_sched_disable(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
@@ -2924,8 +3151,14 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	intel_wakeref_t wakeref;
 	u16 guc_id;
 	bool enabled;
+	int guc_id_index = intel_context_is_parent(ce) ?
+		GUC_SUBMIT_ENGINE_MULTI_LRC : GUC_SUBMIT_ENGINE_SINGLE_LRC;
+	int max_guc_ids = intel_context_is_parent(ce) ?
+	       NUMBER_MULTI_LRC_GUC_ID(guc) :
+	       guc->num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
 
 	GEM_BUG_ON(intel_context_is_child(ce));
+	GEM_BUG_ON(!list_empty(&ce->guc_sched_disable_link));
 
 	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
@@ -2936,6 +3169,18 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	if (!context_enabled(ce))
 		goto unpin;
 
+	/*
+	 * If no guc_id pressure and the context isn't closed we delay the
+	 * schedule disable to not to continuously disable / enable scheduling
+	 * putting pressure on both the i915 and GuC. Delay is configurable via
+	 * debugfs, default 1s.
+	 */
+	if (!guc_id_pressure(max_guc_ids, guc->guc_ids_in_use[guc_id_index]) &&
+	    !intel_context_is_closed(ce) && guc->sched_disable_delay_ns) {
+		sched_disable_context_add(guc, ce);
+		return;
+	}
+
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 
 	/*
@@ -3294,6 +3539,58 @@ static void remove_from_context(struct i915_request *rq)
 	i915_request_notify_execute_cb_imm(rq);
 }
 
+static void __guc_context_close(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	lockdep_assert_held(&guc->sched_disable_lock);
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	if (!list_empty(&ce->guc_sched_disable_link)) {
+		struct intel_runtime_pm *runtime_pm =
+			ce->engine->uncore->rpm;
+		intel_wakeref_t wakeref;
+		bool enabled;
+		u16 guc_id;
+
+		spin_lock(&ce->guc_state.lock);
+		enabled = context_enabled(ce);
+		if (unlikely(!enabled || submission_disabled(guc))) {
+			if (enabled)
+				clr_context_enabled(ce);
+			spin_unlock(&ce->guc_state.lock);
+			intel_context_sched_disable_unpin(ce);
+			goto update_list;
+		}
+		if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+			spin_unlock(&ce->guc_state.lock);
+			goto update_list;
+		}
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock(&ce->guc_state.lock);
+
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			__guc_context_sched_disable(guc, ce, guc_id);
+update_list:
+		____sched_disable_context_delete(guc, ce);
+	}
+}
+
+static void guc_context_close(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
+
+	/*
+	 * If we close the context and a schedule disable is pending a delay, do
+	 * it immediately.
+	 */
+	if (!list_empty(&ce->guc_sched_disable_link)) {
+		spin_lock_irqsave(&guc->sched_disable_lock, flags);
+		__guc_context_close(guc, ce);
+		spin_unlock_irqrestore(&guc->sched_disable_lock, flags);
+	}
+}
+
 static struct intel_context *
 guc_create_parallel(struct intel_engine_cs **engines,
 		    unsigned int num_siblings,
@@ -3308,6 +3605,7 @@ static const struct intel_context_ops guc_context_ops = {
 	.post_unpin = guc_context_post_unpin,
 
 	.ban = guc_context_ban,
+	.close = guc_context_close,
 
 	.cancel_request = guc_context_cancel_request,
 
@@ -3538,6 +3836,10 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	rq->reserved_space -= GUC_REQUEST_SIZE;
 
+	GEM_BUG_ON(!list_empty(&ce->guc_sched_disable_link) &&
+		   atomic_read(&ce->pin_count) < 3);
+	sched_disable_context_delete(ce);
+
 	/*
 	 * guc_ids are exhausted or a heuristic is met indicating too many
 	 * guc_ids are waiting on requests with submission dependencies (not
@@ -3667,7 +3969,7 @@ static void guc_virtual_context_unpin(struct intel_context *ce)
 	__guc_context_unpin(ce);
 
 	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
-		intel_engine_pm_put(engine);
+		intel_engine_pm_put_async(engine);
 }
 
 static void guc_virtual_context_enter(struct intel_context *ce)
@@ -3708,6 +4010,7 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.post_unpin = guc_context_post_unpin,
 
 	.ban = guc_context_ban,
+	.close = guc_context_close,
 
 	.cancel_request = guc_context_cancel_request,
 
@@ -3819,6 +4122,7 @@ static const struct intel_context_ops virtual_parent_context_ops = {
 	.post_unpin = guc_parent_context_post_unpin,
 
 	.ban = guc_context_ban,
+	.close = guc_context_close,
 
 	.enter = guc_virtual_context_enter,
 	.exit = guc_virtual_context_exit,
@@ -4924,7 +5228,11 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
 		   atomic_read(&guc->outstanding_submission_g2h));
 	drm_printf(p, "GuC Number GuC IDs: %d\n", guc->num_guc_ids);
-	drm_printf(p, "GuC Max Number GuC IDs: %d\n\n", guc->max_guc_ids);
+	drm_printf(p, "GuC Max Number GuC IDs: %d\n", guc->max_guc_ids);
+	drm_printf(p, "GuC single-lrc GuC IDs in use: %d\n",
+		   guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_SINGLE_LRC]);
+	drm_printf(p, "GuC multi-lrc GuC IDs in use: %d\n",
+		   guc->guc_ids_in_use[GUC_SUBMIT_ENGINE_MULTI_LRC]);
 	drm_printf(p, "GuC max context registered: %u\n\n",
 		   guc->lrcd_reg.max_idx);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
index 9cfecf9d368e..ad70b3159ce4 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_flow_control.c
@@ -174,7 +174,8 @@ static int multi_lrc_not_blocked(struct intel_gt *gt, bool flow_control)
 #define NUM_RQ_PER_CONTEXT	2
 #define HEARTBEAT_INTERVAL	1500
 
-static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang)
+static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids,
+					bool hang, bool sched_disable_delay)
 {
 	struct intel_gt *gt = arg;
 	struct intel_guc *guc = &gt->uc.guc;
@@ -203,6 +204,9 @@ static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang
 	if (limit_guc_ids)
 		guc->num_guc_ids = NUM_GUC_ID;
 
+	if (sched_disable_delay)
+		guc->sched_disable_delay_ns = SCHED_DISABLE_DELAY_NS / 5;
+
 	ce = intel_context_create(intel_selftest_find_any_engine(gt));
 	if (IS_ERR(ce)) {
 		ret = PTR_ERR(ce);
@@ -391,6 +395,7 @@ static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang
 	guc->num_guc_ids = guc->max_guc_ids;
 	guc->gse_hang_expected = false;
 	guc->inject_bad_sched_disable = false;
+	guc->sched_disable_delay_ns = 0;
 	kfree(contexts);
 
 	return ret;
@@ -398,17 +403,22 @@ static int __intel_guc_flow_control_guc(void *arg, bool limit_guc_ids, bool hang
 
 static int intel_guc_flow_control_guc_ids(void *arg)
 {
-	return __intel_guc_flow_control_guc(arg, true, false);
+	return __intel_guc_flow_control_guc(arg, true, false, false);
+}
+
+static int intel_guc_flow_control_guc_ids_sched_disable_delay(void *arg)
+{
+	return __intel_guc_flow_control_guc(arg, true, false, true);
 }
 
 static int intel_guc_flow_control_lrcd_reg(void *arg)
 {
-	return __intel_guc_flow_control_guc(arg, false, false);
+	return __intel_guc_flow_control_guc(arg, false, false, false);
 }
 
 static int intel_guc_flow_control_hang_state_machine(void *arg)
 {
-	return __intel_guc_flow_control_guc(arg, true, true);
+	return __intel_guc_flow_control_guc(arg, true, true, false);
 }
 
 #define NUM_RQ_STRESS_CTBS	0x4000
@@ -861,6 +871,7 @@ int intel_guc_flow_control(struct drm_i915_private *i915)
 	static const struct i915_subtest tests[] = {
 		SUBTEST(intel_guc_flow_control_stress_ctbs),
 		SUBTEST(intel_guc_flow_control_guc_ids),
+		SUBTEST(intel_guc_flow_control_guc_ids_sched_disable_delay),
 		SUBTEST(intel_guc_flow_control_lrcd_reg),
 		SUBTEST(intel_guc_flow_control_hang_state_machine),
 		SUBTEST(intel_guc_flow_control_multi_lrc_guc_ids),
diff --git a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h
index f54de0499be7..bf464db7affe 100644
--- a/drivers/gpu/drm/i915/i915_selftest.h
+++ b/drivers/gpu/drm/i915/i915_selftest.h
@@ -92,12 +92,14 @@ int __i915_subtests(const char *caller,
 			T, ARRAY_SIZE(T), data)
 #define i915_live_subtests(T, data) ({ \
 	typecheck(struct drm_i915_private *, data); \
+	(data)->gt.uc.guc.sched_disable_delay_ns = 0; \
 	__i915_subtests(__func__, \
 			__i915_live_setup, __i915_live_teardown, \
 			T, ARRAY_SIZE(T), data); \
 })
 #define intel_gt_live_subtests(T, data) ({ \
 	typecheck(struct intel_gt *, data); \
+	(data)->uc.guc.sched_disable_delay_ns = 0; \
 	__i915_subtests(__func__, \
 			__intel_gt_live_setup, __intel_gt_live_teardown, \
 			T, ARRAY_SIZE(T), data); \
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 806ad688274b..57ba7065d5ab 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -933,6 +933,11 @@ DEFINE_EVENT(intel_context, intel_context_reset,
 	     TP_ARGS(ce)
 );
 
+DEFINE_EVENT(intel_context, intel_context_close,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_ban,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -1035,6 +1040,11 @@ trace_intel_context_reset(struct intel_context *ce)
 {
 }
 
+static inline void
+trace_intel_context_close(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_ban(struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index f843a5040706..d54c280217fe 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -2112,5 +2112,5 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915)
 
 	GEM_BUG_ON(offset_in_page(i915->ggtt.vm.total));
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index 9e9a6cb1d9e5..86bad00cca95 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -431,7 +431,7 @@ int i915_perf_live_selftests(struct drm_i915_private *i915)
 	if (err)
 		return err;
 
-	err = i915_subtests(tests, i915);
+	err = i915_live_subtests(tests, i915);
 
 	destroy_empty_config(&i915->perf);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index d67710d10615..afbf88865a8b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1693,7 +1693,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
 
 static int switch_to_kernel_sync(struct intel_context *ce, int err)
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index dd0607254a95..f4b157451851 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -1085,5 +1085,5 @@ int i915_vma_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_vma_remapped_gtt),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }