From patchwork Mon Aug 16 13:51:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438569 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68DC5C4338F for ; Mon, 16 Aug 2021 13:57:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1C3AD632CF for ; Mon, 16 Aug 2021 13:57:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1C3AD632CF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7A84D89E59; Mon, 16 Aug 2021 13:56:59 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0623B89E36; Mon, 16 Aug 2021 13:56:57 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607057" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607057" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091157" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:18 -0700 Message-Id: <20210816135139.10060-2-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 01/22] drm/i915/guc: Fix blocked context accounting X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Prior to this patch the blocked context counter was cleared on init_sched_state (used during registering a context & resets) which is incorrect. This state needs to be persistent or the counter can read the incorrect value resulting in scheduling never getting enabled again. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost Reviewed-by: Daniel Vetter Cc: --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 87d8dc8f51b9..69faa39da178 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -152,7 +152,7 @@ static inline void init_sched_state(struct intel_context *ce) { /* Only should be called from guc_lrc_desc_pin() */ atomic_set(&ce->guc_sched_state_no_lock, 0); - ce->guc_state.sched_state = 0; + ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK; } static inline bool From patchwork Mon Aug 16 13:51:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438577 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30A33C4338F for ; Mon, 16 Aug 2021 13:57:24 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 00EFB632DB for ; Mon, 16 Aug 2021 13:57:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 00EFB632DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1AB9489F06; Mon, 16 Aug 2021 13:57:07 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id A21A089E36; Mon, 16 Aug 2021 13:56:58 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607058" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607058" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091158" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:19 -0700 Message-Id: <20210816135139.10060-3-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 02/22] drm/i915/guc: Fix outstanding G2H accounting X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" A small race that could result in incorrect accounting of the number of outstanding G2H. Basically prior to this patch we did not increment the number of outstanding G2H if we encoutered a GT reset while sending a H2G. This was incorrect as the context state had already been updated to anticipate a G2H response thus the counter should be incremented. Fixes: f4eb1f3fe946 ("drm/i915/guc: Ensure G2H response has space in buffer") Signed-off-by: Matthew Brost Cc: --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 69faa39da178..b5d3972ae164 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -360,11 +360,13 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc, { int err; - err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); - - if (!err && g2h_len_dw) + if (g2h_len_dw) atomic_inc(&guc->outstanding_submission_g2h); + err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); + if (err == -EBUSY && g2h_len_dw) + atomic_dec(&guc->outstanding_submission_g2h); + return err; } From patchwork Mon Aug 16 13:51:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438571 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40F5FC4338F for ; Mon, 16 Aug 2021 13:57:13 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 13D68632D1 for ; Mon, 16 Aug 2021 13:57:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 13D68632D1 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4D79389E3B; Mon, 16 Aug 2021 13:57:04 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id EE0E489E3B; Mon, 16 Aug 2021 13:56:58 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607059" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607059" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091161" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:20 -0700 Message-Id: <20210816135139.10060-4-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 03/22] drm/i915/guc: Unwind context requests in reverse order X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When unwinding requests on a reset context, if other requests in the context are in the priority list the requests could be resubmitted out of seqno order. Traverse the list of active requests in reverse and append to the head of the priority list to fix this. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost Cc: --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index b5d3972ae164..bc51caba50d0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -799,9 +799,9 @@ __unwind_incomplete_requests(struct intel_context *ce) spin_lock_irqsave(&sched_engine->lock, flags); spin_lock(&ce->guc_active.lock); - list_for_each_entry_safe(rq, rn, - &ce->guc_active.requests, - sched.link) { + list_for_each_entry_safe_reverse(rq, rn, + &ce->guc_active.requests, + sched.link) { if (i915_request_completed(rq)) continue; @@ -818,7 +818,7 @@ __unwind_incomplete_requests(struct intel_context *ce) } GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine)); - list_add_tail(&rq->sched.link, pl); + list_add(&rq->sched.link, pl); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); spin_lock(&ce->guc_active.lock); From patchwork Mon Aug 16 13:51:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438575 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60A78C4338F for ; Mon, 16 Aug 2021 13:57:20 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2E2C7632D1 for ; Mon, 16 Aug 2021 13:57:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2E2C7632D1 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2875889F08; Mon, 16 Aug 2021 13:57:06 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id F198B89E3F; Mon, 16 Aug 2021 13:56:58 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607061" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607061" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091163" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:57 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:21 -0700 Message-Id: <20210816135139.10060-5-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 04/22] drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Don't drop ce->guc_active.lock when unwinding a context after reset. At one point we had to drop this because of a lock inversion but that is no longer the case. It is much safer to hold the lock so let's do that. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost Cc: --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index bc51caba50d0..3cd2da6f5c03 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -806,8 +806,6 @@ __unwind_incomplete_requests(struct intel_context *ce) continue; list_del_init(&rq->sched.link); - spin_unlock(&ce->guc_active.lock); - __i915_request_unsubmit(rq); /* Push the request back into the queue for later resubmission. */ @@ -820,8 +818,6 @@ __unwind_incomplete_requests(struct intel_context *ce) list_add(&rq->sched.link, pl); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - - spin_lock(&ce->guc_active.lock); } spin_unlock(&ce->guc_active.lock); spin_unlock_irqrestore(&sched_engine->lock, flags); From patchwork Mon Aug 16 13:51:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58759C432BE for ; Mon, 16 Aug 2021 13:57:28 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2E1F7632D6 for ; Mon, 16 Aug 2021 13:57:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2E1F7632D6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EE40F89E3F; Mon, 16 Aug 2021 13:57:11 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5055E89E36; Mon, 16 Aug 2021 13:56:59 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607062" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607062" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091164" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:22 -0700 Message-Id: <20210816135139.10060-6-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 05/22] drm/i915/guc: Workaround reset G2H is received after schedule done G2H X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If the context is reset as a result of the request cancelation the context reset G2H is received after schedule disable done G2H which is likely the wrong order. The schedule disable done G2H release the waiting request cancelation code which resubmits the context. This races with the context reset G2H which also wants to resubmit the context but in this case it really should be a NOP as request cancelation code owns the resubmit. Use some clever tricks of checking the context state to seal this race until if / when the GuC firmware is fixed. v2: (Checkpatch) - Fix typos Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost Cc: --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 ++++++++++++++++--- 1 file changed, 37 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 3cd2da6f5c03..c3b7bf7319dd 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -826,17 +826,35 @@ __unwind_incomplete_requests(struct intel_context *ce) static void __guc_reset_context(struct intel_context *ce, bool stalled) { struct i915_request *rq; + unsigned long flags; u32 head; + bool skip = false; intel_context_get(ce); /* - * GuC will implicitly mark the context as non-schedulable - * when it sends the reset notification. Make sure our state - * reflects this change. The context will be marked enabled - * on resubmission. + * GuC will implicitly mark the context as non-schedulable when it sends + * the reset notification. Make sure our state reflects this change. The + * context will be marked enabled on resubmission. + * + * XXX: If the context is reset as a result of the request cancellation + * this G2H is received after the schedule disable complete G2H which is + * likely wrong as this creates a race between the request cancellation + * code re-submitting the context and this G2H handler. This likely + * should be fixed in the GuC but until if / when that gets fixed we + * need to workaround this. Convert this function to a NOP if a pending + * enable is in flight as this indicates that a request cancellation has + * occurred. */ - clr_context_enabled(ce); + spin_lock_irqsave(&ce->guc_state.lock, flags); + if (likely(!context_pending_enable(ce))) { + clr_context_enabled(ce); + } else { + skip = true; + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (unlikely(skip)) + goto out_put; rq = intel_context_find_active_request(ce); if (!rq) { @@ -855,6 +873,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled) out_replay: guc_reset_state(ce, head, stalled); __unwind_incomplete_requests(ce); +out_put: intel_context_put(ce); } @@ -1599,6 +1618,13 @@ static void guc_context_cancel_request(struct intel_context *ce, guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head), true); } + + /* + * XXX: Racey if context is reset, see comment in + * __guc_reset_context(). + */ + flush_work(&ce_to_guc(ce)->ct.requests.worker); + guc_context_unblock(ce); } } @@ -2719,7 +2745,12 @@ static void guc_handle_context_reset(struct intel_guc *guc, { trace_intel_context_reset(ce); - if (likely(!intel_context_is_banned(ce))) { + /* + * XXX: Racey if request cancellation has occurred, see comment in + * __guc_reset_context(). + */ + if (likely(!intel_context_is_banned(ce) && + !context_blocked(ce))) { capture_error_state(guc, ce); guc_context_replay(ce); } From patchwork Mon Aug 16 13:51:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438581 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8D62C4320A for ; Mon, 16 Aug 2021 13:57:26 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 80A49632DB for ; Mon, 16 Aug 2021 13:57:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 80A49632DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9DD2089EEB; Mon, 16 Aug 2021 13:57:08 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5058D89E50; Mon, 16 Aug 2021 13:56:59 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607064" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607064" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091166" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:23 -0700 Message-Id: <20210816135139.10060-7-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 06/22] drm/i915/execlists: Do not propagate errors to dependent fences X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Progagating errors to dependent fences is wrong, don't do it. Selftest in following patch exposes this bug. Fixes: 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status to children on unhold") Signed-off-by: Matthew Brost Cc: --- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index de5f9c86b9a4..cafb0608ffb4 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq) if (p->flags & I915_DEPENDENCY_WEAK) continue; - /* Propagate any change in error status */ - if (rq->fence.error) - i915_request_set_error_once(w, rq->fence.error); - if (w->engine != rq->engine) continue; From patchwork Mon Aug 16 13:51:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438585 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FD0DC4338F for ; Mon, 16 Aug 2021 13:57:29 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EFFEB632DB for ; Mon, 16 Aug 2021 13:57:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org EFFEB632DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1BE0589EFF; Mon, 16 Aug 2021 13:57:13 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id BFAE889E3B; Mon, 16 Aug 2021 13:56:59 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607065" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607065" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091167" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:24 -0700 Message-Id: <20210816135139.10060-8-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 07/22] drm/i915/selftests: Add a cancel request selftest that triggers a reset X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Add a cancel request selftest that results in an engine reset to cancel the request as it is non-preemptable. Also insert a NOP request after the cancelled request and confirm that it completely successfully. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/selftests/i915_request.c | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index d67710d10615..e2c5db77f087 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -772,6 +772,98 @@ static int __cancel_completed(struct intel_engine_cs *engine) return err; } +static int __cancel_reset(struct intel_engine_cs *engine) +{ + struct intel_context *ce; + struct igt_spinner spin; + struct i915_request *rq, *nop; + unsigned long preempt_timeout_ms; + int err = 0; + + preempt_timeout_ms = engine->props.preempt_timeout_ms; + engine->props.preempt_timeout_ms = 100; + + if (igt_spinner_init(&spin, engine->gt)) + goto out_restore; + + ce = intel_context_create(engine); + if (IS_ERR(ce)) { + err = PTR_ERR(ce); + goto out_spin; + } + + rq = igt_spinner_create_request(&spin, ce, MI_NOOP); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out_ce; + } + + pr_debug("%s: Cancelling active request\n", engine->name); + i915_request_get(rq); + i915_request_add(rq); + if (!igt_wait_for_spinner(&spin, rq)) { + struct drm_printer p = drm_info_printer(engine->i915->drm.dev); + + pr_err("Failed to start spinner on %s\n", engine->name); + intel_engine_dump(engine, &p, "%s\n", engine->name); + err = -ETIME; + goto out_rq; + } + + nop = intel_context_create_request(ce); + if (IS_ERR(nop)) + goto out_nop; + i915_request_get(nop); + i915_request_add(nop); + + i915_request_cancel(rq, -EINTR); + + if (i915_request_wait(rq, 0, HZ) < 0) { + struct drm_printer p = drm_info_printer(engine->i915->drm.dev); + + pr_err("%s: Failed to cancel hung request\n", engine->name); + intel_engine_dump(engine, &p, "%s\n", engine->name); + err = -ETIME; + goto out_nop; + } + + if (rq->fence.error != -EINTR) { + pr_err("%s: fence not cancelled (%u)\n", + engine->name, rq->fence.error); + err = -EINVAL; + goto out_nop; + } + + if (i915_request_wait(nop, 0, HZ) < 0) { + struct drm_printer p = drm_info_printer(engine->i915->drm.dev); + + pr_err("%s: Failed to complete nop request\n", engine->name); + intel_engine_dump(engine, &p, "%s\n", engine->name); + err = -ETIME; + goto out_nop; + } + + if (nop->fence.error != 0) { + pr_err("%s: Nop request errored (%u)\n", + engine->name, nop->fence.error); + err = -EINVAL; + } + +out_nop: + i915_request_put(nop); +out_rq: + i915_request_put(rq); +out_ce: + intel_context_put(ce); +out_spin: + igt_spinner_fini(&spin); +out_restore: + engine->props.preempt_timeout_ms = preempt_timeout_ms; + if (err) + pr_err("%s: %s error %d\n", __func__, engine->name, err); + return err; +} + static int live_cancel_request(void *arg) { struct drm_i915_private *i915 = arg; @@ -804,6 +896,14 @@ static int live_cancel_request(void *arg) return err; if (err2) return err2; + + /* Expects reset so call outside of igt_live_test_* */ + err = __cancel_reset(engine); + if (err) + return err; + + if (igt_flush_test(i915)) + return -EIO; } return 0; From patchwork Mon Aug 16 13:51:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438591 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FEF8C4338F for ; Mon, 16 Aug 2021 13:57:37 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6581C632D6 for ; Mon, 16 Aug 2021 13:57:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6581C632D6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9C7C289F43; Mon, 16 Aug 2021 13:57:16 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id D106989E3F; Mon, 16 Aug 2021 13:56:59 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607067" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607067" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091170" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:25 -0700 Message-Id: <20210816135139.10060-9-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 08/22] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When unblocking a context, do not enable scheduling if the context is banned, guc_id invalid, or not registered. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost Cc: --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c3b7bf7319dd..353899634fa8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1579,6 +1579,9 @@ static void guc_context_unblock(struct intel_context *ce) spin_lock_irqsave(&ce->guc_state.lock, flags); if (unlikely(submission_disabled(guc) || + intel_context_is_banned(ce) || + context_guc_id_invalid(ce) || + !lrc_desc_registered(guc, ce->guc_id) || !intel_context_is_pinned(ce) || context_pending_disable(ce) || context_blocked(ce) > 1)) { From patchwork Mon Aug 16 13:51:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438589 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05CE3C432BE for ; Mon, 16 Aug 2021 13:57:35 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C9553632DF for ; Mon, 16 Aug 2021 13:57:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org C9553632DF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1F04F89F33; Mon, 16 Aug 2021 13:57:14 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 73BE989E7B; Mon, 16 Aug 2021 13:57:00 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607068" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607068" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091172" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:26 -0700 Message-Id: <20210816135139.10060-10-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 09/22] drm/i915/selftests: Fix memory corruption in live_lrc_isolation X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" GuC submission has exposed an existing memory corruption in live_lrc_isolation. We believe that some writes to the watchdog offsets in the LRC (0x178 & 0x17c) can result in trashing of portions of the address space. With GuC submission there are additional objects which can move the context redzone into the space that is trashed. To workaround this avoid poisoning the watchdog. v2: (Daniel Vetter) - Add VLK ref in code to workaround Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/selftest_lrc.c | 29 +++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index b0977a3b699b..cdc6ae48a1e1 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1074,6 +1074,32 @@ record_registers(struct intel_context *ce, goto err_after; } +static u32 safe_offset(u32 offset, u32 reg) +{ + /* XXX skip testing of watchdog - VLK-22772 */ + if (offset == 0x178 || offset == 0x17c) + reg = 0; + + return reg; +} + +static int get_offset_mask(struct intel_engine_cs *engine) +{ + if (GRAPHICS_VER(engine->i915) < 12) + return 0xfff; + + switch (engine->class) { + default: + case RENDER_CLASS: + return 0x07ff; + case COPY_ENGINE_CLASS: + return 0x0fff; + case VIDEO_DECODE_CLASS: + case VIDEO_ENHANCEMENT_CLASS: + return 0x3fff; + } +} + static struct i915_vma *load_context(struct intel_context *ce, u32 poison) { struct i915_vma *batch; @@ -1117,7 +1143,8 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison) len = (len + 1) / 2; *cs++ = MI_LOAD_REGISTER_IMM(len); while (len--) { - *cs++ = hw[dw]; + *cs++ = safe_offset(hw[dw] & get_offset_mask(ce->engine), + hw[dw]); *cs++ = poison; dw += 2; } From patchwork Mon Aug 16 13:51:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438579 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F897C4338F for ; Mon, 16 Aug 2021 13:57:26 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EFD04632DB for ; Mon, 16 Aug 2021 13:57:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org EFD04632DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 64FF089EB4; Mon, 16 Aug 2021 13:57:08 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id CEEFB89E9E; Mon, 16 Aug 2021 13:57:01 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607070" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607070" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091173" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:27 -0700 Message-Id: <20210816135139.10060-11-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 10/22] drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" While debugging an issue with full GT resets I went down a rabbit hole thinking the scrubbing of lost G2H wasn't working correctly. This proved to be incorrect as this was working just fine but this chase inspired me to write a selftest to prove that this works. This simple selftest injects errors dropping various G2H and then issues a full GT reset proving that the scrubbing of these G2H doesn't blow up. v2: (Daniel Vetter) - Use ifdef instead of macros for selftests v3: (Checkpatch) - A space after 'switch' statement Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context_types.h | 18 +++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 25 ++++ drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 126 ++++++++++++++++++ .../drm/i915/selftests/i915_live_selftests.h | 1 + .../i915/selftests/intel_scheduler_helpers.c | 12 ++ .../i915/selftests/intel_scheduler_helpers.h | 2 + 6 files changed, 184 insertions(+) create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc.c diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index e54351a170e2..3a73f3117873 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -198,6 +198,24 @@ struct intel_context { */ u8 guc_prio; u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM]; + +#ifdef CONFIG_DRM_I915_SELFTEST + /** + * @drop_schedule_enable: Force drop of schedule enable G2H for selftest + */ + bool drop_schedule_enable; + + /** + * @drop_schedule_disable: Force drop of schedule disable G2H for + * selftest + */ + bool drop_schedule_disable; + + /** + * @drop_deregister: Force drop of deregister G2H for selftest + */ + bool drop_deregister; +#endif }; #endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 353899634fa8..bffd0199dc15 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2634,6 +2634,13 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, trace_intel_context_deregister_done(ce); +#ifdef CONFIG_DRM_I915_SELFTEST + if (unlikely(ce->drop_deregister)) { + ce->drop_deregister = false; + return 0; + } +#endif + if (context_wait_for_deregister_to_register(ce)) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -2688,10 +2695,24 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, trace_intel_context_sched_done(ce); if (context_pending_enable(ce)) { +#ifdef CONFIG_DRM_I915_SELFTEST + if (unlikely(ce->drop_schedule_enable)) { + ce->drop_schedule_enable = false; + return 0; + } +#endif + clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { bool banned; +#ifdef CONFIG_DRM_I915_SELFTEST + if (unlikely(ce->drop_schedule_disable)) { + ce->drop_schedule_disable = false; + return 0; + } +#endif + /* * Unpin must be done before __guc_signal_context_fence, * otherwise a race exists between the requests getting @@ -3068,3 +3089,7 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve) return false; } + +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +#include "selftest_guc.c" +#endif diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c new file mode 100644 index 000000000000..264e2f705c17 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright �� 2021 Intel Corporation + */ + +#include "selftests/intel_scheduler_helpers.h" + +static struct i915_request *nop_user_request(struct intel_context *ce, + struct i915_request *from) +{ + struct i915_request *rq; + int ret; + + rq = intel_context_create_request(ce); + if (IS_ERR(rq)) + return rq; + + if (from) { + ret = i915_sw_fence_await_dma_fence(&rq->submit, + &from->fence, 0, + I915_FENCE_GFP); + if (ret < 0) { + i915_request_put(rq); + return ERR_PTR(ret); + } + } + + i915_request_get(rq); + i915_request_add(rq); + + return rq; +} + +static int intel_guc_scrub_ctbs(void *arg) +{ + struct intel_gt *gt = arg; + int ret = 0; + int i; + struct i915_request *last[3] = {NULL, NULL, NULL}, *rq; + intel_wakeref_t wakeref; + struct intel_engine_cs *engine; + struct intel_context *ce; + + wakeref = intel_runtime_pm_get(gt->uncore->rpm); + engine = intel_selftest_find_any_engine(gt); + + /* Submit requests and inject errors forcing G2H to be dropped */ + for (i = 0; i < 3; ++i) { + ce = intel_context_create(engine); + if (IS_ERR(ce)) { + ret = PTR_ERR(ce); + pr_err("Failed to create context, %d: %d\n", i, ret); + goto err; + } + + switch (i) { + case 0: + ce->drop_schedule_enable = true; + break; + case 1: + ce->drop_schedule_disable = true; + break; + case 2: + ce->drop_deregister = true; + break; + } + + rq = nop_user_request(ce, NULL); + intel_context_put(ce); + + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + pr_err("Failed to create request, %d: %d\n", i, ret); + goto err; + } + + last[i] = rq; + } + + for (i = 0; i < 3; ++i) { + ret = i915_request_wait(last[i], 0, HZ); + if (ret < 0) { + pr_err("Last request failed to complete: %d\n", ret); + goto err; + } + i915_request_put(last[i]); + last[i] = NULL; + } + + /* Force all H2G / G2H to be submitted / processed */ + intel_gt_retire_requests(gt); + msleep(500); + + /* Scrub missing G2H */ + intel_gt_handle_error(engine->gt, -1, 0, "selftest reset"); + + ret = intel_gt_wait_for_idle(gt, HZ); + if (ret < 0) { + pr_err("GT failed to idle: %d\n", ret); + goto err; + } + +err: + for (i = 0; i < 3; ++i) + if (last[i]) + i915_request_put(last[i]); + intel_runtime_pm_put(gt->uncore->rpm, wakeref); + + return ret; +} + +int intel_guc_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(intel_guc_scrub_ctbs), + }; + struct intel_gt *gt = &i915->gt; + + if (intel_gt_is_wedged(gt)) + return 0; + + if (!intel_uc_uses_guc_submission(>->uc)) + return 0; + + return intel_gt_live_subtests(tests, gt); +} diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h index cfa5c4165a4f..3cf6758931f9 100644 --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h @@ -47,5 +47,6 @@ selftest(execlists, intel_execlists_live_selftests) selftest(ring_submission, intel_ring_submission_live_selftests) selftest(perf, i915_perf_live_selftests) selftest(slpc, intel_slpc_live_selftests) +selftest(guc, intel_guc_live_selftests) /* Here be dragons: keep last to run last! */ selftest(late_gt_pm, intel_gt_pm_late_selftests) diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c index 4b328346b48a..310fb83c527e 100644 --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c @@ -14,6 +14,18 @@ #define REDUCED_PREEMPT 10 #define WAIT_FOR_RESET_TIME 10000 +struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + return engine; + + pr_err("No valid engine found!\n"); + return NULL; +} + int intel_selftest_modify_policy(struct intel_engine_cs *engine, struct intel_selftest_saved_policy *saved, u32 modify_type) diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h index 35c098601ac0..ae60bb507f45 100644 --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h @@ -10,6 +10,7 @@ struct i915_request; struct intel_engine_cs; +struct intel_gt; struct intel_selftest_saved_policy { u32 flags; @@ -23,6 +24,7 @@ enum selftest_scheduler_modify { SELFTEST_SCHEDULER_MODIFY_FAST_RESET, }; +struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt); int intel_selftest_modify_policy(struct intel_engine_cs *engine, struct intel_selftest_saved_policy *saved, enum selftest_scheduler_modify modify_type); From patchwork Mon Aug 16 13:51:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438595 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7527C4338F for ; Mon, 16 Aug 2021 13:57:49 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7398E632E0 for ; Mon, 16 Aug 2021 13:57:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7398E632E0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7CBE589F55; Mon, 16 Aug 2021 13:57:22 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8A28589F01; Mon, 16 Aug 2021 13:57:02 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607073" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607073" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091175" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:58 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:28 -0700 Message-Id: <20210816135139.10060-12-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 11/22] drm/i915/guc: Take context ref when cancelling request X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" A context can get destroyed after cancelling a request so take a reference to context when cancelling a request. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index bffd0199dc15..89126be26786 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1613,8 +1613,10 @@ static void guc_context_cancel_request(struct intel_context *ce, struct i915_request *rq) { if (i915_sw_fence_signaled(&rq->submit)) { - struct i915_sw_fence *fence = guc_context_block(ce); + struct i915_sw_fence *fence; + intel_context_get(ce); + fence = guc_context_block(ce); i915_sw_fence_wait(fence); if (!i915_request_completed(rq)) { __i915_request_skip(rq); @@ -1629,6 +1631,7 @@ static void guc_context_cancel_request(struct intel_context *ce, flush_work(&ce_to_guc(ce)->ct.requests.worker); guc_context_unblock(ce); + intel_context_put(ce); } } From patchwork Mon Aug 16 13:51:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438599 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E375C432BE for ; Mon, 16 Aug 2021 13:57:57 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4F656632F1 for ; Mon, 16 Aug 2021 13:57:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4F656632F1 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E1B6889F85; Mon, 16 Aug 2021 13:57:24 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 89C0E89EFF; Mon, 16 Aug 2021 13:57:02 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607075" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607075" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091176" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:29 -0700 Message-Id: <20210816135139.10060-13-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 12/22] drm/i915/guc: Don't touch guc_state.sched_state without a lock X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Before we did some clever tricks to not use the a lock when touching guc_state.sched_state in certain cases. Don't do that, enforce the use of the lock. Part of this is removing a dead code path from guc_lrc_desc_pin where a context could be deregistered when the aforementioned function was called from the submission path. Remove this dead code and add a GEM_BUG_ON if this path is ever attempted to be used. Signed-off-by: Matthew Brost Reported-by: kernel test robot --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 57 ++++++++++--------- 1 file changed, 31 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89126be26786..8d45585773f3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -150,11 +150,22 @@ static inline void clr_context_registered(struct intel_context *ce) #define SCHED_STATE_BLOCKED_MASK (0xfff << SCHED_STATE_BLOCKED_SHIFT) static inline void init_sched_state(struct intel_context *ce) { - /* Only should be called from guc_lrc_desc_pin() */ + lockdep_assert_held(&ce->guc_state.lock); atomic_set(&ce->guc_sched_state_no_lock, 0); ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK; } +static inline bool sched_state_is_init(struct intel_context *ce) +{ + /* + * XXX: Kernel contexts can have SCHED_STATE_NO_LOCK_REGISTERED after + * suspend. + */ + return !(atomic_read(&ce->guc_sched_state_no_lock) & + ~SCHED_STATE_NO_LOCK_REGISTERED) && + !(ce->guc_state.sched_state &= ~SCHED_STATE_BLOCKED_MASK); +} + static inline bool context_wait_for_deregister_to_register(struct intel_context *ce) { @@ -165,7 +176,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce) static inline void set_context_wait_for_deregister_to_register(struct intel_context *ce) { - /* Only should be called from guc_lrc_desc_pin() without lock */ + lockdep_assert_held(&ce->guc_state.lock); ce->guc_state.sched_state |= SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER; } @@ -599,9 +610,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) bool pending_disable, pending_enable, deregister, destroyed, banned; xa_for_each(&guc->context_lookup, index, ce) { - /* Flush context */ spin_lock_irqsave(&ce->guc_state.lock, flags); - spin_unlock_irqrestore(&ce->guc_state.lock, flags); /* * Once we are at this point submission_disabled() is guaranteed @@ -617,6 +626,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) banned = context_banned(ce); init_sched_state(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (pending_enable || destroyed || deregister) { atomic_dec(&guc->outstanding_submission_g2h); if (deregister) @@ -1318,6 +1329,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) int ret = 0; GEM_BUG_ON(!engine->mask); + GEM_BUG_ON(!sched_state_is_init(ce)); /* * Ensure LRC + CT vmas are is same region as write barrier is done @@ -1346,7 +1358,6 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) desc->priority = ce->guc_prio; desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD; guc_context_policy_init(engine, desc); - init_sched_state(ce); /* * The context_lookup xarray is used to determine if the hardware @@ -1357,26 +1368,23 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) * registering this context. */ if (context_registered) { + bool disabled; + unsigned long flags; + trace_intel_context_steal_guc_id(ce); - if (!loop) { + GEM_BUG_ON(!loop); + + /* Seal race with Reset */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + disabled = submission_disabled(guc); + if (likely(!disabled)) { set_context_wait_for_deregister_to_register(ce); intel_context_get(ce); - } else { - bool disabled; - unsigned long flags; - - /* Seal race with Reset */ - spin_lock_irqsave(&ce->guc_state.lock, flags); - disabled = submission_disabled(guc); - if (likely(!disabled)) { - set_context_wait_for_deregister_to_register(ce); - intel_context_get(ce); - } - spin_unlock_irqrestore(&ce->guc_state.lock, flags); - if (unlikely(disabled)) { - reset_lrc_desc(guc, desc_idx); - return 0; /* Will get registered later */ - } + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (unlikely(disabled)) { + reset_lrc_desc(guc, desc_idx); + return 0; /* Will get registered later */ } /* @@ -1385,10 +1393,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) */ with_intel_runtime_pm(runtime_pm, wakeref) ret = deregister_context(ce, ce->guc_id, loop); - if (unlikely(ret == -EBUSY)) { - clr_context_wait_for_deregister_to_register(ce); - intel_context_put(ce); - } else if (unlikely(ret == -ENODEV)) { + if (unlikely(ret == -ENODEV)) { ret = 0; /* Will get registered later */ } } else { From patchwork Mon Aug 16 13:51:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438607 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15050C4338F for ; Mon, 16 Aug 2021 13:58:03 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D98E8632F2 for ; Mon, 16 Aug 2021 13:58:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D98E8632F2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A571889FC8; Mon, 16 Aug 2021 13:57:38 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 00A9E89EEB; Mon, 16 Aug 2021 13:57:02 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607077" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607077" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091179" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:30 -0700 Message-Id: <20210816135139.10060-14-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 13/22] drm/i915/guc: Reset LRC descriptor if register returns -ENODEV X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Reset LRC descriptor if a context register returns -ENODEV as this means we are mid-reset. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 8d45585773f3..3a01743e09ea 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1399,10 +1399,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) } else { with_intel_runtime_pm(runtime_pm, wakeref) ret = register_context(ce, loop); - if (unlikely(ret == -EBUSY)) + if (unlikely(ret == -EBUSY)) { + reset_lrc_desc(guc, desc_idx); + } else if (unlikely(ret == -ENODEV)) { reset_lrc_desc(guc, desc_idx); - else if (unlikely(ret == -ENODEV)) ret = 0; /* Will get registered later */ + } } return ret; From patchwork Mon Aug 16 13:51:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438587 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36EC7C4338F for ; Mon, 16 Aug 2021 13:57:34 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 08FCD632E0 for ; Mon, 16 Aug 2021 13:57:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 08FCD632E0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 09FD689F2E; Mon, 16 Aug 2021 13:57:14 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0A15589F06; Mon, 16 Aug 2021 13:57:03 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607078" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607078" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091180" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:31 -0700 Message-Id: <20210816135139.10060-15-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 14/22] drm/i915: Allocate error capture in atomic context X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Error captures can now be done in a work queue processing G2H messages. These messages need to be completely done being processed in the reset path, to avoid races in the missing G2H cleanup, which create a dependency on memory allocations and dma fences (i915_requests). Requests depend on resets, thus now we have a circular dependency. To work around this, allocate the error capture in an atomic context. Fixes: dc0dad365c5e ("Fix for error capture after full GPU reset with GuC") Fixes: 573ba126aef3 ("Capture error state on context reset") Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/i915_gpu_error.c | 37 +++++++++++++-------------- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0f08bcfbe964..453376aa6d9f 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -49,7 +49,6 @@ #include "i915_memcpy.h" #include "i915_scatterlist.h" -#define ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) #define ATOMIC_MAYFAIL (GFP_ATOMIC | __GFP_NOWARN) static void __sg_set_buf(struct scatterlist *sg, @@ -79,7 +78,7 @@ static bool __i915_error_grow(struct drm_i915_error_state_buf *e, size_t len) if (e->cur == e->end) { struct scatterlist *sgl; - sgl = (typeof(sgl))__get_free_page(ALLOW_FAIL); + sgl = (typeof(sgl))__get_free_page(ATOMIC_MAYFAIL); if (!sgl) { e->err = -ENOMEM; return false; @@ -99,10 +98,10 @@ static bool __i915_error_grow(struct drm_i915_error_state_buf *e, size_t len) } e->size = ALIGN(len + 1, SZ_64K); - e->buf = kmalloc(e->size, ALLOW_FAIL); + e->buf = kmalloc(e->size, ATOMIC_MAYFAIL); if (!e->buf) { e->size = PAGE_ALIGN(len + 1); - e->buf = kmalloc(e->size, GFP_KERNEL); + e->buf = kmalloc(e->size, ATOMIC_MAYFAIL); } if (!e->buf) { e->err = -ENOMEM; @@ -243,12 +242,12 @@ static bool compress_init(struct i915_vma_compress *c) { struct z_stream_s *zstream = &c->zstream; - if (pool_init(&c->pool, ALLOW_FAIL)) + if (pool_init(&c->pool, ATOMIC_MAYFAIL)) return false; zstream->workspace = kmalloc(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL), - ALLOW_FAIL); + ATOMIC_MAYFAIL); if (!zstream->workspace) { pool_fini(&c->pool); return false; @@ -256,7 +255,7 @@ static bool compress_init(struct i915_vma_compress *c) c->tmp = NULL; if (i915_has_memcpy_from_wc()) - c->tmp = pool_alloc(&c->pool, ALLOW_FAIL); + c->tmp = pool_alloc(&c->pool, ATOMIC_MAYFAIL); return true; } @@ -280,7 +279,7 @@ static void *compress_next_page(struct i915_vma_compress *c, if (dst->page_count >= dst->num_pages) return ERR_PTR(-ENOSPC); - page = pool_alloc(&c->pool, ALLOW_FAIL); + page = pool_alloc(&c->pool, ATOMIC_MAYFAIL); if (!page) return ERR_PTR(-ENOMEM); @@ -376,7 +375,7 @@ struct i915_vma_compress { static bool compress_init(struct i915_vma_compress *c) { - return pool_init(&c->pool, ALLOW_FAIL) == 0; + return pool_init(&c->pool, ATOMIC_MAYFAIL) == 0; } static bool compress_start(struct i915_vma_compress *c) @@ -391,7 +390,7 @@ static int compress_page(struct i915_vma_compress *c, { void *ptr; - ptr = pool_alloc(&c->pool, ALLOW_FAIL); + ptr = pool_alloc(&c->pool, ATOMIC_MAYFAIL); if (!ptr) return -ENOMEM; @@ -997,7 +996,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, num_pages = min_t(u64, vma->size, vma->obj->base.size) >> PAGE_SHIFT; num_pages = DIV_ROUND_UP(10 * num_pages, 8); /* worstcase zlib growth */ - dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), ALLOW_FAIL); + dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), ATOMIC_MAYFAIL); if (!dst) return NULL; @@ -1433,7 +1432,7 @@ capture_engine(struct intel_engine_cs *engine, struct i915_request *rq = NULL; unsigned long flags; - ee = intel_engine_coredump_alloc(engine, GFP_KERNEL); + ee = intel_engine_coredump_alloc(engine, ATOMIC_MAYFAIL); if (!ee) return NULL; @@ -1481,7 +1480,7 @@ gt_record_engines(struct intel_gt_coredump *gt, struct intel_engine_coredump *ee; /* Refill our page pool before entering atomic section */ - pool_refill(&compress->pool, ALLOW_FAIL); + pool_refill(&compress->pool, ATOMIC_MAYFAIL); ee = capture_engine(engine, compress); if (!ee) @@ -1507,7 +1506,7 @@ gt_record_uc(struct intel_gt_coredump *gt, const struct intel_uc *uc = >->_gt->uc; struct intel_uc_coredump *error_uc; - error_uc = kzalloc(sizeof(*error_uc), ALLOW_FAIL); + error_uc = kzalloc(sizeof(*error_uc), ATOMIC_MAYFAIL); if (!error_uc) return NULL; @@ -1518,8 +1517,8 @@ gt_record_uc(struct intel_gt_coredump *gt, * As modparams are generally accesible from the userspace make * explicit copies of the firmware paths. */ - error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL); - error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL); + error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ATOMIC_MAYFAIL); + error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ATOMIC_MAYFAIL); error_uc->guc_log = i915_vma_coredump_create(gt->_gt, uc->guc.log.vma, "GuC log buffer", @@ -1778,7 +1777,7 @@ i915_vma_capture_prepare(struct intel_gt_coredump *gt) { struct i915_vma_compress *compress; - compress = kmalloc(sizeof(*compress), ALLOW_FAIL); + compress = kmalloc(sizeof(*compress), ATOMIC_MAYFAIL); if (!compress) return NULL; @@ -1811,11 +1810,11 @@ i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask) if (IS_ERR(error)) return error; - error = i915_gpu_coredump_alloc(i915, ALLOW_FAIL); + error = i915_gpu_coredump_alloc(i915, ATOMIC_MAYFAIL); if (!error) return ERR_PTR(-ENOMEM); - error->gt = intel_gt_coredump_alloc(gt, ALLOW_FAIL); + error->gt = intel_gt_coredump_alloc(gt, ATOMIC_MAYFAIL); if (error->gt) { struct i915_vma_compress *compress; From patchwork Mon Aug 16 13:51:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438601 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51123C4338F for ; Mon, 16 Aug 2021 13:57:58 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 20F71632F1 for ; Mon, 16 Aug 2021 13:57:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 20F71632F1 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 71D4F89F73; Mon, 16 Aug 2021 13:57:25 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 09CDA89F03; Mon, 16 Aug 2021 13:57:03 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607079" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607079" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091181" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:32 -0700 Message-Id: <20210816135139.10060-16-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 15/22] drm/i915/guc: Flush G2H work queue during reset X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" It isn't safe to scrub for missing G2H or continue with the reset until all G2H processing is complete. Flush the G2H work queue during reset to ensure it is done running. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 3a01743e09ea..8c560ed14976 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -707,8 +707,6 @@ static void guc_flush_submissions(struct intel_guc *guc) void intel_guc_submission_reset_prepare(struct intel_guc *guc) { - int i; - if (unlikely(!guc_submission_initialized(guc))) { /* Reset called during driver load? GuC not yet initialised! */ return; @@ -724,20 +722,8 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc) guc_flush_submissions(guc); - /* - * Handle any outstanding G2Hs before reset. Call IRQ handler directly - * each pass as interrupt have been disabled. We always scrub for - * outstanding G2H as it is possible for outstanding_submission_g2h to - * be incremented after the context state update. - */ - for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) { - intel_guc_to_host_event_handler(guc); -#define wait_for_reset(guc, wait_var) \ - intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20)) - do { - wait_for_reset(guc, &guc->outstanding_submission_g2h); - } while (!list_empty(&guc->ct.requests.incoming)); - } + flush_work(&guc->ct.requests.worker); + scrub_guc_desc_for_outstanding_g2h(guc); } From patchwork Mon Aug 16 13:51:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438611 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFA80C432BE for ; Mon, 16 Aug 2021 13:58:03 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7B99D632F2 for ; Mon, 16 Aug 2021 13:58:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7B99D632F2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6DF3289FDB; Mon, 16 Aug 2021 13:57:39 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 94C8189E36; Mon, 16 Aug 2021 13:57:03 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607080" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607080" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091182" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:33 -0700 Message-Id: <20210816135139.10060-17-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 16/22] drm/i915/guc: Release submit fence from an IRQ X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" A subsequent patch will flip the locking hierarchy from ce->guc_state.lock -> sched_engine->lock to sched_engine->lock -> ce->guc_state.lock. As such we need to release the submit fence for a request from an IRQ to break a lock inversion - i.e. the fence must be release went holding ce->guc_state.lock and the releasing of the can acquire sched_engine->lock. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 ++++++++++++++- drivers/gpu/drm/i915/i915_request.h | 5 +++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 8c560ed14976..9ae4633aa7cb 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2017,6 +2017,14 @@ static const struct intel_context_ops guc_context_ops = { .create_virtual = guc_create_virtual, }; +static void submit_work_cb(struct irq_work *wrk) +{ + struct i915_request *rq = container_of(wrk, typeof(*rq), submit_work); + + might_lock(&rq->engine->sched_engine->lock); + i915_sw_fence_complete(&rq->submit); +} + static void __guc_signal_context_fence(struct intel_context *ce) { struct i915_request *rq; @@ -2026,8 +2034,12 @@ static void __guc_signal_context_fence(struct intel_context *ce) if (!list_empty(&ce->guc_state.fences)) trace_intel_context_fence_release(ce); + /* + * Use an IRQ to ensure locking order of sched_engine->lock -> + * ce->guc_state.lock is preserved. + */ list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) - i915_sw_fence_complete(&rq->submit); + irq_work_queue(&rq->submit_work); INIT_LIST_HEAD(&ce->guc_state.fences); } @@ -2137,6 +2149,7 @@ static int guc_request_alloc(struct i915_request *rq) spin_lock_irqsave(&ce->guc_state.lock, flags); if (context_wait_for_deregister_to_register(ce) || context_pending_disable(ce)) { + init_irq_work(&rq->submit_work, submit_work_cb); i915_sw_fence_await(&rq->submit); list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 1bc1349ba3c2..d818cfbfc41d 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -218,6 +218,11 @@ struct i915_request { }; struct llist_head execute_cb; struct i915_sw_fence semaphore; + /** + * @submit_work: complete submit fence from an IRQ if needed for + * locking hierarchy reasons. + */ + struct irq_work submit_work; /* * A list of everyone we wait upon, and everyone who waits upon us. From patchwork Mon Aug 16 13:51:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438603 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D656C4338F for ; Mon, 16 Aug 2021 13:58:00 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 20867632F2 for ; Mon, 16 Aug 2021 13:58:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 20867632F2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9D80F89F75; Mon, 16 Aug 2021 13:57:27 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 95D3889E50; Mon, 16 Aug 2021 13:57:03 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607081" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607081" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091184" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:34 -0700 Message-Id: <20210816135139.10060-18-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 17/22] drm/i915/guc: Move guc_blocked fence to struct guc_state X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Move guc_blocked fence to struct guc_state as the lock which protects the fence lives there. s/ce->guc_blocked/ce->guc_state.blocked_fence/g Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_context_types.h | 5 ++--- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 18 +++++++++--------- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 745e84c72c90..0e48939ec85f 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -405,8 +405,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) * Initialize fence to be complete as this is expected to be complete * unless there is a pending schedule disable outstanding. */ - i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify); - i915_sw_fence_commit(&ce->guc_blocked); + i915_sw_fence_init(&ce->guc_state.blocked_fence, + sw_fence_dummy_notify); + i915_sw_fence_commit(&ce->guc_state.blocked_fence); i915_active_init(&ce->active, __intel_context_active, __intel_context_retire, 0); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 3a73f3117873..c06171ee8792 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -167,6 +167,8 @@ struct intel_context { * fence related to GuC submission */ struct list_head fences; + /* GuC context blocked fence */ + struct i915_sw_fence blocked_fence; } guc_state; struct { @@ -190,9 +192,6 @@ struct intel_context { */ struct list_head guc_id_link; - /* GuC context blocked fence */ - struct i915_sw_fence guc_blocked; - /* * GuC priority management */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9ae4633aa7cb..7aa16371908a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1482,24 +1482,24 @@ static void guc_blocked_fence_complete(struct intel_context *ce) { lockdep_assert_held(&ce->guc_state.lock); - if (!i915_sw_fence_done(&ce->guc_blocked)) - i915_sw_fence_complete(&ce->guc_blocked); + if (!i915_sw_fence_done(&ce->guc_state.blocked_fence)) + i915_sw_fence_complete(&ce->guc_state.blocked_fence); } static void guc_blocked_fence_reinit(struct intel_context *ce) { lockdep_assert_held(&ce->guc_state.lock); - GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked)); + GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_state.blocked_fence)); /* * This fence is always complete unless a pending schedule disable is * outstanding. We arm the fence here and complete it when we receive * the pending schedule disable complete message. */ - i915_sw_fence_fini(&ce->guc_blocked); - i915_sw_fence_reinit(&ce->guc_blocked); - i915_sw_fence_await(&ce->guc_blocked); - i915_sw_fence_commit(&ce->guc_blocked); + i915_sw_fence_fini(&ce->guc_state.blocked_fence); + i915_sw_fence_reinit(&ce->guc_state.blocked_fence); + i915_sw_fence_await(&ce->guc_state.blocked_fence); + i915_sw_fence_commit(&ce->guc_state.blocked_fence); } static u16 prep_context_pending_disable(struct intel_context *ce) @@ -1539,7 +1539,7 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce) if (enabled) clr_context_enabled(ce); spin_unlock_irqrestore(&ce->guc_state.lock, flags); - return &ce->guc_blocked; + return &ce->guc_state.blocked_fence; } /* @@ -1555,7 +1555,7 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce) with_intel_runtime_pm(runtime_pm, wakeref) __guc_context_sched_disable(guc, ce, guc_id); - return &ce->guc_blocked; + return &ce->guc_state.blocked_fence; } static void guc_context_unblock(struct intel_context *ce) From patchwork Mon Aug 16 13:51:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438613 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E296C4338F for ; Mon, 16 Aug 2021 13:58:08 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F28BA632F8 for ; Mon, 16 Aug 2021 13:58:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org F28BA632F8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9BC7A89FE3; Mon, 16 Aug 2021 13:57:40 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 966E589EB4; Mon, 16 Aug 2021 13:57:03 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607083" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607083" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091186" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:56:59 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:35 -0700 Message-Id: <20210816135139.10060-19-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 18/22] drm/i915/guc: Rework and simplify locking X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Rework and simplify the locking with GuC subission. Drop sched_state_no_lock and move all fields under the guc_state.sched_state and protect all these fields with guc_state.lock . This requires changing the locking hierarchy from guc_state.lock -> sched_engine.lock to sched_engine.lock -> guc_state.lock. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context_types.h | 5 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 186 ++++++++---------- drivers/gpu/drm/i915/i915_trace.h | 6 +- 3 files changed, 89 insertions(+), 108 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index c06171ee8792..d5d643b04d54 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -161,7 +161,7 @@ struct intel_context { * sched_state: scheduling state of this context using GuC * submission */ - u16 sched_state; + u32 sched_state; /* * fences: maintains of list of requests that have a submit * fence related to GuC submission @@ -178,9 +178,6 @@ struct intel_context { struct list_head requests; } guc_active; - /* GuC scheduling state flags that do not require a lock. */ - atomic_t guc_sched_state_no_lock; - /* GuC LRC descriptor ID */ u16 guc_id; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 7aa16371908a..ba19b99173fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -72,86 +72,23 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count); #define GUC_REQUEST_SIZE 64 /* bytes */ -/* - * Below is a set of functions which control the GuC scheduling state which do - * not require a lock as all state transitions are mutually exclusive. i.e. It - * is not possible for the context pinning code and submission, for the same - * context, to be executing simultaneously. We still need an atomic as it is - * possible for some of the bits to changing at the same time though. - */ -#define SCHED_STATE_NO_LOCK_ENABLED BIT(0) -#define SCHED_STATE_NO_LOCK_PENDING_ENABLE BIT(1) -#define SCHED_STATE_NO_LOCK_REGISTERED BIT(2) -static inline bool context_enabled(struct intel_context *ce) -{ - return (atomic_read(&ce->guc_sched_state_no_lock) & - SCHED_STATE_NO_LOCK_ENABLED); -} - -static inline void set_context_enabled(struct intel_context *ce) -{ - atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock); -} - -static inline void clr_context_enabled(struct intel_context *ce) -{ - atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED, - &ce->guc_sched_state_no_lock); -} - -static inline bool context_pending_enable(struct intel_context *ce) -{ - return (atomic_read(&ce->guc_sched_state_no_lock) & - SCHED_STATE_NO_LOCK_PENDING_ENABLE); -} - -static inline void set_context_pending_enable(struct intel_context *ce) -{ - atomic_or(SCHED_STATE_NO_LOCK_PENDING_ENABLE, - &ce->guc_sched_state_no_lock); -} - -static inline void clr_context_pending_enable(struct intel_context *ce) -{ - atomic_and((u32)~SCHED_STATE_NO_LOCK_PENDING_ENABLE, - &ce->guc_sched_state_no_lock); -} - -static inline bool context_registered(struct intel_context *ce) -{ - return (atomic_read(&ce->guc_sched_state_no_lock) & - SCHED_STATE_NO_LOCK_REGISTERED); -} - -static inline void set_context_registered(struct intel_context *ce) -{ - atomic_or(SCHED_STATE_NO_LOCK_REGISTERED, - &ce->guc_sched_state_no_lock); -} - -static inline void clr_context_registered(struct intel_context *ce) -{ - atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED, - &ce->guc_sched_state_no_lock); -} - /* * Below is a set of functions which control the GuC scheduling state which - * require a lock, aside from the special case where the functions are called - * from guc_lrc_desc_pin(). In that case it isn't possible for any other code - * path to be executing on the context. + * require a lock. */ #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER BIT(0) #define SCHED_STATE_DESTROYED BIT(1) #define SCHED_STATE_PENDING_DISABLE BIT(2) #define SCHED_STATE_BANNED BIT(3) -#define SCHED_STATE_BLOCKED_SHIFT 4 +#define SCHED_STATE_ENABLED BIT(4) +#define SCHED_STATE_PENDING_ENABLE BIT(5) +#define SCHED_STATE_REGISTERED BIT(6) +#define SCHED_STATE_BLOCKED_SHIFT 7 #define SCHED_STATE_BLOCKED BIT(SCHED_STATE_BLOCKED_SHIFT) #define SCHED_STATE_BLOCKED_MASK (0xfff << SCHED_STATE_BLOCKED_SHIFT) static inline void init_sched_state(struct intel_context *ce) { lockdep_assert_held(&ce->guc_state.lock); - atomic_set(&ce->guc_sched_state_no_lock, 0); ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK; } @@ -161,9 +98,8 @@ static inline bool sched_state_is_init(struct intel_context *ce) * XXX: Kernel contexts can have SCHED_STATE_NO_LOCK_REGISTERED after * suspend. */ - return !(atomic_read(&ce->guc_sched_state_no_lock) & - ~SCHED_STATE_NO_LOCK_REGISTERED) && - !(ce->guc_state.sched_state &= ~SCHED_STATE_BLOCKED_MASK); + return !(ce->guc_state.sched_state &= + ~(SCHED_STATE_BLOCKED_MASK | SCHED_STATE_REGISTERED)); } static inline bool @@ -236,6 +172,57 @@ static inline void clr_context_banned(struct intel_context *ce) ce->guc_state.sched_state &= ~SCHED_STATE_BANNED; } +static inline bool context_enabled(struct intel_context *ce) +{ + return ce->guc_state.sched_state & SCHED_STATE_ENABLED; +} + +static inline void set_context_enabled(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_ENABLED; +} + +static inline void clr_context_enabled(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= ~SCHED_STATE_ENABLED; +} + +static inline bool context_pending_enable(struct intel_context *ce) +{ + return ce->guc_state.sched_state & SCHED_STATE_PENDING_ENABLE; +} + +static inline void set_context_pending_enable(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_PENDING_ENABLE; +} + +static inline void clr_context_pending_enable(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= ~SCHED_STATE_PENDING_ENABLE; +} + +static inline bool context_registered(struct intel_context *ce) +{ + return ce->guc_state.sched_state & SCHED_STATE_REGISTERED; +} + +static inline void set_context_registered(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_REGISTERED; +} + +static inline void clr_context_registered(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= ~SCHED_STATE_REGISTERED; +} + static inline u32 context_blocked(struct intel_context *ce) { return (ce->guc_state.sched_state & SCHED_STATE_BLOCKED_MASK) >> @@ -244,7 +231,6 @@ static inline u32 context_blocked(struct intel_context *ce) static inline void incr_context_blocked(struct intel_context *ce) { - lockdep_assert_held(&ce->engine->sched_engine->lock); lockdep_assert_held(&ce->guc_state.lock); ce->guc_state.sched_state += SCHED_STATE_BLOCKED; @@ -254,7 +240,6 @@ static inline void incr_context_blocked(struct intel_context *ce) static inline void decr_context_blocked(struct intel_context *ce) { - lockdep_assert_held(&ce->engine->sched_engine->lock); lockdep_assert_held(&ce->guc_state.lock); GEM_BUG_ON(!context_blocked(ce)); /* Underflow check */ @@ -443,6 +428,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) u32 g2h_len_dw = 0; bool enabled; + lockdep_assert_held(&rq->engine->sched_engine->lock); + /* * Corner case where requests were sitting in the priority list or a * request resubmitted after the context was banned. @@ -450,7 +437,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) if (unlikely(intel_context_is_banned(ce))) { i915_request_put(i915_request_mark_eio(rq)); intel_engine_signal_breadcrumbs(ce->engine); - goto out; + return 0; } GEM_BUG_ON(!atomic_read(&ce->guc_id_ref)); @@ -463,9 +450,11 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) { err = guc_lrc_desc_pin(ce, false); if (unlikely(err)) - goto out; + return err; } + spin_lock(&ce->guc_state.lock); + /* * The request / context will be run on the hardware when scheduling * gets enabled in the unblock. @@ -500,6 +489,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) trace_i915_request_guc_submit(rq); out: + spin_unlock(&ce->guc_state.lock); return err; } @@ -720,8 +710,6 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc) spin_lock_irq(&guc_to_gt(guc)->irq_lock); spin_unlock_irq(&guc_to_gt(guc)->irq_lock); - guc_flush_submissions(guc); - flush_work(&guc->ct.requests.worker); scrub_guc_desc_for_outstanding_g2h(guc); @@ -1125,7 +1113,11 @@ static int steal_guc_id(struct intel_guc *guc) list_del_init(&ce->guc_id_link); guc_id = ce->guc_id; + + spin_lock(&ce->guc_state.lock); clr_context_registered(ce); + spin_unlock(&ce->guc_state.lock); + set_context_guc_id_invalid(ce); return guc_id; } else { @@ -1161,6 +1153,8 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce) try_again: spin_lock_irqsave(&guc->contexts_lock, flags); + might_lock(&ce->guc_state.lock); + if (context_guc_id_invalid(ce)) { ret = assign_guc_id(guc, &ce->guc_id); if (ret) @@ -1240,8 +1234,13 @@ static int register_context(struct intel_context *ce, bool loop) trace_intel_context_register(ce); ret = __guc_action_register_context(guc, ce->guc_id, offset, loop); - if (likely(!ret)) + if (likely(!ret)) { + unsigned long flags; + + spin_lock_irqsave(&ce->guc_state.lock, flags); set_context_registered(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + } return ret; } @@ -1517,7 +1516,6 @@ static u16 prep_context_pending_disable(struct intel_context *ce) static struct i915_sw_fence *guc_context_block(struct intel_context *ce) { struct intel_guc *guc = ce_to_guc(ce); - struct i915_sched_engine *sched_engine = ce->engine->sched_engine; unsigned long flags; struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm; intel_wakeref_t wakeref; @@ -1526,13 +1524,7 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce) spin_lock_irqsave(&ce->guc_state.lock, flags); - /* - * Sync with submission path, increment before below changes to context - * state. - */ - spin_lock(&sched_engine->lock); incr_context_blocked(ce); - spin_unlock(&sched_engine->lock); enabled = context_enabled(ce); if (unlikely(!enabled || submission_disabled(guc))) { @@ -1561,7 +1553,6 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce) static void guc_context_unblock(struct intel_context *ce) { struct intel_guc *guc = ce_to_guc(ce); - struct i915_sched_engine *sched_engine = ce->engine->sched_engine; unsigned long flags; struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm; intel_wakeref_t wakeref; @@ -1586,13 +1577,7 @@ static void guc_context_unblock(struct intel_context *ce) intel_context_get(ce); } - /* - * Sync with submission path, decrement after above changes to context - * state. - */ - spin_lock(&sched_engine->lock); decr_context_blocked(ce); - spin_unlock(&sched_engine->lock); spin_unlock_irqrestore(&ce->guc_state.lock, flags); @@ -1702,7 +1687,9 @@ static void guc_context_sched_disable(struct intel_context *ce) if (submission_disabled(guc) || context_guc_id_invalid(ce) || !lrc_desc_registered(guc, ce->guc_id)) { + spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_enabled(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); goto unpin; } @@ -1752,7 +1739,6 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) GEM_BUG_ON(ce != __get_context(guc, ce->guc_id)); GEM_BUG_ON(context_enabled(ce)); - clr_context_registered(ce); deregister_context(ce, ce->guc_id, true); } @@ -1825,8 +1811,10 @@ static void guc_context_destroy(struct kref *kref) /* Seal race with Reset */ spin_lock_irqsave(&ce->guc_state.lock, flags); disabled = submission_disabled(guc); - if (likely(!disabled)) + if (likely(!disabled)) { set_context_destroyed(ce); + clr_context_registered(ce); + } spin_unlock_irqrestore(&ce->guc_state.lock, flags); if (unlikely(disabled)) { release_guc_id(guc, ce); @@ -2695,8 +2683,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, (!context_pending_enable(ce) && !context_pending_disable(ce)))) { drm_err(&guc_to_gt(guc)->i915->drm, - "Bad context sched_state 0x%x, 0x%x, desc_idx %u", - atomic_read(&ce->guc_sched_state_no_lock), + "Bad context sched_state 0x%x, desc_idx %u", ce->guc_state.sched_state, desc_idx); return -EPROTO; } @@ -2711,7 +2698,9 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, } #endif + spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_pending_enable(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); } else if (context_pending_disable(ce)) { bool banned; @@ -2985,9 +2974,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc, atomic_read(&ce->pin_count)); drm_printf(p, "\t\tGuC ID Ref Count: %u\n", atomic_read(&ce->guc_id_ref)); - drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n", - ce->guc_state.sched_state, - atomic_read(&ce->guc_sched_state_no_lock)); + drm_printf(p, "\t\tSchedule State: 0x%x\n\n", + ce->guc_state.sched_state); guc_log_context_priority(p, ce); } diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 806ad688274b..0a77eb2944b5 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -903,7 +903,6 @@ DECLARE_EVENT_CLASS(intel_context, __field(u32, guc_id) __field(int, pin_count) __field(u32, sched_state) - __field(u32, guc_sched_state_no_lock) __field(u8, guc_prio) ), @@ -911,15 +910,12 @@ DECLARE_EVENT_CLASS(intel_context, __entry->guc_id = ce->guc_id; __entry->pin_count = atomic_read(&ce->pin_count); __entry->sched_state = ce->guc_state.sched_state; - __entry->guc_sched_state_no_lock = - atomic_read(&ce->guc_sched_state_no_lock); __entry->guc_prio = ce->guc_prio; ), - TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u", + TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x, guc_prio=%u", __entry->guc_id, __entry->pin_count, __entry->sched_state, - __entry->guc_sched_state_no_lock, __entry->guc_prio) ); From patchwork Mon Aug 16 13:51:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438593 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1370C432BE for ; Mon, 16 Aug 2021 13:57:41 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A086A632D6 for ; Mon, 16 Aug 2021 13:57:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A086A632D6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B52CA89F47; Mon, 16 Aug 2021 13:57:20 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 658E889E3F; Mon, 16 Aug 2021 13:57:04 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607084" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607084" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091188" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:36 -0700 Message-Id: <20210816135139.10060-20-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 19/22] drm/i915/guc: Proper xarray usage for contexts_lookup X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Lock the xarray and take ref to the context if needed. v2: (Checkpatch) - Add new line after declaration Signed-off-by: Matthew Brost --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 84 ++++++++++++++++--- 1 file changed, 73 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ba19b99173fc..2ecb2f002bed 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -599,8 +599,18 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) unsigned long index, flags; bool pending_disable, pending_enable, deregister, destroyed, banned; + xa_lock_irqsave(&guc->context_lookup, flags); xa_for_each(&guc->context_lookup, index, ce) { - spin_lock_irqsave(&ce->guc_state.lock, flags); + /* + * Corner case where the ref count on the object is zero but and + * deregister G2H was lost. In this case we don't touch the ref + * count and finish the destroy of the context. + */ + bool do_put = kref_get_unless_zero(&ce->ref); + + xa_unlock(&guc->context_lookup); + + spin_lock(&ce->guc_state.lock); /* * Once we are at this point submission_disabled() is guaranteed @@ -616,7 +626,9 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) banned = context_banned(ce); init_sched_state(ce); - spin_unlock_irqrestore(&ce->guc_state.lock, flags); + spin_unlock(&ce->guc_state.lock); + + GEM_BUG_ON(!do_put && !destroyed); if (pending_enable || destroyed || deregister) { atomic_dec(&guc->outstanding_submission_g2h); @@ -645,7 +657,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) intel_context_put(ce); } + + if (do_put) + intel_context_put(ce); + xa_lock(&guc->context_lookup); } + xa_unlock_irqrestore(&guc->context_lookup, flags); } static inline bool @@ -866,16 +883,26 @@ void intel_guc_submission_reset(struct intel_guc *guc, bool stalled) { struct intel_context *ce; unsigned long index; + unsigned long flags; if (unlikely(!guc_submission_initialized(guc))) { /* Reset called during driver load? GuC not yet initialised! */ return; } - xa_for_each(&guc->context_lookup, index, ce) + xa_lock_irqsave(&guc->context_lookup, flags); + xa_for_each(&guc->context_lookup, index, ce) { + intel_context_get(ce); + xa_unlock(&guc->context_lookup); + if (intel_context_is_pinned(ce)) __guc_reset_context(ce, stalled); + intel_context_put(ce); + xa_lock(&guc->context_lookup); + } + xa_unlock_irqrestore(&guc->context_lookup, flags); + /* GuC is blown away, drop all references to contexts */ xa_destroy(&guc->context_lookup); } @@ -950,11 +977,21 @@ void intel_guc_submission_cancel_requests(struct intel_guc *guc) { struct intel_context *ce; unsigned long index; + unsigned long flags; + + xa_lock_irqsave(&guc->context_lookup, flags); + xa_for_each(&guc->context_lookup, index, ce) { + intel_context_get(ce); + xa_unlock(&guc->context_lookup); - xa_for_each(&guc->context_lookup, index, ce) if (intel_context_is_pinned(ce)) guc_cancel_context_requests(ce); + intel_context_put(ce); + xa_lock(&guc->context_lookup); + } + xa_unlock_irqrestore(&guc->context_lookup, flags); + guc_cancel_sched_engine_requests(guc->sched_engine); /* GuC is blown away, drop all references to contexts */ @@ -2848,21 +2885,26 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine) struct intel_context *ce; struct i915_request *rq; unsigned long index; + unsigned long flags; /* Reset called during driver load? GuC not yet initialised! */ if (unlikely(!guc_submission_initialized(guc))) return; + xa_lock_irqsave(&guc->context_lookup, flags); xa_for_each(&guc->context_lookup, index, ce) { + intel_context_get(ce); + xa_unlock(&guc->context_lookup); + if (!intel_context_is_pinned(ce)) - continue; + goto next; if (intel_engine_is_virtual(ce->engine)) { if (!(ce->engine->mask & engine->mask)) - continue; + goto next; } else { if (ce->engine != engine) - continue; + goto next; } list_for_each_entry(rq, &ce->guc_active.requests, sched.link) { @@ -2872,9 +2914,17 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine) intel_engine_set_hung_context(engine, ce); /* Can only cope with one hang at a time... */ - return; + intel_context_put(ce); + xa_lock(&guc->context_lookup); + goto done; } +next: + intel_context_put(ce); + xa_lock(&guc->context_lookup); + } +done: + xa_unlock_irqrestore(&guc->context_lookup, flags); } void intel_guc_dump_active_requests(struct intel_engine_cs *engine, @@ -2890,23 +2940,32 @@ void intel_guc_dump_active_requests(struct intel_engine_cs *engine, if (unlikely(!guc_submission_initialized(guc))) return; + xa_lock_irqsave(&guc->context_lookup, flags); xa_for_each(&guc->context_lookup, index, ce) { + intel_context_get(ce); + xa_unlock(&guc->context_lookup); + if (!intel_context_is_pinned(ce)) - continue; + goto next; if (intel_engine_is_virtual(ce->engine)) { if (!(ce->engine->mask & engine->mask)) - continue; + goto next; } else { if (ce->engine != engine) - continue; + goto next; } spin_lock_irqsave(&ce->guc_active.lock, flags); intel_engine_dump_active_requests(&ce->guc_active.requests, hung_rq, m); spin_unlock_irqrestore(&ce->guc_active.lock, flags); + +next: + intel_context_put(ce); + xa_lock(&guc->context_lookup); } + xa_unlock_irqrestore(&guc->context_lookup, flags); } void intel_guc_submission_print_info(struct intel_guc *guc, @@ -2960,7 +3019,9 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc, { struct intel_context *ce; unsigned long index; + unsigned long flags; + xa_lock_irqsave(&guc->context_lookup, flags); xa_for_each(&guc->context_lookup, index, ce) { drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id); drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca); @@ -2979,6 +3040,7 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc, guc_log_context_priority(p, ce); } + xa_unlock_irqrestore(&guc->context_lookup, flags); } static struct intel_context * From patchwork Mon Aug 16 13:51:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438605 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00704C432BE for ; Mon, 16 Aug 2021 13:58:01 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C9347632F2 for ; Mon, 16 Aug 2021 13:58:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org C9347632F2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3952589FAD; Mon, 16 Aug 2021 13:57:36 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id DD02989EFF; Mon, 16 Aug 2021 13:57:05 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607085" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607085" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091190" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:37 -0700 Message-Id: <20210816135139.10060-21-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 20/22] drm/i915/guc: Drop pin count check trick between sched_disable and re-pin X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Drop pin count check trick between a sched_disable and re-pin, now rely on the lock and counter of the number of committed requests to determine if scheduling should be disabled on the context. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 49 ++++++++++++------- 2 files changed, 34 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index d5d643b04d54..524a35a78bf4 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -169,6 +169,8 @@ struct intel_context { struct list_head fences; /* GuC context blocked fence */ struct i915_sw_fence blocked_fence; + /* GuC committed requests */ + int number_committed_requests; } guc_state; struct { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 2ecb2f002bed..c6ae6b4417c2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -247,6 +247,25 @@ static inline void decr_context_blocked(struct intel_context *ce) ce->guc_state.sched_state -= SCHED_STATE_BLOCKED; } +static inline bool context_has_committed_requests(struct intel_context *ce) +{ + return !!ce->guc_state.number_committed_requests; +} + +static inline void incr_context_committed_requests(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ++ce->guc_state.number_committed_requests; + GEM_BUG_ON(ce->guc_state.number_committed_requests < 0); +} + +static inline void decr_context_committed_requests(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + --ce->guc_state.number_committed_requests; + GEM_BUG_ON(ce->guc_state.number_committed_requests < 0); +} + static inline bool context_guc_id_invalid(struct intel_context *ce) { return ce->guc_id == GUC_INVALID_LRC_ID; @@ -1736,14 +1755,11 @@ static void guc_context_sched_disable(struct intel_context *ce) spin_lock_irqsave(&ce->guc_state.lock, flags); /* - * We have to check if the context has been disabled by another thread. - * We also have to check if the context has been pinned again as another - * pin operation is allowed to pass this function. Checking the pin - * count, within ce->guc_state.lock, synchronizes this function with - * guc_request_alloc ensuring a request doesn't slip through the - * 'context_pending_disable' fence. Checking within the spin lock (can't - * sleep) ensures another process doesn't pin this context and generate - * a request before we set the 'context_pending_disable' flag here. + * We have to check if the context has been disabled by another thread, + * check if submssion has been disabled to seal a race with reset and + * finally check if any more requests have been committed to the + * context ensursing that a request doesn't slip through the + * 'context_pending_disable' fence. */ enabled = context_enabled(ce); if (unlikely(!enabled || submission_disabled(guc))) { @@ -1752,7 +1768,8 @@ static void guc_context_sched_disable(struct intel_context *ce) spin_unlock_irqrestore(&ce->guc_state.lock, flags); goto unpin; } - if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { + if (unlikely(context_has_committed_requests(ce))) { + intel_context_sched_disable_unpin(ce); spin_unlock_irqrestore(&ce->guc_state.lock, flags); return; } @@ -1785,6 +1802,7 @@ static void __guc_context_destroy(struct intel_context *ce) ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] || ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] || ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]); + GEM_BUG_ON(ce->guc_state.number_committed_requests); lrc_fini(ce); intel_context_fini(ce); @@ -2015,6 +2033,10 @@ static void remove_from_context(struct i915_request *rq) spin_unlock_irq(&ce->guc_active.lock); + spin_lock_irq(&ce->guc_state.lock); + decr_context_committed_requests(ce); + spin_unlock_irq(&ce->guc_state.lock); + atomic_dec(&ce->guc_id_ref); i915_request_notify_execute_cb_imm(rq); } @@ -2162,15 +2184,7 @@ static int guc_request_alloc(struct i915_request *rq) * schedule enable or context registration if either G2H is pending * respectfully. Once a G2H returns, the fence is released that is * blocking these requests (see guc_signal_context_fence). - * - * We can safely check the below fields outside of the lock as it isn't - * possible for these fields to transition from being clear to set but - * converse is possible, hence the need for the check within the lock. */ - if (likely(!context_wait_for_deregister_to_register(ce) && - !context_pending_disable(ce))) - return 0; - spin_lock_irqsave(&ce->guc_state.lock, flags); if (context_wait_for_deregister_to_register(ce) || context_pending_disable(ce)) { @@ -2179,6 +2193,7 @@ static int guc_request_alloc(struct i915_request *rq) list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); } + incr_context_committed_requests(ce); spin_unlock_irqrestore(&ce->guc_state.lock, flags); return 0; From patchwork Mon Aug 16 13:51:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438609 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DECF2C4320A for ; Mon, 16 Aug 2021 13:58:01 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AEA7C632F2 for ; Mon, 16 Aug 2021 13:58:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org AEA7C632F2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9187089FC0; Mon, 16 Aug 2021 13:57:37 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id E0DAA89F01; Mon, 16 Aug 2021 13:57:05 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607086" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607086" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091191" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:38 -0700 Message-Id: <20210816135139.10060-22-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 21/22] drm/i915/guc: Move GuC priority fields in context under guc_active X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Move GuC management fields in context under guc_active struct as this is where the lock that protects theses fields lives. Also only set guc_prio field once during context init. Fixes: ee242ca704d3 ("drm/i915/guc: Implement GuC priority management") Signed-off-by: Matthew Brost Cc: --- drivers/gpu/drm/i915/gt/intel_context_types.h | 12 ++-- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 68 +++++++++++-------- drivers/gpu/drm/i915/i915_trace.h | 2 +- 3 files changed, 45 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 524a35a78bf4..f6989e6807f7 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -112,6 +112,7 @@ struct intel_context { #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 #define CONTEXT_LRCA_DIRTY 9 +#define CONTEXT_GUC_INIT 10 struct { u64 timeout_us; @@ -178,6 +179,11 @@ struct intel_context { spinlock_t lock; /** requests: active requests on this context */ struct list_head requests; + /* + * GuC priority management + */ + u8 guc_prio; + u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM]; } guc_active; /* GuC LRC descriptor ID */ @@ -191,12 +197,6 @@ struct intel_context { */ struct list_head guc_id_link; - /* - * GuC priority management - */ - u8 guc_prio; - u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM]; - #ifdef CONFIG_DRM_I915_SELFTEST /** * @drop_schedule_enable: Force drop of schedule enable G2H for selftest diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c6ae6b4417c2..eb06a4c7534e 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1354,8 +1354,6 @@ static void guc_context_policy_init(struct intel_engine_cs *engine, desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000; } -static inline u8 map_i915_prio_to_guc_prio(int prio); - static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) { struct intel_engine_cs *engine = ce->engine; @@ -1363,8 +1361,6 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) struct intel_guc *guc = &engine->gt->uc.guc; u32 desc_idx = ce->guc_id; struct guc_lrc_desc *desc; - const struct i915_gem_context *ctx; - int prio = I915_CONTEXT_DEFAULT_PRIORITY; bool context_registered; intel_wakeref_t wakeref; int ret = 0; @@ -1381,12 +1377,6 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) context_registered = lrc_desc_registered(guc, desc_idx); - rcu_read_lock(); - ctx = rcu_dereference(ce->gem_context); - if (ctx) - prio = ctx->sched.priority; - rcu_read_unlock(); - reset_lrc_desc(guc, desc_idx); set_lrc_desc_registered(guc, desc_idx, ce); @@ -1395,8 +1385,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) desc->engine_submit_mask = adjust_engine_mask(engine->class, engine->mask); desc->hw_context_desc = ce->lrc.lrca; - ce->guc_prio = map_i915_prio_to_guc_prio(prio); - desc->priority = ce->guc_prio; + desc->priority = ce->guc_active.guc_prio; desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD; guc_context_policy_init(engine, desc); @@ -1798,10 +1787,10 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) static void __guc_context_destroy(struct intel_context *ce) { - GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] || - ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] || - ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] || - ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]); + GEM_BUG_ON(ce->guc_active.guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] || + ce->guc_active.guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] || + ce->guc_active.guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] || + ce->guc_active.guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]); GEM_BUG_ON(ce->guc_state.number_committed_requests); lrc_fini(ce); @@ -1911,14 +1900,17 @@ static void guc_context_set_prio(struct intel_guc *guc, GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH || prio > GUC_CLIENT_PRIORITY_NORMAL); + lockdep_assert_held(&ce->guc_active.lock); - if (ce->guc_prio == prio || submission_disabled(guc) || - !context_registered(ce)) + if (ce->guc_active.guc_prio == prio || submission_disabled(guc) || + !context_registered(ce)) { + ce->guc_active.guc_prio = prio; return; + } guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true); - ce->guc_prio = prio; + ce->guc_active.guc_prio = prio; trace_intel_context_set_prio(ce); } @@ -1938,24 +1930,24 @@ static inline void add_context_inflight_prio(struct intel_context *ce, u8 guc_prio) { lockdep_assert_held(&ce->guc_active.lock); - GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count)); + GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_active.guc_prio_count)); - ++ce->guc_prio_count[guc_prio]; + ++ce->guc_active.guc_prio_count[guc_prio]; /* Overflow protection */ - GEM_WARN_ON(!ce->guc_prio_count[guc_prio]); + GEM_WARN_ON(!ce->guc_active.guc_prio_count[guc_prio]); } static inline void sub_context_inflight_prio(struct intel_context *ce, u8 guc_prio) { lockdep_assert_held(&ce->guc_active.lock); - GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count)); + GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_active.guc_prio_count)); /* Underflow protection */ - GEM_WARN_ON(!ce->guc_prio_count[guc_prio]); + GEM_WARN_ON(!ce->guc_active.guc_prio_count[guc_prio]); - --ce->guc_prio_count[guc_prio]; + --ce->guc_active.guc_prio_count[guc_prio]; } static inline void update_context_prio(struct intel_context *ce) @@ -1968,8 +1960,8 @@ static inline void update_context_prio(struct intel_context *ce) lockdep_assert_held(&ce->guc_active.lock); - for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) { - if (ce->guc_prio_count[i]) { + for (i = 0; i < ARRAY_SIZE(ce->guc_active.guc_prio_count); ++i) { + if (ce->guc_active.guc_prio_count[i]) { guc_context_set_prio(guc, ce, i); break; } @@ -2108,6 +2100,20 @@ static bool context_needs_register(struct intel_context *ce, bool new_guc_id) !submission_disabled(ce_to_guc(ce)); } +static void guc_context_init(struct intel_context *ce) +{ + const struct i915_gem_context *ctx; + int prio = I915_CONTEXT_DEFAULT_PRIORITY; + + rcu_read_lock(); + ctx = rcu_dereference(ce->gem_context); + if (ctx) + prio = ctx->sched.priority; + rcu_read_unlock(); + + ce->guc_active.guc_prio = map_i915_prio_to_guc_prio(prio); +} + static int guc_request_alloc(struct i915_request *rq) { struct intel_context *ce = rq->context; @@ -2139,6 +2145,9 @@ static int guc_request_alloc(struct i915_request *rq) rq->reserved_space -= GUC_REQUEST_SIZE; + if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags))) + guc_context_init(ce); + /* * Call pin_guc_id here rather than in the pinning step as with * dma_resv, contexts can be repeatedly pinned / unpinned trashing the @@ -3018,13 +3027,12 @@ static inline void guc_log_context_priority(struct drm_printer *p, { int i; - drm_printf(p, "\t\tPriority: %d\n", - ce->guc_prio); + drm_printf(p, "\t\tPriority: %d\n", ce->guc_active.guc_prio); drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n"); for (i = GUC_CLIENT_PRIORITY_KMD_HIGH; i < GUC_CLIENT_PRIORITY_NUM; ++i) { drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n", - i, ce->guc_prio_count[i]); + i, ce->guc_active.guc_prio_count[i]); } drm_printf(p, "\n"); } diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 0a77eb2944b5..518a6fa2cca7 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -910,7 +910,7 @@ DECLARE_EVENT_CLASS(intel_context, __entry->guc_id = ce->guc_id; __entry->pin_count = atomic_read(&ce->pin_count); __entry->sched_state = ce->guc_state.sched_state; - __entry->guc_prio = ce->guc_prio; + __entry->guc_prio = ce->guc_active.guc_prio; ), TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x, guc_prio=%u", From patchwork Mon Aug 16 13:51:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Brost X-Patchwork-Id: 12438597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F347C4338F for ; Mon, 16 Aug 2021 13:57:52 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D48EA632F0 for ; Mon, 16 Aug 2021 13:57:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D48EA632F0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 18B6289F92; Mon, 16 Aug 2021 13:57:23 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3599889F0A; Mon, 16 Aug 2021 13:57:06 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10077"; a="279607088" X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="279607088" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:01 -0700 X-IronPort-AV: E=Sophos;i="5.84,326,1620716400"; d="scan'208";a="441091194" Received: from jons-linux-dev-box.fm.intel.com ([10.1.27.20]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2021 06:57:00 -0700 From: Matthew Brost To: , Cc: Date: Mon, 16 Aug 2021 06:51:39 -0700 Message-Id: <20210816135139.10060-23-matthew.brost@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816135139.10060-1-matthew.brost@intel.com> References: <20210816135139.10060-1-matthew.brost@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 22/22] drm/i915/guc: Add GuC kernel doc X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Add GuC kernel doc for all structures added thus far for GuC submission and update the main GuC submission section with the new interface details. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_context_types.h | 42 +++++--- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 19 +++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 101 ++++++++++++++---- drivers/gpu/drm/i915/i915_request.h | 18 ++-- 4 files changed, 131 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index f6989e6807f7..75d609a1bc33 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -156,44 +156,56 @@ struct intel_context { u8 wa_bb_page; /* if set, page num reserved for context workarounds */ struct { - /** lock: protects everything in guc_state */ + /** @lock: protects everything in guc_state */ spinlock_t lock; /** - * sched_state: scheduling state of this context using GuC + * @sched_state: scheduling state of this context using GuC * submission */ u32 sched_state; /* - * fences: maintains of list of requests that have a submit - * fence related to GuC submission + * @fences: maintains a list of requests are currently being + * fenced until a GuC operation completes */ struct list_head fences; - /* GuC context blocked fence */ + /** + * @blocked_fence: fence used to signal when the blocking of a + * contexts submissions is complete. + */ struct i915_sw_fence blocked_fence; - /* GuC committed requests */ + /** @number_committed_requests: number of committed requests */ int number_committed_requests; } guc_state; struct { - /** lock: protects everything in guc_active */ + /** @lock: protects everything in guc_active */ spinlock_t lock; - /** requests: active requests on this context */ + /** @requests: list of active requests on this context */ struct list_head requests; - /* - * GuC priority management - */ + /** @guc_prio: the contexts current guc priority */ u8 guc_prio; + /** + * @guc_prio_count: a counter of the number requests inflight in + * each priority bucket + */ u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM]; } guc_active; - /* GuC LRC descriptor ID */ + /** + * @guc_id: unique handle which is used to communicate information with + * the GuC about this context, protected by guc->contexts_lock + */ u16 guc_id; - /* GuC LRC descriptor reference count */ + /** + * @guc_id_ref: the number of references to the guc_id, when + * transitioning in and out of zero protected by guc->contexts_lock + */ atomic_t guc_id_ref; - /* - * GuC ID link - in list when unpinned but guc_id still valid in GuC + /** + * @guc_id_link: in guc->guc_id_list when the guc_id has no refs but is + * still valid, protected by guc->contexts_lock */ struct list_head guc_id_link; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 2e27fe59786b..c0b3fdb601f0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -41,6 +41,10 @@ struct intel_guc { spinlock_t irq_lock; unsigned int msg_enabled_mask; + /** + * @outstanding_submission_g2h: number of outstanding G2H related to GuC + * submission, used to determine if the GT is idle + */ atomic_t outstanding_submission_g2h; struct { @@ -49,12 +53,16 @@ struct intel_guc { void (*disable)(struct intel_guc *guc); } interrupts; - /* - * contexts_lock protects the pool of free guc ids and a linked list of - * guc ids available to be stolen + /** + * @contexts_lock: protects guc_ids, guc_id_list, ce->guc_id, and + * ce->guc_id_ref when transitioning in and out of zero */ spinlock_t contexts_lock; + /** @guc_ids: used to allocate new guc_ids */ struct ida guc_ids; + /** + * @guc_id_list: list of intel_context with valid guc_ids but no refs + */ struct list_head guc_id_list; bool submission_supported; @@ -70,7 +78,10 @@ struct intel_guc { struct i915_vma *lrc_desc_pool; void *lrc_desc_pool_vaddr; - /* guc_id to intel_context lookup */ + /** + * @context_lookup: used to intel_context from guc_id, if a context is + * present in this structure it is registered with the GuC + */ struct xarray context_lookup; /* Control params for fw initialization */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index eb06a4c7534e..18ef363c6e5d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -28,21 +28,6 @@ /** * DOC: GuC-based command submission * - * IMPORTANT NOTE: GuC submission is currently not supported in i915. The GuC - * firmware is moving to an updated submission interface and we plan to - * turn submission back on when that lands. The below documentation (and related - * code) matches the old submission model and will be updated as part of the - * upgrade to the new flow. - * - * GuC stage descriptor: - * During initialization, the driver allocates a static pool of 1024 such - * descriptors, and shares them with the GuC. Currently, we only use one - * descriptor. This stage descriptor lets the GuC know about the workqueue and - * process descriptor. Theoretically, it also lets the GuC know about our HW - * contexts (context ID, etc...), but we actually employ a kind of submission - * where the GuC uses the LRCA sent via the work item instead. This is called - * a "proxy" submission. - * * The Scratch registers: * There are 16 MMIO-based registers start from 0xC180. The kernel driver writes * a value to the action register (SOFT_SCRATCH_0) along with any data. It then @@ -51,14 +36,86 @@ * processes the request. The kernel driver polls waiting for this update and * then proceeds. * - * Work Items: - * There are several types of work items that the host may place into a - * workqueue, each with its own requirements and limitations. Currently only - * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which - * represents in-order queue. The kernel driver packs ring tail pointer and an - * ELSP context descriptor dword into Work Item. - * See guc_add_request() + * Command Transport buffers (CTBs): + * Covered in detail in other sections but CTBs (host-to-guc, H2G, guc-to-host + * G2H) are a message interface between the i915 and GuC used to controls + * submissions. + * + * Context registration: + * Before a context can be submitted it must be registered with the GuC via a + * H2G. A unique guc_id is associated with each context. The context is either + * registered at request creation time (normal operation) or at submission time + * (abnormal operation, e.g. after a reset). + * + * Context submission: + * The i915 updates the LRC tail value in memory. Either a schedule enable H2G + * or context submit H2G is used to submit a context. + * + * Context unpin: + * To unpin a context a H2G is used to disable scheduling and when the + * corresponding G2H returns indicating the scheduling disable operation has + * completed it is safe to unpin the context. While a disable is in flight it + * isn't safe to resubmit the context so a fence is used to stall all future + * requests until the G2H is returned. + * + * Context deregistration: + * Before a context can be destroyed or we steal its guc_id we must deregister + * the context with the GuC via H2G. If stealing the guc_id it isn't safe to + * submit anything to this guc_id until the deregister completes so a fence is + * used to stall all requests associated with this guc_ids until the + * corresponding G2H returns indicating the guc_id has been deregistered. + * + * guc_ids: + * Unique number associated with private GuC context data passed in during + * context registration / submission / deregistration. 64k available. Simple ida + * is used for allocation. + * + * Stealing guc_ids: + * If no guc_ids are available they can be stolen from another context at + * request creation time if that context is unpinned. If a guc_id can't be found + * we punt this problem to the user as we believe this is near impossible to hit + * during normal use cases. + * + * Locking: + * In the GuC submission code we have 4 basic spin locks which protect + * everything. Details about each below. + * + * sched_engine->lock + * This is the submission lock for all contexts that share a i915 schedule + * engine (sched_engine), thus only 1 context which share a sched_engine can be + * submitting at a time. Currently only 1 sched_engine used for all of GuC + * submission but that could change in the future. + * + * guc->contexts_lock + * Protects guc_id allocation. Global lock i.e. Only 1 context that uses GuC + * submission can hold this at a time. + * + * ce->guc_state.lock + * Protects everything under ce->guc_state. Ensures that a context is in the + * correct state before issuing a H2G. e.g. We don't issue a schedule disable + * on disabled context (bad idea), we don't issue schedule enable when a + * schedule disable is inflight, etc... Lock individual to each context. + * + * ce->guc_active.lock + * Protects everything under ce->guc_active which is the current requests + * inflight on the context / priority management. Lock individual to each + * context. + * + * Lock ordering rules: + * sched_engine->lock -> ce->guc_active.lock + * sched_engine->lock -> ce->guc_state.lock + * guc->contexts_lock -> ce->guc_state.lock * + * Reset races: + * When a GPU full reset is triggered it is assumed that some G2H responses to + * a H2G can be lost as the GuC is likely toast. Losing these G2H can prove to + * fatal as we do certain operations upon receiving a G2H (e.g. destroy + * contexts, release guc_ids, etc...). Luckly when this occurs we can scrub + * context state and cleanup appropriately, however this is quite racey. To + * avoid races the rules are check for submission being disabled (i.e. check for + * mid reset) with the appropriate lock being held. If submission is disabled + * don't send the H2G or update the context state. The reset code must disable + * submission and grab all these locks before scrubbing for the missing G2H. */ /* GuC Virtual Engine */ diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index d818cfbfc41d..177eaf55adff 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -290,18 +290,20 @@ struct i915_request { struct hrtimer timer; } watchdog; - /* - * Requests may need to be stalled when using GuC submission waiting for - * certain GuC operations to complete. If that is the case, stalled - * requests are added to a per context list of stalled requests. The - * below list_head is the link in that list. + /** + * @guc_fence_link: Requests may need to be stalled when using GuC + * submission waiting for certain GuC operations to complete. If that is + * the case, stalled requests are added to a per context list of stalled + * requests. The below list_head is the link in that list. Protected by + * ce->guc_state.lock. */ struct list_head guc_fence_link; /** - * Priority level while the request is inflight. Differs from i915 - * scheduler priority. See comment above - * I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP for details. + * @guc_prio: Priority level while the request is inflight. Differs from + * i915 scheduler priority. See comment above + * I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP for details. Protected by + * ce->guc_active.lock. */ #define GUC_PRIO_INIT 0xff #define GUC_PRIO_FINI 0xfe