From patchwork Fri Oct 28 19:46:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13024252
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id D9DCFECAAA1
 for ; Fri, 28 Oct 2022 19:45:46 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id A555910E8D2;
 Fri, 28 Oct 2022 19:45:32 +0000 (UTC)
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by gabe.freedesktop.org (Postfix) with ESMTPS id E86B810E8CC;
 Fri, 28 Oct 2022 19:45:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1666986319; x=1698522319;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=3lYphhyZVi7Wi2cxkFQ8Pw8jc83g2UGQ5dZMJbPn+68=;
 b=SzDevT0Q8bpb8OoBFFo8NgQ1f1f85Y1KTk2X5kxESB2eD3mVCbIy/695
 ZDMcjFbzCRMclqioC8sdvTI/o3jKCPg9GNF+RYgJgBtXWPuE8ZjTI5/J1
 jF4QG0YC1Ftvy/uFOilCefVaKGY1CzDeojKaWVTdD/64pdA/mKZZJYupi
 LpmvFLvcQQu0jdO5pcfnBTccfZIqF5yNSL8QRflS6mhIHtLlFsvMLLpJW
 LOJShLzWarkWMWB4YR5PA7Wnl+Tncp7dSWOgBu2ZYBm7rfY/d1ammAEvi
 lANLQoz2oy9vDa6GNvgfg4keBWYI2mI717gyCOgO9rBqw8z0H7LSxzZ+D
 w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10514"; a="372787686"
X-IronPort-AV: E=Sophos;i="5.95,222,1661842800"; d="scan'208";a="372787686"
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 28 Oct 2022 12:45:10 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10514"; a="775491785"
X-IronPort-AV: E=Sophos;i="5.95,222,1661842800"; d="scan'208";a="775491785"
Received: from relo-linux-5.jf.intel.com ([10.165.21.195])
 by fmsmga001.fm.intel.com with ESMTP; 28 Oct 2022 12:45:10 -0700
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Date: Fri, 28 Oct 2022 12:46:48 -0700
Message-Id: <20221028194649.1130223-2-John.C.Harrison@Intel.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221028194649.1130223-1-John.C.Harrison@Intel.com>
References: <20221028194649.1130223-1-John.C.Harrison@Intel.com>
MIME-Version: 1.0
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
Subject: [Intel-gfx] [PATCH 1/2] drm/i915/guc: Properly initialise kernel contexts
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: DRI-Devel@Lists.FreeDesktop.Org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

From: John Harrison

If a context has already been registered prior to first submission,
then the context init code was not being called. The noticeable effect
of that was that the scheduling priority was left at zero (meaning
super high priority) instead of being set to normal. This would occur
with kernel contexts at start of day, as they are manually pinned up
front rather than on first submission. So add a call to initialise
those when they are pinned.

Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 4ccb29f9ac55c..941613be3b9dd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4111,6 +4111,9 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
 
+	if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
+		guc_context_init(ce);
+
 	try_context_registration(ce, true);
 }
 

From patchwork Fri Oct 28 19:46:49 2022
X-Patchwork-Submitter: John Harrison
X-Patchwork-Id: 13024251
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Date: Fri, 28 Oct 2022 12:46:49 -0700
Message-Id: <20221028194649.1130223-3-John.C.Harrison@Intel.com>
In-Reply-To: <20221028194649.1130223-1-John.C.Harrison@Intel.com>
References: <20221028194649.1130223-1-John.C.Harrison@Intel.com>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Don't deadlock busyness stats vs reset
Cc: DRI-Devel@Lists.FreeDesktop.Org

From: John Harrison

The engine busyness stats code has a worker function that does things
such as extending the 32-bit hardware counters to 64 bits. The GuC's
reset prepare function flushes out this worker function to ensure no
corruption happens during the reset.
Unfortunately, the worker function waits indefinitely for active resets
to finish before doing its work. Thus a deadlock would occur if the
worker function had actually started just as the reset starts.

Update the worker to abort if a reset is in progress rather than
waiting for it to complete. It will still acquire the reset lock in the
case where a reset was not already in progress. So the processing is
still safe from corruption, but the deadlock can no longer occur.

Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/gt/intel_reset.c             | 15 ++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_reset.h             |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 ++++--
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 3159df6cdd492..2f48c6e4420ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1407,7 +1407,7 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
 }
 
-int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
+static int _intel_gt_reset_trylock(struct intel_gt *gt, int *srcu, bool retry)
 {
 	might_lock(&gt->reset.backoff_srcu);
 	might_sleep();
@@ -1416,6 +1416,9 @@ int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
 	while (test_bit(I915_RESET_BACKOFF, &gt->reset.flags)) {
 		rcu_read_unlock();
 
+		if (!retry)
+			return -EBUSY;
+
 		if (wait_event_interruptible(gt->reset.queue,
 					     !test_bit(I915_RESET_BACKOFF,
 						       &gt->reset.flags)))
@@ -1429,6 +1432,16 @@ int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
 	return 0;
 }
 
+int intel_gt_reset_trylock_noretry(struct intel_gt *gt, int *srcu)
+{
+	return _intel_gt_reset_trylock(gt, srcu, false);
+}
+
+int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
+{
+	return _intel_gt_reset_trylock(gt, srcu, true);
+}
+
 void intel_gt_reset_unlock(struct intel_gt *gt, int tag)
 __releases(&gt->reset.backoff_srcu)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
index adc734e673870..7f863726eb6a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.h
+++ b/drivers/gpu/drm/i915/gt/intel_reset.h
@@ -38,6 +38,7 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine,
 
 void __i915_request_reset(struct i915_request *rq, bool guilty);
 
+int __must_check intel_gt_reset_trylock_noretry(struct intel_gt *gt, int *srcu);
 int __must_check intel_gt_reset_trylock(struct intel_gt *gt, int *srcu);
 void intel_gt_reset_unlock(struct intel_gt *gt, int tag);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 941613be3b9dd..1fa1bc7dde3df 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1401,9 +1401,11 @@ static void guc_timestamp_ping(struct work_struct *wrk)
 
 	/*
 	 * Synchronize with gt reset to make sure the worker does not
-	 * corrupt the engine/guc stats.
+	 * corrupt the engine/guc stats. NB: can't actually block waiting
+	 * for a reset to complete as the reset requires flushing out
+	 * any running worker thread. So waiting would deadlock.
 	 */
-	ret = intel_gt_reset_trylock(gt, &srcu);
+	ret = intel_gt_reset_trylock_noretry(gt, &srcu);
 	if (ret)
 		return;
 
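
For readers who want to see the shape of the PATCH 2/2 fix outside the driver, here is a minimal userspace sketch of the same "bail out instead of waiting" pattern. It is an illustration only, not i915 code: every `demo_*` name is hypothetical, C11 atomics stand in for the I915_RESET_BACKOFF bit and the SRCU read-side section, and the retry path spins where the driver sleeps on a wait queue. The check-then-increment in the common helper is also not race-free the way the real SRCU-based code is.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for gt->reset; not the i915 structure. */
struct demo_reset {
	atomic_bool backoff;	/* plays the role of I915_RESET_BACKOFF */
	atomic_int holders;	/* plays the role of the SRCU read section */
};

/* Common helper: either wait for the reset to finish or give up at once. */
static int demo_reset_trylock_common(struct demo_reset *r, bool retry)
{
	while (atomic_load(&r->backoff)) {
		if (!retry)
			return -EBUSY;
		/* The driver sleeps on a wait queue here; this sketch spins. */
	}

	/* Simplification: the real code takes an SRCU read lock instead. */
	atomic_fetch_add(&r->holders, 1);
	return 0;
}

/* Waiting variant, for callers that the reset path never flushes. */
static int demo_reset_trylock(struct demo_reset *r)
{
	return demo_reset_trylock_common(r, true);
}

/* Non-waiting variant, for work that the reset path itself flushes. */
static int demo_reset_trylock_noretry(struct demo_reset *r)
{
	return demo_reset_trylock_common(r, false);
}

static void demo_reset_unlock(struct demo_reset *r)
{
	int prev = atomic_fetch_sub(&r->holders, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Worker body: skip this pass if a reset is in progress. Returns true
 * if the (stand-in) stats update ran, false if it was skipped.
 */
static bool demo_worker(struct demo_reset *r, unsigned long *passes)
{
	if (demo_reset_trylock_noretry(r))
		return false;

	(*passes)++;	/* stands in for the counter extension work */
	demo_reset_unlock(r);
	return true;
}
```

Calling `demo_worker()` while `backoff` is set returns false without touching the counter, which mirrors the early `return` in guc_timestamp_ping(). Callers that the reset path does not flush can keep using the waiting variant.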