[v4,2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

From: Matthew Brost <matthew.brost@intel.com>

Add a delay, configurable via debugfs (default 34ms), before disabling
scheduling of a context after its pin count goes to zero. Disabling
scheduling is a somewhat costly operation, so the idea is that the delay
gives the user a chance to resubmit something before this operation is
done. The delay is only applied if the context isn't closed and less
than 3/4 of the total guc_ids are in use.
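
The two conditions above can be sketched as a single predicate. This is
only an illustration under stated assumptions, not the driver code: the
struct, its fields, and the function name are hypothetical, and the
threshold is simply three quarters of the total guc_id count as the text
describes.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the decision depends on. */
struct sched_disable_inputs {
	bool context_closed;         /* context has already been closed */
	unsigned int guc_ids_in_use; /* guc_ids currently allocated */
	unsigned int guc_ids_total;  /* total number of guc_ids available */
};

/*
 * Return true when disabling scheduling should be deferred: the context
 * is still open and fewer than 3/4 of all guc_ids are in use.
 */
static bool should_delay_sched_disable(const struct sched_disable_inputs *in)
{
	if (in->context_closed)
		return false;

	/* Compare 4 * in_use < 3 * total to avoid integer division rounding. */
	return 4ULL * in->guc_ids_in_use < 3ULL * in->guc_ids_total;
}
```

With 1000 guc_ids in total, an open context would be delayed at 749 ids
in use but not at 750, and a closed context would never be delayed.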

As a temporary workaround (WA), disable this feature for the selftests.
Selftests are very timing sensitive and any change in timing can cause
failure. A follow-up patch will fix up the selftests to understand this
delay.

Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However, no real-world workload with a measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real-world workload that benefited greatly
from it, along with a measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3D game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container would ALWAYS be able to render and
encode at no less than 30fps with a predefined maximum render + encode
latency. That means the 36 containers and their workloads together
were not saturating the utilized HW engines (in order to maintain just
enough headroom to meet the min fps and max latencies of incoming
container submissions).

Problem statement: It was observed that the CPU core pinned to i915
soft IRQ work was under severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was
confirmed that the majority of the CPU cycles were spent in the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of those cycles were determined to be spent processing a
specific G2H IRQ, INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs
are sent by GuC in response to the i915 KMD sending H2G requests of type
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context is idle so that we can unpin the context from GuC.
The high CPU utilization was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system as a whole was NOT saturated, it was assumed that each
context was constantly idling between submissions. This caused
thrashing: contexts were unpinned from GuC at one moment, then quickly
repinned the very next moment due to incoming workload. These
event pairs were being triggered across multiple contexts per container,
across all containers, at a rate of more than 30 times per second per
context.
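
The effect of a delay on this thrashing can be illustrated with a toy
model. The sketch below is an assumption-laden simplification, not the
driver logic: the function name and the idea of feeding it a list of
idle gaps are hypothetical. It counts how many idle periods of a context
would end in a disable-scheduling request for a given delay, treating a
delay of 0 as the behavior without this patch.

```c
#include <stddef.h>

/*
 * Count how many idle gaps would end in a disable-scheduling request.
 * idle_gaps_ms[i] is how long a context sits idle (pin count zero) before
 * its next submission. A request is issued only if the gap lasts at least
 * delay_ms; with delay_ms == 0 every idle gap issues one.
 */
static size_t count_sched_disables(const unsigned int *idle_gaps_ms,
				   size_t n, unsigned int delay_ms)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (idle_gaps_ms[i] >= delay_ms)
			count++;
	}
	return count;
}
```

For idle gaps of 28, 30, 33, 40 and 25 ms, a delay of 0 yields five
requests while a delay of 34 ms yields one. A context submitting at 30fps
resubmits roughly every 33 ms, which is just under the 34 ms default; the
commit message does not say whether the default was chosen for that
reason.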

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds, or ~10 million events over ~25+ mins. With this patch, the count
dropped to ~480 every 10 seconds, or ~28K over ~10 mins. The
observed improvement is ~99% in the average count per 10 seconds.
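
The figures above can be cross-checked with simple arithmetic. The
helpers below are a hypothetical sketch (the names are not from the
patch) that scale a per-10-second average to a run length in minutes and
compute the relative reduction, using the rounded averages quoted in the
text as inputs.

```c
/* Scale an average event count per 10 seconds to a run of `minutes`. */
static unsigned long long events_over_minutes(unsigned long long per_10s,
					      unsigned long long minutes)
{
	return per_10s * (minutes * 60ULL / 10ULL);
}

/* Relative reduction from `before` to `after`, as a percentage. */
static double reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the quoted averages, 69000 events per 10 seconds over 25 minutes
gives 10,350,000 events, 480 per 10 seconds over 10 minutes gives 28,800,
and the reduction from 69000 to 480 works out to about 99.3%, consistent
with the approximate totals and the ~99% stated above.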

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   9 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  18 +++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |  57 +++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 150 +++++++++++++++---
 drivers/gpu/drm/i915/i915_selftest.h          |   2 +
 drivers/gpu/drm/i915/i915_trace.h             |  10 ++
 8 files changed, 229 insertions(+), 26 deletions(-)

Message ID	20220815043157.1506623-3-alan.previn.teres.alexis@intel.com (mailing list archive)
State	New, archived
From	Alan Previn <alan.previn.teres.alexis@intel.com>
To	intel-gfx@lists.freedesktop.org
Date	Sun, 14 Aug 2022 21:31:57 -0700
In-Reply-To	<20220815043157.1506623-1-alan.previn.teres.alexis@intel.com>
Series	Delay disabling scheduling on a context
	[v4,0/2] Delay disabling scheduling on a context
	[v4,1/2] drm/i915/selftests: Use correct selfest calls for live tests
	[v4,2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

Patch