From patchwork Sat Dec 11 06:58:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12671719 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3296DC433F5 for ; Sat, 11 Dec 2021 06:59:07 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 165DF10E3A3; Sat, 11 Dec 2021 06:59:02 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by gabe.freedesktop.org (Postfix) with ESMTPS id F035110E3B3; Sat, 11 Dec 2021 06:59:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639205940; x=1670741940; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VeaTcmkQV0TSpDcijcaY4A1S2Jhvv9jWM2wC93NKIZ4=; b=SHSHOh1MX5YmgvJL7uKEAUbQxbdja1kl1osEfJfM7ry0zDqGuqA4SIt3 516e0vkyYtIzDe7SlI7yGOtZc8kdOu8g7dV+eIzUGg2Fm45k4r23NKj0v cqc7WG0VBR4E4OYuxcwreDIt1/4RU/9y9CckWKDQpZo2H0YrVfJg+oBCs rhdb6XEIb+bRkx6yYvcM/lvTOpFeFhVamcH5k/K8vPKLZYwKgDQNfV4zZ YPQyDIFvBKthNzdjCROTBK64ZEAA8ZB10wAvqik45sZ5Qd263fv7ukPI0 xwSL2ARZOco//VPx7aEszwNe9d0OSp256f8KDB0mmjENkAnkW9hYv4kM7 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10194"; a="237249046" X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="237249046" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Dec 2021 22:59:00 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="463979851" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by orsmga006.jf.intel.com with ESMTP; 10 Dec 2021 22:58:59 -0800 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Subject: [PATCH 1/4] drm/i915/guc: Speed up GuC log dumps Date: Fri, 10 Dec 2021 22:58:56 -0800 Message-Id: <20211211065859.2248188-2-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211211065859.2248188-1-John.C.Harrison@Intel.com> References: <20211211065859.2248188-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: John Harrison , DRI-Devel@Lists.FreeDesktop.Org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Add support for telling the debugfs interface the size of the GuC log dump in advance. Without that, the underlying framework keeps calling the 'show' function with larger and larger buffer allocations until it fits. That means reading the log from graphics memory many times - 16 times with the full 18MB log size. v2: Don't return error codes from size query. Report overflow in the error dump as well (review feedback from Daniele). Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/intel_gt_debugfs.h | 21 +++++-- .../drm/i915/gt/uc/intel_guc_log_debugfs.c | 58 ++++++++++++++++++- 2 files changed, 71 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_debugfs.h b/drivers/gpu/drm/i915/gt/intel_gt_debugfs.h index e307ceb99031..17e79b735cfe 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_debugfs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_debugfs.h @@ -10,11 +10,7 @@ struct intel_gt; -#define DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(__name) \ - static int __name ## _open(struct inode *inode, struct file *file) \ -{ \ - return single_open(file, __name ## _show, inode->i_private); \ -} \ +#define __GT_DEBUGFS_ATTRIBUTE_FOPS(__name) \ static const struct file_operations __name ## _fops = { \ .owner = THIS_MODULE, \ .open = __name ## _open, \ @@ -23,6 +19,21 @@ static const struct file_operations __name ## _fops = { \ .release = single_release, \ } +#define DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(__name) \ +static int __name ## _open(struct inode *inode, struct file *file) \ +{ \ + return single_open(file, __name ## _show, inode->i_private); \ +} \ +__GT_DEBUGFS_ATTRIBUTE_FOPS(__name) + +#define DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE_WITH_SIZE(__name, __size_vf) \ +static int __name ## _open(struct inode *inode, struct file *file) \ +{ \ + return single_open_size(file, __name ## _show, inode->i_private, \ + __size_vf(inode->i_private)); \ +} \ +__GT_DEBUGFS_ATTRIBUTE_FOPS(__name) + void intel_gt_debugfs_register(struct intel_gt *gt); struct intel_gt_debugfs_file { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c index 8fd068049376..ddfbe334689f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.c @@ -10,22 +10,74 @@ #include "intel_guc.h" #include "intel_guc_log.h" #include "intel_guc_log_debugfs.h" +#include "intel_uc.h" + +static u32 obj_to_guc_log_dump_size(struct drm_i915_gem_object *obj) +{ + u32 size; + + if (!obj) + return PAGE_SIZE; + + /* "0x%08x 0x%08x 0x%08x 0x%08x\n" => 16 bytes -> 44 chars => x2.75 */ + size = ((obj->base.size * 11) + 3) / 4; + + /* Add padding for final blank line, any extra header info, etc. */ + size = PAGE_ALIGN(size + PAGE_SIZE); + + return size; +} + +static u32 guc_log_dump_size(struct intel_guc_log *log) +{ + struct intel_guc *guc = log_to_guc(log); + + if (!intel_guc_is_supported(guc)) + return PAGE_SIZE; + + if (!log->vma) + return PAGE_SIZE; + + return obj_to_guc_log_dump_size(log->vma->obj); +} static int guc_log_dump_show(struct seq_file *m, void *data) { struct drm_printer p = drm_seq_file_printer(m); + int ret; - return intel_guc_log_dump(m->private, &p, false); + ret = intel_guc_log_dump(m->private, &p, false); + + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) && seq_has_overflowed(m)) + pr_warn_once("preallocated size:%zx for %s exceeded\n", + m->size, __func__); + + return ret; +} +DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE_WITH_SIZE(guc_log_dump, guc_log_dump_size); + +static u32 guc_load_err_dump_size(struct intel_guc_log *log) +{ + struct intel_guc *guc = log_to_guc(log); + struct intel_uc *uc = container_of(guc, struct intel_uc, guc); + + if (!intel_guc_is_supported(guc)) + return PAGE_SIZE; + + return obj_to_guc_log_dump_size(uc->load_err_log); } -DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(guc_log_dump); static int guc_load_err_log_dump_show(struct seq_file *m, void *data) { struct drm_printer p = drm_seq_file_printer(m); + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) && seq_has_overflowed(m)) + pr_warn_once("preallocated size:%zx for %s exceeded\n", + m->size, __func__); + return intel_guc_log_dump(m->private, &p, true); } -DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(guc_load_err_log_dump); +DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE_WITH_SIZE(guc_load_err_log_dump, guc_load_err_dump_size); static int guc_log_level_get(void *data, u64 *val) { From patchwork Sat Dec 11 06:58:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12671721 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DDE0AC433F5 for ; Sat, 11 Dec 2021 06:59:11 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 56A1110E3BF; Sat, 11 Dec 2021 06:59:02 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1DF0510E3A3; Sat, 11 Dec 2021 06:59:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639205941; x=1670741941; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rS++CehQW3TfrjeWNSJ76JoPdSz6vRL1Mp0wJnUGjs8=; b=Fll3rPEZi2TdDvJu9M41ccg2G/9x8Q08HZ50Fz2tV+coxuAtCxJf2dbu ThCHPKGWe25Cy8+QDX7g3kAiTTi3EtPw1d8So0h/F+Sp5J57AmyKWJ3L/ RQBW+MbX5h3pbnsMsgTGquM8WSbAqVNMtbDpMSPvc6S5+rWepXQCoK6lU tlAIsruBW8GhYwus/0C+otU3KS7KEoUmeoWU/NhpHVgdPTUFxQrb7cL01 LZVxK/+UcfU+IUOyu+FPBvSjVWO/3vALWHjoColyBOCilwPjRHoaVsx8g hIt95KeTz/jJ536xcv3SXSaEj/8MxdEoVZIelS8UdIViR/SL9unXSFtNa A==; X-IronPort-AV: E=McAfee;i="6200,9189,10194"; a="237249047" X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="237249047" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Dec 2021 22:59:00 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="463979854" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by orsmga006.jf.intel.com with ESMTP; 10 Dec 2021 22:58:59 -0800 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Subject: [PATCH 2/4] drm/i915/guc: Increase GuC log size for CONFIG_DEBUG_GEM Date: Fri, 10 Dec 2021 22:58:57 -0800 Message-Id: <20211211065859.2248188-3-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211211065859.2248188-1-John.C.Harrison@Intel.com> References: <20211211065859.2248188-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Matthew Brost , John Harrison , DRI-Devel@Lists.FreeDesktop.Org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison Lots of testing is done with the DEBUG_GEM config option enabled but not the DEBUG_GUC option. That means we only get teeny-tiny GuC logs which are not hugely useful. Enabling full DEBUG_GUC also spews lots of other detailed output that is not generally desired. However, bigger GuC logs are extremely useful for almost any regression debug. So enable bigger logs for DEBUG_GEM builds as well. Signed-off-by: John Harrison Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_log.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h index ac1ee1d5ce10..fe6ab7550a14 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h @@ -15,9 +15,12 @@ struct intel_guc; -#ifdef CONFIG_DRM_I915_DEBUG_GUC +#if defined(CONFIG_DRM_I915_DEBUG_GUC) #define CRASH_BUFFER_SIZE SZ_2M #define DEBUG_BUFFER_SIZE SZ_16M +#elif defined(CONFIG_DRM_I915_DEBUG_GEM) +#define CRASH_BUFFER_SIZE SZ_1M +#define DEBUG_BUFFER_SIZE SZ_2M #else #define CRASH_BUFFER_SIZE SZ_8K #define DEBUG_BUFFER_SIZE SZ_64K From patchwork Sat Dec 11 06:58:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12671725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E2644C433EF for ; Sat, 11 Dec 2021 06:59:20 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 92B4010E5D5; Sat, 11 Dec 2021 06:59:07 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by gabe.freedesktop.org (Postfix) with ESMTPS id 407E110E3BF; Sat, 11 Dec 2021 06:59:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639205941; x=1670741941; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Lz6kcdq5Ya6xFKebpRqe9l48NRg0MJy/TW8PC0vQhBA=; b=CuRk3/gsaQAJn80tiAlh5T+wGccfQix6beVN745KeMd9R8+vsR8hRBSR Rc1SjLlu/DmMWND324R/pID+cj9LtyutahiOaa7ycBhor7UFy5AsDyKcY 66XS8xgc20T1fTubkIEFn8B7+F8o0eUS+/CDmyMpnwVz36MjkncvY2Xeo BDLQugwrnEu1+Pvzz5ta9zcCKQUoUlZowpljK2HqXXSeO3urcev9jJQVB b+Dp2AHZhOwW4gERruLIsXYYnVsQqXeTpcDrEjUhvxDyTg5EmCCMir2vh +c8H28tx9RC5TXrYAIh1Jb68P9KFTZhyyAnUrIQSenOf1lrPlmKYhInA0 g==; X-IronPort-AV: E=McAfee;i="6200,9189,10194"; a="237249048" X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="237249048" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Dec 2021 22:59:00 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="463979857" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by orsmga006.jf.intel.com with ESMTP; 10 Dec 2021 22:58:59 -0800 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Subject: [PATCH 3/4] drm/i915/guc: Flag an error if an engine reset fails Date: Fri, 10 Dec 2021 22:58:58 -0800 Message-Id: <20211211065859.2248188-4-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211211065859.2248188-1-John.C.Harrison@Intel.com> References: <20211211065859.2248188-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: John Harrison , DRI-Devel@Lists.FreeDesktop.Org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison If GuC encounters an error during engine reset, the i915 driver promotes to full GT reset. This includes an info message about why the reset is happening. However, that is not treated as a failure by any of the CI systems because resets are an expected occurrance during testing. This kind of failure is a major problem and should never happen. So, complain more loudly and make sure CI notices. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 97311119da6f..6015815f1da0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -4018,11 +4018,12 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, const u32 *msg, u32 len) { struct intel_engine_cs *engine; + struct intel_gt *gt = guc_to_gt(guc); u8 guc_class, instance; u32 reason; if (unlikely(len != 3)) { - drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + drm_err(>->i915->drm, "Invalid length %u", len); return -EPROTO; } @@ -4032,12 +4033,19 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, engine = guc_lookup_engine(guc, guc_class, instance); if (unlikely(!engine)) { - drm_err(&guc_to_gt(guc)->i915->drm, + drm_err(>->i915->drm, "Invalid engine %d:%d", guc_class, instance); return -EPROTO; } - intel_gt_handle_error(guc_to_gt(guc), engine->mask, + /* + * This is an unexpected failure of a hardware feature. So, log a real + * error message not just the informational that comes with the reset. + */ + drm_err(>->i915->drm, "GuC engine reset request failed on %d:%d (%s) because %d", + guc_class, instance, engine->name, reason); + + intel_gt_handle_error(gt, engine->mask, I915_ERROR_CAPTURE, "GuC failed to reset %s (reason=0x%08x)\n", engine->name, reason); From patchwork Sat Dec 11 06:58:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Harrison X-Patchwork-Id: 12671727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BFB1FC433EF for ; Sat, 11 Dec 2021 06:59:22 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B333410E603; Sat, 11 Dec 2021 06:59:07 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by gabe.freedesktop.org (Postfix) with ESMTPS id 621B910E3A3; Sat, 11 Dec 2021 06:59:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1639205941; x=1670741941; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rIJnYoAjhrOOkWPNRWUi/BrcwvPrM/K4Ak699pCY9Dk=; b=TK12NSGDh+sdm3Qrbi7a/jDAfeHYPRTyNKx9pIErqXlfNDvp1ZRE57Uz g1AsPNAM8yvLMrwG/uoUWUZv+MyvN4eBi79jA9y64ywex2igdcUfFbI/x 7v8afk9BYM6tOENgBqK0fUI1j/gAsCqn+JUDPDX9FIUvJFopW1kBHTo1E t5XnCk9v5a58TD9ubAKuFpFRFhWX68c/rHSXTiQUHTFNji4YkraE76qgD mfYjRIXQWQ+BlHq3dqH/xddxI3mOH9MIzEnwyjerzsnpFCS4mbzREuKKp VJSVXenslN31zlrbL4ulK7IsteVcOlMIpopDCwZ6TndrsDJc/7ojAjjuB A==; X-IronPort-AV: E=McAfee;i="6200,9189,10194"; a="237249049" X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="237249049" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Dec 2021 22:59:00 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,197,1635231600"; d="scan'208";a="463979859" Received: from relo-linux-5.jf.intel.com ([10.165.21.134]) by orsmga006.jf.intel.com with ESMTP; 10 Dec 2021 22:58:59 -0800 From: John.C.Harrison@Intel.com To: Intel-GFX@Lists.FreeDesktop.Org Subject: [PATCH 4/4] drm/i915: Improve long running OCL w/a for GuC submission Date: Fri, 10 Dec 2021 22:58:59 -0800 Message-Id: <20211211065859.2248188-5-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211211065859.2248188-1-John.C.Harrison@Intel.com> References: <20211211065859.2248188-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: John Harrison , DRI-Devel@Lists.FreeDesktop.Org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: John Harrison A workaround was added to the driver to allow OpenCL workloads to run 'forever' by disabling pre-emption on the RCS engine for Gen12. It is not totally unbound as the heartbeat will kick in eventually and cause a reset of the hung engine. However, this does not work well in GuC submission mode. In GuC mode, the pre-emption timeout is how GuC detects hung contexts and triggers a per engine reset. Thus, disabling the timeout means also losing all per engine reset ability. A full GT reset will still occur when the heartbeat finally expires, but that is a much more destructive and undesirable mechanism. The purpose of the workaround is actually to give OpenCL tasks longer to reach a pre-emption point after a pre-emption request has been issued. This is necessary because Gen12 does not support mid-thread pre-emption and OpenCL can have long running threads. So, rather than disabling the timeout completely, just set it to a 'long' value. Likewise, bump the heartbeat interval. That gives the OpenCL thread sufficient time to reach a pre-emption point without being killed off either by the GuC or by the heartbeat. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 42 +++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 352254e001b4..26af8d60fe2b 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -382,9 +382,45 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, engine->props.timeslice_duration_ms = CONFIG_DRM_I915_TIMESLICE_DURATION; - /* Override to uninterruptible for OpenCL workloads. */ - if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) - engine->props.preempt_timeout_ms = 0; + /* + * Mid-thread pre-emption is not available in Gen12. Unfortunately, + * some OpenCL workloads run quite long threads. That means they get + * reset due to not pre-empting in a timely manner. + * The execlist solution was to disable pre-emption completely. + * However, pre-emption timeouts are the way GuC detects hung contexts + * and triggers engine resets. Thus, without pre-emption, there is no + * per engine reset. And full GT reset is much more intrusive. So keep + * the timeout for GuC submission platforms and just bump it to be + * much larger. Also bump the heartbeat timeout to match, otherwise + * the heartbeat can expire before the pre-emption can timeout and + * thus trigger a full GT reset anyway. + */ + if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) { + if (intel_uc_wants_guc_submission(>->uc)) { + const unsigned long min_preempt = 7500; + const unsigned long min_beat = 5000; + + if (engine->props.preempt_timeout_ms && + engine->props.preempt_timeout_ms < min_preempt) { + drm_info(>->i915->drm, "Bumping pre-emption timeout from %ld to %ld on %s to allow slow compute pre-emption\n", + engine->props.preempt_timeout_ms, + min_preempt, engine->name); + + engine->props.preempt_timeout_ms = min_preempt; + } + + if (engine->props.heartbeat_interval_ms && + engine->props.heartbeat_interval_ms < min_beat) { + drm_info(>->i915->drm, "Bumping heartbeat interval from %ld to %ld on %s to allow slow compute pre-emption\n", + engine->props.heartbeat_interval_ms, + min_beat, engine->name); + + engine->props.heartbeat_interval_ms = min_beat; + } + } else { + engine->props.preempt_timeout_ms = 0; + } + } engine->defaults = engine->props; /* never to change again */