From patchwork Sat Oct 26 06:26:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lucas De Marchi X-Patchwork-Id: 13852074 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 45785D10BE3 for ; Sat, 26 Oct 2024 06:27:10 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B3FDE10E3FF; Sat, 26 Oct 2024 06:27:09 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="gCQ4nC55"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by gabe.freedesktop.org (Postfix) with ESMTPS id C38CB10E402 for ; Sat, 26 Oct 2024 06:27:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1729924029; x=1761460029; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zeK52cArmpTkxk/NIpqdBV8glIVOguZdhY4Mckk22Qc=; b=gCQ4nC55ZarrApaiWnD0IydV94x0L2k2N1F6Mf/grh3tm0rhrKmh4IvV +yJainbyd2Q5WfOZj9uKR5QySIOSfEKAVUILLrj8PVnnILSNSwTRKbZX7 m5L+GYnlOtrF+MygH0yPy+JOfJoBufkyIY3ViN/V7XJr+XHvmqW0Wp43m yuncIiRKjEbkbG5OUeeN6oxhCU2/oRZOoya2kH3FLh5Lo9lFrPU4YY3Rn WIlDJpQI/W4lrwh7u3fyFdOn4sZVo1L3w+K/83MNm7HCsmqMktHRlKbsq f4FcjBHiFRF4qAAV++dgq/Yg0eWgtglNZ4mSF2oGsy5Z1LNpaH2wzkQ6E Q==; X-CSE-ConnectionGUID: WtCT2sbHQ2eD241nQ+MMfw== X-CSE-MsgGUID: 0zDX8+i+TYOBX8Ylj7TykA== X-IronPort-AV: E=McAfee;i="6700,10204,11236"; a="40177200" X-IronPort-AV: E=Sophos;i="6.11,234,1725346800"; d="scan'208";a="40177200" Received: from orviesa001.jf.intel.com 
([10.64.159.141]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2024 23:27:08 -0700 X-CSE-ConnectionGUID: f3MN4rTHRD2m0a+v0bvzMA== X-CSE-MsgGUID: DhAd/9I6ToKZeC66srGYKw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,234,1725346800"; d="scan'208";a="118586607" Received: from ldmartin-desk2.corp.intel.com (HELO ldmartin-desk2.lan) ([10.125.111.191]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2024 23:27:08 -0700 From: Lucas De Marchi To: intel-gfx@lists.freedesktop.org Cc: Jonathan Cavitt , Umesh Nerlige Ramappa , Lucas De Marchi Subject: [PATCH 1/3] drm/xe: Add trace to lrc timestamp update Date: Sat, 26 Oct 2024 01:26:56 -0500 Message-ID: <20241026062658.28060-2-lucas.demarchi@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241026062658.28060-1-lucas.demarchi@intel.com> References: <20241026062658.28060-1-lucas.demarchi@intel.com> MIME-Version: 1.0 X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx"

Help debugging when the LRC timestamp is updated for an exec queue.

Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/xe/Makefile       |  1 +
 drivers/gpu/drm/xe/xe_lrc.c       |  3 ++
 drivers/gpu/drm/xe/xe_trace_lrc.c |  9 ++++++
 drivers/gpu/drm/xe/xe_trace_lrc.h | 52 +++++++++++++++++++++++++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_trace_lrc.c
 create mode 100644 drivers/gpu/drm/xe/xe_trace_lrc.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index bc7a04ce69fd..21d69935c336 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -101,6 +101,7 @@ xe-y += xe_bb.o \
 	xe_trace.o \
 	xe_trace_bo.o \
 	xe_trace_guc.o \
+	xe_trace_lrc.o \
 	xe_ttm_sys_mgr.o \
 	xe_ttm_stolen_mgr.o \
 	xe_ttm_vram_mgr.o \
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 4f64c7f4e68d..4b65da77c6e0 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -25,6 +25,7 @@
 #include "xe_map.h"
 #include "xe_memirq.h"
 #include "xe_sriov.h"
+#include "xe_trace_lrc.h"
 #include "xe_vm.h"
 #include "xe_wa.h"
 
@@ -1758,5 +1759,7 @@ u32 xe_lrc_update_timestamp(struct xe_lrc *lrc, u32 *old_ts)
 
 	lrc->ctx_timestamp = xe_lrc_ctx_timestamp(lrc);
 
+	trace_xe_lrc_update_timestamp(lrc, *old_ts);
+
 	return lrc->ctx_timestamp;
 }
diff --git a/drivers/gpu/drm/xe/xe_trace_lrc.c b/drivers/gpu/drm/xe/xe_trace_lrc.c
new file mode 100644
index 000000000000..ab9b7e2970bc
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_trace_lrc.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef __CHECKER__
+#define CREATE_TRACE_POINTS
+#include "xe_trace_lrc.h"
+#endif
diff --git a/drivers/gpu/drm/xe/xe_trace_lrc.h b/drivers/gpu/drm/xe/xe_trace_lrc.h
new file mode 100644
index 000000000000..5c669a0b2180
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_trace_lrc.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM xe
+
+#if !defined(_XE_TRACE_LRC_H_) || defined(TRACE_HEADER_MULTI_READ)
+#define _XE_TRACE_LRC_H_
+
+#include
+#include
+
+#include "xe_gt_types.h"
+#include "xe_lrc.h"
+#include "xe_lrc_types.h"
+
+#define __dev_name_lrc(lrc)	dev_name(gt_to_xe((lrc)->fence_ctx.gt)->drm.dev)
+
+TRACE_EVENT(xe_lrc_update_timestamp,
+	    TP_PROTO(struct xe_lrc *lrc, uint32_t old),
+	    TP_ARGS(lrc, old),
+	    TP_STRUCT__entry(
+		     __field(struct xe_lrc *, lrc)
+		     __field(u32, old)
+		     __field(u32, new)
+		     __string(name, lrc->fence_ctx.name)
+		     __string(device_id, __dev_name_lrc(lrc))
+	    ),
+
+	    TP_fast_assign(
+		   __entry->lrc	= lrc;
+		   __entry->old = old;
+		   __entry->new = lrc->ctx_timestamp;
+		   __assign_str(name);
+		   __assign_str(device_id);
+	    ),
+	    TP_printk("lrc=:%p lrc->name=%s old=%u new=%u device_id:%s",
+		      __entry->lrc, __get_str(name),
+		      __entry->old, __entry->new,
+		      __get_str(device_id))
+);
+
+#endif
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH ../../drivers/gpu/drm/xe
+#define TRACE_INCLUDE_FILE xe_trace_lrc
+#include

From patchwork Sat Oct 26 06:26:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas De Marchi X-Patchwork-Id: 13852077 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 747FED10BE7 for ; Sat, 26 Oct 2024 06:27:14 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 10C5C10EB7E; Sat, 26 Oct 2024 06:27:14 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="kng8FBaB"; dkim-atps=neutral
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by gabe.freedesktop.org (Postfix) with ESMTPS id 521C110E3FF for ; Sat, 26 Oct 2024 06:27:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1729924029; x=1761460029; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EuoIEako/BaOClz/Tl33jfqFXdSg5lrE2smGeabOIIs=; b=kng8FBaBRAgjnMTh2fsHrc+H67oJzLVE8Wp7Ji51n5zdmC4AeSMAD+hH Bj1jX92+b99wTlH382QInENEP1S/535ZTvw0VglzTOH5jGoNPeZcZhp4Y eKHviCmt2ysTg6y+SLonTh0Y10MivxGq9ErjwKZqh9StK0PYWM+3+zw7g 8kMZgKTu2YTsRfNrfymlwqHKCE9uenPx/dtY1XvGs17OpyE72IzCqWQur JglFGWvFi0DgYwar2Gmr9v/ai6mj9MugFCjwi5yg9OxatcbUSozxsXOSi G+Ht6BIGoQc2fIGEFV+NAMG2UzY4StTRIEuaPrW77BSCutTX+jRIWJmxJ w==; X-CSE-ConnectionGUID: dm0xywWgQ4SxgEJnSwZG1g== X-CSE-MsgGUID: N4lvGEqaTmaxHo4mN5y4fw== X-IronPort-AV: E=McAfee;i="6700,10204,11236"; a="40177201" X-IronPort-AV: E=Sophos;i="6.11,234,1725346800"; d="scan'208";a="40177201" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2024 23:27:09 -0700 X-CSE-ConnectionGUID: RRJb0c2LQzuoDmT/oeePyg== X-CSE-MsgGUID: qSnoPxKcS3StPe6SotAC6w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,234,1725346800"; d="scan'208";a="118586612" Received: from ldmartin-desk2.corp.intel.com (HELO ldmartin-desk2.lan) ([10.125.111.191]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2024 23:27:09 -0700 From: Lucas De Marchi To: intel-gfx@lists.freedesktop.org Cc: Jonathan Cavitt , Umesh Nerlige Ramappa , Lucas De Marchi Subject: [PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy Date: Sat, 26 Oct 2024 01:26:57 -0500 Message-ID: <20241026062658.28060-3-lucas.demarchi@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241026062658.28060-1-lucas.demarchi@intel.com> References: <20241026062658.28060-1-lucas.demarchi@intel.com> 
MIME-Version: 1.0 X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx"

When the exec queue is destroyed, there's a race between a query to the fdinfo and the exec queue value being updated: after the destroy ioctl, if the fdinfo is queried before a call to guc_exec_queue_free_job(), the wrong utilization is reported: it's not accumulated on the query since the queue was removed from the array, and the value wasn't updated yet by the free_job().

Explicitly accumulate the engine utilization so the right value is visible after the ioctl returns.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
Cc: Jonathan Cavitt
Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index d098d2dd1b2d..b15ca84b2422 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -829,6 +829,14 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
 
 	xe_exec_queue_kill(q);
 
+	/*
+	 * After killing and destroying the exec queue, make sure userspace has
+	 * an updated view of the run ticks, regardless if this was the last
+	 * ref: since the exec queue is removed from xef->exec_queue.xa, a
+	 * query to fdinfo after this returns could not account for this load.
+	 */
+	xe_exec_queue_update_run_ticks(q);
+
 	trace_xe_exec_queue_close(q);
 	xe_exec_queue_put(q);
 

From patchwork Sat Oct 26 06:26:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lucas De Marchi X-Patchwork-Id: 13852076 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 780F1D10BE3 for ; Sat, 26 Oct 2024 06:27:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1C1C110E411; Sat, 26 Oct 2024 06:27:13 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="WbmJjdTy"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by gabe.freedesktop.org (Postfix) with ESMTPS id CF97710E402 for ; Sat, 26 Oct 2024 06:27:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1729924030; x=1761460030; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MAGouAwClj/J/WNsHAA7MpiV14Iy2Xvb6ZX6Oi2b/Us=; b=WbmJjdTyeFwkNrSMbfnBuRvy/v+5WDjliIwGYI4Pd70gAoYcdIy/EJIw K+CT26xPY0KQY9D6pzwAvLghrpfOGjQKx+GxgLUN/A9cuTjt2kfLNQdXG 0WqSiLPFDibMAPatfy6lBIHjprbeWLgFp/v7FQ1Ln/8bM4ZgPI80JVBXG nKn9c08+W3tjX8zEGJi+cgxs7FYFujTt4fjd+b1sYI1B25ksDHOtBqRId ORdywTTn/5wS6oAzR2ArLCG30XC63G5riWRdNgALC+/p81LlIIM0qLHXc 45P/zI+4+e+cl7f82EAHmL8v5gDi4tyZN3EUWWD3HeMc/P/X0x6pf6sUk g==; X-CSE-ConnectionGUID: PAKOjE6/T7+lb77b6i0OEA== X-CSE-MsgGUID: 2wjCGJ54Q6yKxw8W9vCB+g== X-IronPort-AV: E=McAfee;i="6700,10204,11236"; a="40177202" X-IronPort-AV:
E=Sophos;i="6.11,234,1725346800"; d="scan'208";a="40177202" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2024 23:27:09 -0700 X-CSE-ConnectionGUID: xtZ1fX+hTJGsehoLPVHPyg== X-CSE-MsgGUID: D+lDKBRlRQGR6VFaPVQ5Hg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,234,1725346800"; d="scan'208";a="118586616" Received: from ldmartin-desk2.corp.intel.com (HELO ldmartin-desk2.lan) ([10.125.111.191]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2024 23:27:09 -0700 From: Lucas De Marchi To: intel-gfx@lists.freedesktop.org Cc: Jonathan Cavitt , Umesh Nerlige Ramappa , Lucas De Marchi Subject: [PATCH 3/3] drm/xe: Stop accumulating LRC timestamp on job_free Date: Sat, 26 Oct 2024 01:26:58 -0500 Message-ID: <20241026062658.28060-4-lucas.demarchi@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241026062658.28060-1-lucas.demarchi@intel.com> References: <20241026062658.28060-1-lucas.demarchi@intel.com> MIME-Version: 1.0 X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx"

The exec queue timestamp is only really useful when it's being queried through the fdinfo. There's no need to update it so often, on every job_free. Tracing a simple app like vkcube running shows an update rate of ~120 Hz.

The update on job_free() is used to cover a gap: if an exec queue is created and destroyed rapidly, before a new query, the timestamp still needs to be accumulated and accounted on the xef.

Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") couldn't do it on the exec_queue_fini since the xef could be gone at that point.
However, since commit ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is refcounted and the exec queue has a reference.

Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") by reducing the frequency at which the update is needed.

Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured")
Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/xe/xe_exec_queue.c | 6 ++++++
 drivers/gpu/drm/xe/xe_guc_submit.c | 2 --
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index b15ca84b2422..bc2fc917e0de 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -260,8 +260,14 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
 {
 	int i;
 
+	/*
+	 * Before releasing our ref to lrc and xef, accumulate our run ticks
+	 */
+	xe_exec_queue_update_run_ticks(q);
+
 	for (i = 0; i < q->width; ++i)
 		xe_lrc_put(q->lrc[i]);
+
 	__xe_exec_queue_free(q);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index e5d7c767a744..ebe4665d9159 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -747,8 +747,6 @@ static void guc_exec_queue_free_job(struct drm_sched_job *drm_job)
 {
 	struct xe_sched_job *job = to_xe_sched_job(drm_job);
 
-	xe_exec_queue_update_run_ticks(job->q);
-
 	trace_xe_sched_job_free(job);
 	xe_sched_job_put(job);
 }
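
Reviewer note: the accounting gap that patches 2/3 and 3/3 close can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not driver code: every name below (model_queue, model_file, model_update_run_ticks, model_fdinfo_query, model_destroy, model_ticks_after_destroy) is hypothetical and merely mirrors the roles of the exec queue, the xef, xe_exec_queue_update_run_ticks(), the fdinfo query and the destroy ioctl described in the commit messages. Locking, refcounting and engine classes are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4

/* Hypothetical stand-in for an exec queue and its LRC timestamp. */
struct model_queue {
	uint32_t hw_timestamp;	/* simulated free-running context timestamp */
	uint32_t last_sampled;	/* timestamp value at the last accumulation */
};

/* Hypothetical stand-in for the per-client file (the "xef"). */
struct model_file {
	uint64_t run_ticks;			/* accumulated utilization */
	struct model_queue *queues[MAX_QUEUES];	/* models xef->exec_queue.xa */
};

/* Fold the ticks elapsed since the last sample into the file total. */
static void model_update_run_ticks(struct model_file *f, struct model_queue *q)
{
	uint32_t delta = q->hw_timestamp - q->last_sampled; /* wraps safely */

	q->last_sampled = q->hw_timestamp;
	f->run_ticks += delta;
}

/* Models an fdinfo query: only queues still in the array are sampled. */
static uint64_t model_fdinfo_query(struct model_file *f)
{
	for (size_t i = 0; i < MAX_QUEUES; i++)
		if (f->queues[i])
			model_update_run_ticks(f, f->queues[i]);

	return f->run_ticks;
}

/* Models the destroy ioctl, with or without the explicit accumulation. */
static void model_destroy(struct model_file *f, size_t idx, bool accumulate)
{
	struct model_queue *q = f->queues[idx];

	f->queues[idx] = NULL;	/* queue is no longer visible to queries */
	if (accumulate)
		model_update_run_ticks(f, q);
}

/* Run 1000 ticks on one queue, destroy it, then query the total. */
static uint64_t model_ticks_after_destroy(bool accumulate)
{
	struct model_queue q = { .hw_timestamp = 0, .last_sampled = 0 };
	struct model_file f = { .run_ticks = 0, .queues = { &q } };

	q.hw_timestamp = 1000;
	model_destroy(&f, 0, accumulate);

	return model_fdinfo_query(&f);
}
```

In this model, calling model_ticks_after_destroy(false) loses the 1000 ticks because the queue has already left the array when the query runs, while model_ticks_after_destroy(true) reports them. The real driver additionally relies on the xef reference held by the exec queue, which this sketch does not model.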