From patchwork Fri Apr 11 13:04:26 2025
From: Sunil Khatri
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Cc: Alex Deucher, Christian König, Sunil Khatri
Subject: [PATCH v1 1/3] drm: function to get process name and pid
Date: Fri, 11 Apr 2025 18:34:26 +0530
Message-Id: <20250411130428.4104957-1-sunil.khatri@amd.com>

Add a helper function that looks up the process owning a drm_file and
writes its name and pid as a string into a caller-provided character
buffer.

Signed-off-by: Sunil Khatri
---
 drivers/gpu/drm/drm_file.c | 30 ++++++++++++++++++++++++++++++
 include/drm/drm_file.h     |  1 +
 2 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index cb5f22f5bbb6..4434258d21b5 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -965,6 +965,36 @@ void drm_show_fdinfo(struct seq_file *m, struct file *f)
 }
 EXPORT_SYMBOL(drm_show_fdinfo);
 
+/**
+ * drm_process_info - Fill info string with process name and pid
+ * @file_priv: context of interest for process name and pid
+ * @proc_info: caller-provided buffer to write the string to
+ * @buff_size: size of @proc_info in bytes
+ *
+ * Writes "comm:<name> pid:<pid>" for the process that owns @file_priv
+ * into @proc_info, or a placeholder if that process no longer exists.
+ */
+void drm_process_info(struct drm_file *file_priv, char *proc_info, size_t buff_size)
+{
+	struct task_struct *task;
+	struct pid *pid;
+	struct drm_device *dev = file_priv->minor->dev;
+
+	if (!proc_info) {
+		drm_err(dev, "Invalid buffer\n");
+		return;
+	}
+
+	rcu_read_lock();
+	pid = rcu_dereference(file_priv->pid);
+	task = pid_task(pid, PIDTYPE_TGID);
+	snprintf(proc_info, buff_size, "comm:%s pid:%d",
+		 task ? task->comm : "(unknown)", task ? task->pid : -1);
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(drm_process_info);
+
 /**
  * mock_drm_getfile - Create a new struct file for the drm device
  * @minor: drm minor to wrap (e.g. #drm_device.primary)
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index f0ef32e9fa5e..c01b34936968 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -501,6 +501,7 @@ void drm_print_memory_stats(struct drm_printer *p,
 
 void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
 void drm_show_fdinfo(struct seq_file *m, struct file *f);
+void drm_process_info(struct drm_file *file_priv, char *proc_info, size_t buff_size);
 
 struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
 

From patchwork Fri Apr 11 13:04:27 2025
From: Sunil Khatri
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Cc: Alex Deucher, Christian König, Sunil Khatri
Subject: [PATCH v1 2/3] drm/amdgpu: add drm_file reference in userq_mgr
Date: Fri, 11 Apr 2025 18:34:27 +0530
Message-Id: <20250411130428.4104957-2-sunil.khatri@amd.com>
In-Reply-To: <20250411130428.4104957-1-sunil.khatri@amd.com>
References: <20250411130428.4104957-1-sunil.khatri@amd.com>

The usermode queue code will use the drm_file to include process
information in its log messages, so add a drm_file pointer to struct
amdgpu_userq_mgr and set it in amdgpu_driver_open_kms() each time a file
is opened.

Signed-off-by: Sunil Khatri
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 0ba3ef1e4a06..2a6ecf0d6c78 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1436,6 +1436,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+	fpriv->userq_mgr.file = file_priv;
 	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
 	if (r)
 		DRM_WARN("Can't setup usermode queues, use legacy workload submission only\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h
index ec1a4ca6f632..4ddd41835be6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h
@@ -77,6 +77,7 @@ struct amdgpu_userq_mgr {
 	struct amdgpu_device *adev;
 	struct delayed_work resume_work;
 	struct list_head list;
+	struct drm_file *file;
 };
 
 struct amdgpu_db_info {

From patchwork Fri Apr 11 13:04:28 2025
From: Sunil Khatri
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Cc: Alex Deucher, Christian König, Sunil Khatri
Subject: [PATCH v1 3/3] drm/amdgpu: update the error logging for more information
Date: Fri, 11 Apr 2025 18:34:28 +0530
Message-Id: <20250411130428.4104957-3-sunil.khatri@amd.com>
In-Reply-To: <20250411130428.4104957-1-sunil.khatri@amd.com>
References: <20250411130428.4104957-1-sunil.khatri@amd.com>

Add the process name and pid to the userqueue error messages, so that an
error in the logs can be traced back to the process that caused it.

Sample log:
[ 42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] *ERROR* Timed out waiting for fence f=000000001c74d978 for comm:Xwayland pid:3427
[ 42.444669] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not suspending userqueue, timeout waiting for comm:Xwayland pid:3427
[ 42.824729] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] *ERROR* Timed out waiting for fence f=0000000074407d3e for comm:systemd-logind pid:1058
[ 42.825082] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not suspending userqueue, timeout waiting for comm:systemd-logind pid:1058

Signed-off-by: Sunil Khatri
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 45 +++++++++++++++----
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index ecd49cf15b2a..5b58c41618ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -62,12 +62,17 @@ amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
 	struct amdgpu_device *adev = uq_mgr->adev;
 	const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[queue->queue_type];
 	struct dma_fence *f = queue->last_fence;
+	struct drm_file *file;
+	char proc_log[50];
 	int ret;
 
 	if (f && !dma_fence_is_signaled(f)) {
 		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
 		if (ret <= 0) {
-			DRM_ERROR("Timed out waiting for fence f=%p\n", f);
+			file = uq_mgr->file;
+			drm_process_info(file, proc_log, sizeof(proc_log));
+			DRM_ERROR("Timed out waiting for fence f=%p for %s\n",
+				  f, proc_log);
 			return;
 		}
 	}
@@ -427,6 +432,8 @@ amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
 	const struct amdgpu_userq_funcs *userq_funcs;
 	struct amdgpu_usermode_queue *queue;
 	int queue_id;
+	struct drm_file *file;
+	char proc_log[50];
 	int ret = 0;
 
 	/* Resume all the queues for this process */
@@ -435,8 +442,12 @@ amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
 		ret = userq_funcs->resume(uq_mgr, queue);
 	}
 
-	if (ret)
-		DRM_ERROR("Failed to resume all the queue\n");
+	if (ret) {
+		file = uq_mgr->file;
+		drm_process_info(file, proc_log, sizeof(proc_log));
+		DRM_ERROR("Failed to resume all the queues for %s\n",
+			  proc_log);
+	}
 	return ret;
 }
 
@@ -585,6 +596,8 @@ amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
 	const struct amdgpu_userq_funcs *userq_funcs;
 	struct amdgpu_usermode_queue *queue;
 	int queue_id;
+	struct drm_file *file;
+	char proc_log[50];
 	int ret = 0;
 
 	/* Try to suspend all the queues in this process ctx */
@@ -593,8 +606,12 @@ amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
 		ret += userq_funcs->suspend(uq_mgr, queue);
 	}
 
-	if (ret)
-		DRM_ERROR("Couldn't suspend all the queues\n");
+	if (ret) {
+		file = uq_mgr->file;
+		drm_process_info(file, proc_log, sizeof(proc_log));
+		DRM_ERROR("Couldn't suspend all the queues for %s\n",
+			  proc_log);
+	}
 	return ret;
 }
 
@@ -602,6 +619,8 @@ static int
 amdgpu_userqueue_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
 {
 	struct amdgpu_usermode_queue *queue;
+	struct drm_file *file;
+	char proc_log[50];
 	int queue_id, ret;
 
 	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
@@ -611,7 +630,10 @@ amdgpu_userqueue_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
 			continue;
 		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
 		if (ret <= 0) {
-			DRM_ERROR("Timed out waiting for fence f=%p\n", f);
+			file = uq_mgr->file;
+			drm_process_info(file, proc_log, sizeof(proc_log));
+			DRM_ERROR("Timed out waiting for fence f=%p for %s\n",
+				  f, proc_log);
 			return -ETIMEDOUT;
 		}
 	}
@@ -624,19 +646,26 @@ amdgpu_userqueue_suspend(struct amdgpu_userq_mgr *uq_mgr,
 			 struct amdgpu_eviction_fence *ev_fence)
 {
 	int ret;
+	struct drm_file *file;
+	char proc_log[50];
 	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
 	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
 
 	/* Wait for any pending userqueue fence work to finish */
 	ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
 	if (ret) {
-		DRM_ERROR("Not suspending userqueue, timeout waiting for work\n");
+		file = uq_mgr->file;
+		drm_process_info(file, proc_log, sizeof(proc_log));
+		DRM_ERROR("Not suspending userqueue, timeout waiting for %s\n",
+			  proc_log);
 		return;
 	}
 
 	ret = amdgpu_userqueue_suspend_all(uq_mgr);
 	if (ret) {
-		DRM_ERROR("Failed to evict userqueue\n");
+		file = uq_mgr->file;
+		drm_process_info(file, proc_log, sizeof(proc_log));
+		DRM_ERROR("Failed to evict userqueue for %s\n", proc_log);
 		return;
 	}
 
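The callers in patch 3 format the process information into a 50-byte stack buffer (`char proc_log[50]`). The userspace model below is a rough way to check that size against the helper's `"comm:%s pid:%d"` format. It is a sketch, not kernel code: `struct sketch_task`, `SKETCH_TASK_COMM_LEN` and the `"(unknown)"` placeholder used when no task is found are assumptions made for illustration. `SKETCH_TASK_COMM_LEN` mirrors the kernel's 16-byte `TASK_COMM_LEN`, so `comm` holds at most 15 characters, and a 32-bit `int` is assumed for the pid.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: mirrors the kernel's TASK_COMM_LEN (16 bytes, NUL included). */
#define SKETCH_TASK_COMM_LEN 16
/* The buffer size the callers use for proc_log. */
#define PROC_LOG_SIZE 50

/* Hypothetical stand-in for struct task_struct: only the fields printed. */
struct sketch_task {
	char comm[SKETCH_TASK_COMM_LEN];
	int pid;
};

/*
 * Userspace model of the "comm:%s pid:%d" formatting. A NULL task models
 * pid_task() returning NULL; a placeholder is written in that case so the
 * caller never prints an uninitialized buffer. Returns the length that
 * snprintf() wanted to write, excluding the terminating NUL.
 */
static int sketch_format_proc_info(const struct sketch_task *task,
				   char *buf, size_t size)
{
	/* snprintf() only accepts a NULL buffer together with size 0. */
	assert(buf != NULL || size == 0);
	return snprintf(buf, size, "comm:%s pid:%d",
			task ? task->comm : "(unknown)",
			task ? task->pid : -1);
}

/* Length of the longest string: a 15-character comm and INT_MIN. */
static int sketch_worst_case_len(void)
{
	const struct sketch_task worst = { "abcdefghijklmno", INT_MIN };

	/* The comm fills the whole field except the terminating NUL. */
	assert(strlen(worst.comm) == SKETCH_TASK_COMM_LEN - 1);
	/* snprintf(NULL, 0, ...) only measures; nothing is written. */
	return sketch_format_proc_info(&worst, NULL, 0);
}
```

Under these assumptions the worst case is 36 characters plus the terminating NUL, so 50 bytes leaves headroom. A sample-log entry such as `comm:Xwayland pid:3427` is 22 characters, and because `snprintf()` always NUL-terminates a non-zero-sized buffer, a longer name would be truncated rather than overflow.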