From patchwork Fri Jan 26 21:08:47 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jordan Crouse
X-Patchwork-Id: 10187017
X-Patchwork-Delegate: agross@codeaurora.org
From: Jordan Crouse
To: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org
Subject: [PATCH 3/6] drm/msm: gpu: Capture the GPU state on a GPU hang
Date: Fri, 26 Jan 2018 14:08:47 -0700
Message-Id: <1517000930-1893-4-git-send-email-jcrouse@codeaurora.org>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1517000930-1893-1-git-send-email-jcrouse@codeaurora.org>
References: <1517000930-1893-1-git-send-email-jcrouse@codeaurora.org>
Sender: linux-arm-msm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-arm-msm@vger.kernel.org

Capture the GPU state on a GPU hang and store it for later playback using
the 'crash' node in the debugfs directory. Only one crash state is stored
at a time on the assumption that the first hang is usually the most
interesting. The existing crash state can be cleared by writing to the
debugfs node and then a new one will be captured on the next hang.

Signed-off-by: Jordan Crouse
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 18 ++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_debugfs.c       | 61 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_gpu.c           | 47 ++++++++++++++++++++-----
 drivers/gpu/drm/msm/msm_gpu.h           | 36 ++++++++++++++++++-
 5 files changed, 151 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 81da214..963fce3 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -372,6 +372,8 @@ struct msm_gpu_state *adreno_gpu_state_get(struct msm_gpu *gpu)
 	if (!state)
 		return ERR_PTR(-ENOMEM);
 
+	kref_init(&state->ref);
+
 	do_gettimeofday(&state->time);
 
 	for (i = 0; i < gpu->nr_rings; i++) {
@@ -407,15 +409,25 @@ struct msm_gpu_state *adreno_gpu_state_get(struct msm_gpu *gpu)
 	return state;
 }
 
-void adreno_gpu_state_put(struct msm_gpu_state *state)
+static void adreno_gpu_state_destroy(struct kref *kref)
 {
-	if (IS_ERR_OR_NULL(state))
-		return;
+	struct msm_gpu_state *state = container_of(kref,
+		struct msm_gpu_state, ref);
 
+	kfree(state->comm);
+	kfree(state->cmd);
 	kfree(state->registers);
 	kfree(state);
 }
 
+int adreno_gpu_state_put(struct msm_gpu_state *state)
+{
+	if (IS_ERR_OR_NULL(state))
+		return 1;
+
+	return kref_put(&state->ref, adreno_gpu_state_destroy);
+}
+
 #ifdef CONFIG_DEBUG_FS
 void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 		struct seq_file *m)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index b44e0b9..bcf755e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -221,7 +221,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 
 struct msm_gpu_state *adreno_gpu_state_get(struct msm_gpu *gpu);
-void adreno_gpu_state_put(struct msm_gpu_state *state);
+int adreno_gpu_state_put(struct msm_gpu_state *state);
 
 /* ringbuffer helpers (the parts that are adreno specific) */
 
diff --git a/drivers/gpu/drm/msm/msm_debugfs.c b/drivers/gpu/drm/msm/msm_debugfs.c
index 89ee74b..50e049c 100644
--- a/drivers/gpu/drm/msm/msm_debugfs.c
+++ b/drivers/gpu/drm/msm/msm_debugfs.c
@@ -16,11 +16,69 @@
  */
 
 #ifdef CONFIG_DEBUG_FS
+
+#include
+#include
 #include "msm_drv.h"
 #include "msm_gpu.h"
 #include "msm_kms.h"
 #include "msm_debugfs.h"
 
+static int msm_gpu_crash_show(struct seq_file *m, void *data)
+{
+	struct msm_gpu *gpu = m->private;
+	struct msm_gpu_state *state;
+
+	state = msm_gpu_crashstate_get(gpu);
+	if (!state)
+		return 0;
+
+	seq_printf(m, "%s Crash Status:\n", gpu->name);
+	seq_puts(m, "Kernel: " UTS_RELEASE "\n");
+	seq_printf(m, "Time: %ld s %ld us\n",
+		state->time.tv_sec, state->time.tv_usec);
+	if (state->comm)
+		seq_printf(m, "comm: %s\n", state->comm);
+	if (state->cmd)
+		seq_printf(m, "cmdline: %s\n", state->cmd);
+
+	gpu->funcs->show(gpu, state, m);
+
+	msm_gpu_crashstate_put(gpu);
+
+	return 0;
+}
+
+static ssize_t msm_gpu_crash_write(struct file *file, const char __user *buf,
+		size_t count, loff_t *pos)
+{
+	struct msm_gpu *gpu = ((struct seq_file *)file->private_data)->private;
+
+	dev_err(gpu->dev->dev, "Releasing the GPU crash state\n");
+	msm_gpu_crashstate_put(gpu);
+
+	return count;
+}
+
+static int msm_gpu_crash_open(struct inode *inode, struct file *file)
+{
+	struct msm_drm_private *priv = inode->i_private;
+
+	if (!priv->gpu)
+		return -ENODEV;
+
+	return single_open(file, msm_gpu_crash_show, priv->gpu);
+}
+
+static const struct file_operations msm_gpu_crash_fops = {
+	.owner = THIS_MODULE,
+	.open = msm_gpu_crash_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+	.write = msm_gpu_crash_write,
+};
+
 static int msm_gpu_show(struct drm_device *dev, struct seq_file *m)
 {
 	struct msm_drm_private *priv = dev->dev_private;
@@ -170,6 +228,9 @@ int msm_debugfs_init(struct drm_minor *minor)
 		return ret;
 	}
 
+	debugfs_create_file("crash", 0644, minor->debugfs_root,
+		priv, &msm_gpu_crash_fops);
+
 	if (priv->kms->funcs->debugfs_init)
 		ret = priv->kms->funcs->debugfs_init(priv->kms, minor);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index bd376f9..f8dff90 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -273,6 +273,30 @@ int msm_gpu_hw_init(struct msm_gpu *gpu)
 	return ret;
 }
 
+static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, char *comm,
+		char *cmd)
+{
+	struct msm_gpu_state *state;
+
+	/* Only save one crash state at a time */
+	if (gpu->crashstate)
+		return;
+
+	state = gpu->funcs->gpu_state_get(gpu);
+	if (IS_ERR_OR_NULL(state))
+		return;
+
+	/* Fill in the additional crash state information */
+	state->comm = kstrdup(comm, GFP_KERNEL);
+	state->cmd = kstrdup(cmd, GFP_KERNEL);
+
+	kref_get(&state->ref);
+
+	/* Set the active crash state to be dumped on failure */
+	gpu->crashstate = state;
+}
+
+
 /*
  * Hangcheck detection for locked gpu:
  */
@@ -314,6 +338,7 @@ static void recover_worker(struct work_struct *work)
 	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_gem_submit *submit;
 	struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
+	char *comm = NULL, *cmd = NULL;
 	int i;
 
 	mutex_lock(&dev->struct_mutex);
@@ -326,8 +351,9 @@ static void recover_worker(struct work_struct *work)
 
 		rcu_read_lock();
 		task = pid_task(submit->pid, PIDTYPE_PID);
+
 		if (task) {
-			char *cmd;
+			comm = kstrdup(task->comm, GFP_KERNEL);
 
 			/*
 			 * So slightly annoying, in other paths like
@@ -342,20 +368,25 @@ static void recover_worker(struct work_struct *work)
 			mutex_unlock(&dev->struct_mutex);
 			cmd = kstrdup_quotable_cmdline(task, GFP_KERNEL);
 			mutex_lock(&dev->struct_mutex);
+		}
+
+		rcu_read_unlock();
 
+		if (comm && cmd) {
 			dev_err(dev->dev, "%s: offending task: %s (%s)\n",
-				gpu->name, task->comm, cmd);
+				gpu->name, comm, cmd);
 
 			msm_rd_dump_submit(priv->hangrd, submit,
-				"offending task: %s (%s)", task->comm, cmd);
-
-			kfree(cmd);
-		} else {
+				"offending task: %s (%s)", comm, cmd);
+		} else
 			msm_rd_dump_submit(priv->hangrd, submit, NULL);
-		}
-		rcu_read_unlock();
 	}
 
+	/* Record the crash state */
+	msm_gpu_crashstate_capture(gpu, comm, cmd);
+
+	kfree(cmd);
+	kfree(comm);
 
 	/*
 	 * Update all the rings with the latest and greatest fence.. this
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 42853e9..23e3b06 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -70,7 +70,7 @@ struct msm_gpu_funcs {
 #endif
 	int (*gpu_busy)(struct msm_gpu *gpu, uint64_t *value);
 	struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
-	void (*gpu_state_put)(struct msm_gpu_state *state);
+	int (*gpu_state_put)(struct msm_gpu_state *state);
 };
 
 struct msm_gpu {
@@ -131,6 +131,8 @@ struct msm_gpu {
 		u64 busy_cycles;
 		ktime_t time;
 	} devfreq;
+
+	struct msm_gpu_state *crashstate;
 };
 
 /* It turns out that all targets use the same ringbuffer size */
@@ -178,6 +180,7 @@ struct msm_gpu_submitqueue {
 };
 
 struct msm_gpu_state {
+	struct kref ref;
 	struct timeval time;
 
 	struct {
@@ -191,6 +194,9 @@ struct msm_gpu_state {
 	u32 *registers;
 
 	u32 rbbm_status;
+
+	char *comm;
+	char *cmd;
 };
 
 static inline void gpu_write(struct msm_gpu *gpu, u32 reg, u32 data)
@@ -272,4 +278,32 @@ static inline void msm_submitqueue_put(struct msm_gpu_submitqueue *queue)
 		kref_put(&queue->ref, msm_submitqueue_destroy);
 }
 
+static inline struct msm_gpu_state *msm_gpu_crashstate_get(struct msm_gpu *gpu)
+{
+	struct msm_gpu_state *state = NULL;
+
+	mutex_lock(&gpu->dev->struct_mutex);
+
+	if (gpu->crashstate) {
+		kref_get(&gpu->crashstate->ref);
+		state = gpu->crashstate;
+	}
+
+	mutex_unlock(&gpu->dev->struct_mutex);
+
+	return state;
+}
+
+static inline void msm_gpu_crashstate_put(struct msm_gpu *gpu)
+{
+	mutex_lock(&gpu->dev->struct_mutex);
+
+	if (gpu->crashstate) {
+		if (gpu->funcs->gpu_state_put(gpu->crashstate))
+			gpu->crashstate = NULL;
+	}
+
+	mutex_unlock(&gpu->dev->struct_mutex);
+}
+
 #endif /* __MSM_GPU_H__ */
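
Note on the reference counting in this patch: the crash state is shared
between the hang handler and the debugfs node through a kref, and
gpu->crashstate is only cleared when gpu_state_put() reports that the last
reference went away. The sketch below is a userspace model of that lifecycle
and not driver code. Every model_* name is hypothetical, a plain int stands
in for struct kref, and struct_mutex locking is left out. It assumes
kref_init() starts the count at 1 and kref_put() returns 1 only when it
releases the object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct model_state {
	int refcount;			/* stands in for struct kref ref */
};

struct model_gpu {
	struct model_state *crashstate;	/* mirrors gpu->crashstate */
};

/* Mirrors adreno_gpu_state_get(): kref_init() starts the count at 1. */
static struct model_state *model_state_get(void)
{
	struct model_state *state = calloc(1, sizeof(*state));

	if (state)
		state->refcount = 1;
	return state;
}

/* Mirrors adreno_gpu_state_put(): returns 1 only when the state was freed. */
static int model_state_put(struct model_state *state)
{
	if (!state)
		return 1;

	assert(state->refcount > 0);
	if (--state->refcount > 0)
		return 0;

	free(state);
	return 1;
}

/* Mirrors msm_gpu_crashstate_capture(): the first hang wins. */
static void model_capture(struct model_gpu *gpu)
{
	struct model_state *state;

	/* Only one crash state is kept at a time */
	if (gpu->crashstate)
		return;

	state = model_state_get();
	if (!state)
		return;

	state->refcount++;		/* the kref_get() in the capture path */
	gpu->crashstate = state;
}

/* Mirrors msm_gpu_crashstate_put(): clear the pointer once the state is freed. */
static void model_crashstate_put(struct model_gpu *gpu)
{
	if (gpu->crashstate && model_state_put(gpu->crashstate))
		gpu->crashstate = NULL;
}

/* Counts how many puts it takes to clear a freshly captured state. */
static int model_puts_to_clear(void)
{
	struct model_gpu gpu = { 0 };
	int puts = 0;

	model_capture(&gpu);
	while (gpu.crashstate) {
		model_crashstate_put(&gpu);
		puts++;
	}
	return puts;
}
```

Under this model the count is 2 right after capture (one from the init in
the get path, one from the extra get in the capture path). A read takes and
drops its own reference, and each write to the node drops one more, so
model_puts_to_clear() returns 2: a single write leaves the state in place.
Whether that matches the intent of the patch is not stated in the commit
message.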