From patchwork Tue Jul 24 16:33:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jordan Crouse
X-Patchwork-Id: 10542421
From: Jordan Crouse
To: freedreno@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: [v8 PATCH 00/13] drm/msm: Capture and dump the GPU crash state
Date: Tue, 24 Jul 2018 10:33:18 -0600
Message-Id: <20180724163331.18250-1-jcrouse@codeaurora.org>
X-Mailer: git-send-email 2.18.0
List-Id: Direct Rendering Infrastructure - Development

This is revision 8 implementing a GPU crash state for drm/msm
(https://patchwork.freedesktop.org/series/36097/). This patchset adds better
documentation and reflects comments from the mailing lists. I know we will
miss 4.19 at this point, but I think this is ready to soak in msm-next for a
while.

The object of this code is to store and provide enough information to debug
software and hardware issues on the Adreno hardware in a semi human-readable
format that can also be parsed by scripts.

The full set of changes here captures basic information about the GPU, the
status and contents of the ringbuffers, a snapshot of the current register
state and the active buffers from the hanging submit.

The data is printed with devcoredump. For example, after a hang you can get
the data from /sys/class/devcoredump/devcdX/data where X is a unique number.

v8: Add documentation and consolidate puts/printf code from code comments

v7: Add EXPORT_SYMBOL for __drm_puts_coredump and use %zd to print a size_t
variable for the bo dump thanks to the ever vigilant zero one bot.

v6: Add drm_puts() and use it in the appropriate place.
Clean up a few minor bugs here and there.

v5: Fix symbol error in i915_gpu_error.c thanks to 01 dot org bot. Added
open/release functions for the show debugfs file to get the state per Chris
Wilson. Slightly modified the register output format to be more YAML friendly
also per Chris.

v4: Add buffer dump for the active submit. Fix refcount issue with
devcoredump. Change header for a5xx registers to registers-hlsq because I'm
told YAML requires unique tags.

v3: Make recommended changes to ascii85 per Chris Wilson. Use devcoredump to
dump crash states as suggested by Bjorn Andersson and add a new drm_print
facility to facilitate that. Remove the now obsolete 'crash' debugfs node.
Add documentation for the crash dump output.

v2: Convert output to yaml, use ascii85 to dump ringbuffer contents.

Jordan Crouse (13):
  include: Move ascii85 functions from i915 to linux/ascii85.h
  drm: drm_printer: Add printer for devcoredump
  drm: Add drm_puts() to complement drm_printf()
  drm: Add a -puts() function for the seq_file printer
  drm: Add puts callback for the coredump printer
  drm/msm/gpu: Capture the state of the GPU
  drm/msm/gpu: Convert the GPU show function to use the GPU state
  drm/msm/gpu: Rearrange the code that collects the task during a hang
  drm/msm/gpu: Capture the GPU state on a GPU hang
  drm/msm/adreno: Convert the show/crash file format
  drm/msm/adreno: Add ringbuffer data to the GPU state
  drm/msm/adreno: Add a5xx specific registers for the GPU state
  drm/msm/gpu: Add the buffer objects from the submit to the crash dump

 Documentation/gpu/msm-crash-dump.rst    |  96 ++++++++++
 drivers/gpu/drm/drm_print.c             | 111 +++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c   |  34 +---
 drivers/gpu/drm/msm/Kconfig             |   1 +
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  30 +--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  22 ++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 242 ++++++++++++++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 184 ++++++++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  10 +-
 drivers/gpu/drm/msm/msm_debugfs.c       |  93 ++++++++-
 drivers/gpu/drm/msm/msm_gpu.c           | 145 +++++++++++++-
 drivers/gpu/drm/msm/msm_gpu.h           |  68 ++++++-
 include/drm/drm_print.h                 |  71 +++++++
 include/linux/ascii85.h                 |  38 ++++
 14 files changed, 1044 insertions(+), 101 deletions(-)
 create mode 100644 Documentation/gpu/msm-crash-dump.rst
 create mode 100644 include/linux/ascii85.h
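
Since the series uses ascii85 for the ringbuffer contents and the dump is
meant to be parsed by scripts, here is a rough userspace sketch of how one
32-bit word maps to and from standard Ascii85. The function names
a85_encode_word() and a85_decode_word() are made up for this illustration and
are not the helpers in include/linux/ascii85.h. The sketch assumes the common
convention that a zero word is abbreviated to "z" and every other word becomes
five characters in the range '!'..'u'. How the words are framed inside the
dump file is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one 32-bit word as Ascii85 (illustrative helper, not kernel code).
 * A zero word is abbreviated to "z"; any other word becomes five characters
 * in the range '!'..'u', most significant digit first. `out` must have room
 * for 6 bytes. Returns a pointer to the NUL-terminated encoded string.
 */
static const char *a85_encode_word(uint32_t in, char out[6])
{
	int i;

	if (in == 0)
		return "z";

	out[5] = '\0';
	for (i = 4; i >= 0; i--) {
		out[i] = (char)('!' + in % 85);
		in /= 85;
	}
	return out;
}

/*
 * Decode one Ascii85 group starting at `s` (illustrative helper).
 * On success, stores the word in *word and returns the number of input
 * characters consumed: 1 for "z", 5 otherwise. Returns 0 on malformed or
 * truncated input, or if the group does not fit in 32 bits.
 */
static size_t a85_decode_word(const char *s, uint32_t *word)
{
	uint64_t acc = 0;
	int i;

	if (s[0] == 'z') {
		*word = 0;
		return 1;
	}

	for (i = 0; i < 5; i++) {
		/* A NUL terminator also fails this check, so we never overread. */
		if (s[i] < '!' || s[i] > 'u')
			return 0;
		acc = acc * 85 + (uint64_t)(s[i] - '!');
	}

	if (acc > UINT32_MAX)
		return 0;

	*word = (uint32_t)acc;
	return 5;
}
```

A script-side parser would call a85_decode_word() in a loop, advancing by the
returned count and stopping when it returns 0.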