From patchwork Mon Aug 15 11:38:22 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 9280813
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 15 Aug 2016 12:38:22 +0100
Message-Id: <1471261106-21289-2-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1471261106-21289-1-git-send-email-chris@chris-wilson.co.uk>
References: <1471261106-21289-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 1/5] drm/i915: Allow disabling error capture
List-Id: Intel graphics driver community testing & development

We currently capture the GPU state after we detect a hang. This is vital
for us to both triage and debug hangs in the wild (post-mortem
debugging). However, it comes at the cost of running some potentially
dangerous code (since it has to make very few assumptions about the state
of the driver) that is quite resource intensive.

Add a Kconfig option, DRM_I915_CAPTURE_ERROR, to compile the capture code
out entirely, and a module parameter, i915.error_capture, to disable the
capture at runtime. Both default to enabled.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/Kconfig          | 10 ++++++++++
 drivers/gpu/drm/i915/i915_debugfs.c   |  6 ++++++
 drivers/gpu/drm/i915/i915_drv.h       | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c |  7 +++++++
 drivers/gpu/drm/i915/i915_params.c    |  9 +++++++++
 drivers/gpu/drm/i915/i915_params.h    |  1 +
 drivers/gpu/drm/i915/i915_sysfs.c     |  8 ++++++++
 drivers/gpu/drm/i915/intel_display.c  |  4 ++++
 drivers/gpu/drm/i915/intel_overlay.c  |  4 ++++
 9 files changed, 60 insertions(+)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 7769e469118f..10a6ac11b6a9 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -46,6 +46,16 @@ config DRM_I915_PRELIMINARY_HW_SUPPORT
 
 	  If in doubt, say "N".
 
+config DRM_I915_CAPTURE_ERROR
+	bool "Enable capturing GPU state following a hang"
+	depends on DRM_I915
+	default y
+	help
+	  This option enables capturing the GPU state when a hang is detected.
+	  This information is vital for triaging hangs and assists in debugging.
+
+	  If in doubt, say "Y".
+
 config DRM_I915_USERPTR
 	bool "Always enable userptr support"
 	depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b89478a8d19a..f41ebf25655c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -973,6 +973,8 @@ static int i915_hws_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
+
 static ssize_t
 i915_error_state_write(struct file *filp,
 		       const char __user *ubuf,
@@ -1062,6 +1064,8 @@ static const struct file_operations i915_error_state_fops = {
 	.release = i915_error_state_release,
 };
 
+#endif
+
 static int
 i915_next_seqno_get(void *data, u64 *val)
 {
@@ -5399,7 +5403,9 @@ static const struct i915_debugfs_files {
 	{"i915_ring_missed_irq", &i915_ring_missed_irq_fops},
 	{"i915_ring_test_irq", &i915_ring_test_irq_fops},
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
 	{"i915_error_state", &i915_error_state_fops},
+#endif
 	{"i915_next_seqno", &i915_next_seqno_fops},
 	{"i915_display_crc_ctl", &i915_display_crc_ctl_fops},
 	{"i915_pri_wm_latency", &i915_pri_wm_latency_fops},
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 35caa9b2f36a..20caac1796ef 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3482,6 +3482,7 @@ static inline void intel_display_crc_init(struct drm_device *dev) {}
 #endif
 
 /* i915_gpu_error.c */
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
 __printf(2, 3)
 void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
 int i915_error_state_to_str(struct drm_i915_error_state_buf *estr,
@@ -3501,6 +3502,16 @@ void i915_error_state_get(struct drm_device *dev,
 			  struct i915_error_state_file_priv *error_priv);
 void i915_error_state_put(struct i915_error_state_file_priv *error_priv);
 void i915_destroy_error_state(struct drm_device *dev);
+#else
+static inline void i915_capture_error_state(struct drm_i915_private *dev_priv,
+					    u32 engine_mask,
+					    const char *error_msg)
+{
+}
+static inline void i915_destroy_error_state(struct drm_device *dev)
+{
+}
+#endif
 
 void i915_get_extra_instdone(struct drm_i915_private *dev_priv, uint32_t *instdone);
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 0c3f30ce85c3..0bbc22f9a705 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -30,6 +30,8 @@
 #include
 #include "i915_drv.h"
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
+
 static const char *engine_str(int engine)
 {
 	switch (engine) {
@@ -1419,6 +1421,9 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
 	struct drm_i915_error_state *error;
 	unsigned long flags;
 
+	if (!i915.error_capture)
+		return;
+
 	if (READ_ONCE(dev_priv->gpu_error.first_error))
 		return;
 
@@ -1504,6 +1509,8 @@ void i915_destroy_error_state(struct drm_device *dev)
 		kref_put(&error->ref, i915_error_state_free);
 }
 
+#endif
+
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
 {
 	switch (type) {
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 768ad89d9cd4..e72a41223535 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
 	.load_detect_test = 0,
 	.force_reset_modeset_test = 0,
 	.reset = true,
+	.error_capture = true,
 	.invert_brightness = 0,
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
@@ -115,6 +116,14 @@ MODULE_PARM_DESC(vbt_sdvo_panel_type,
 module_param_named_unsafe(reset, i915.reset, bool, 0600);
 MODULE_PARM_DESC(reset, "Attempt GPU resets (default: true)");
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
+module_param_named(error_capture, i915.error_capture, bool, 0600);
+MODULE_PARM_DESC(error_capture,
+	"Record the GPU state following a hang. "
+	"This information in /sys/class/drm/card/error is vital for "
+	"triaging and debugging hangs.");
+#endif
+
 module_param_named_unsafe(enable_hangcheck, i915.enable_hangcheck, bool, 0644);
 MODULE_PARM_DESC(enable_hangcheck,
 	"Periodically check GPU activity for detecting hangs. "
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 3a0dd78ddb38..94efc899c1ef 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -59,6 +59,7 @@ struct i915_params {
 	bool load_detect_test;
 	bool force_reset_modeset_test;
 	bool reset;
+	bool error_capture;
 	bool disable_display;
 	bool verbose_state_checks;
 	bool nuclear_pageflip;
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index f1ffde7f7c0b..36a18345492c 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -532,6 +532,8 @@ static const struct attribute *vlv_attrs[] = {
 	NULL,
 };
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
+
 static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
 				struct bin_attribute *attr, char *buf,
 				loff_t off, size_t count)
@@ -597,6 +599,8 @@ static struct bin_attribute error_state_attr = {
 	.write = error_state_write,
 };
 
+#endif
+
 void i915_setup_sysfs(struct drm_device *dev)
 {
 	int ret;
@@ -642,15 +646,19 @@ void i915_setup_sysfs(struct drm_device *dev)
 	if (ret)
 		DRM_ERROR("RPS sysfs setup failed\n");
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
 	ret = sysfs_create_bin_file(&dev->primary->kdev->kobj,
 				    &error_state_attr);
 	if (ret)
 		DRM_ERROR("error_state sysfs setup failed\n");
+#endif
 }
 
 void i915_teardown_sysfs(struct drm_device *dev)
 {
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
 	sysfs_remove_bin_file(&dev->primary->kdev->kobj, &error_state_attr);
+#endif
 	if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev))
 		sysfs_remove_files(&dev->primary->kdev->kobj, vlv_attrs);
 	else
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 781d22eee774..bb4473b26fd2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -16879,6 +16879,8 @@ int intel_modeset_vga_set_state(struct drm_device *dev, bool state)
 	return 0;
 }
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
+
 struct intel_display_error_state {
 
 	u32 power_well_driver;
@@ -17061,3 +17063,5 @@ intel_display_print_error_state(struct drm_i915_error_state_buf *m,
 		err_printf(m, " VSYNC: %08x\n", error->transcoder[i].vsync);
 	}
 }
+
+#endif
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 72f8990a13d2..87689df7f514 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -1470,6 +1470,8 @@ void intel_cleanup_overlay(struct drm_i915_private *dev_priv)
 	kfree(dev_priv->overlay);
 }
 
+#ifdef CONFIG_DRM_I915_CAPTURE_ERROR
+
 struct intel_overlay_error_state {
 	struct overlay_registers regs;
 	unsigned long base;
@@ -1587,3 +1589,5 @@ intel_overlay_print_error_state(struct drm_i915_error_state_buf *m,
 	P(UVSCALEV);
 #undef P
 }
+
+#endif
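
For readers unfamiliar with the pattern, the combination used in this patch can be sketched in plain userspace C: a compile-time option swaps the implementation for an empty static inline stub, and a runtime flag is tested at the top of the capture function. This is only an illustration under stated assumptions, not driver code. DEMO_CAPTURE_ERROR, demo_error_capture, demo_captures, demo_capture_error_state() and demo_handle_hang() are made-up names standing in for CONFIG_DRM_I915_CAPTURE_ERROR, i915.error_capture and i915_capture_error_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for CONFIG_DRM_I915_CAPTURE_ERROR. Build with
 * -DDEMO_CAPTURE_ERROR=0 to compile the capture code out.
 */
#ifndef DEMO_CAPTURE_ERROR
#define DEMO_CAPTURE_ERROR 1
#endif

/* Hypothetical stand-in for the i915.error_capture module parameter. */
static bool demo_error_capture = true;

/* Count of captures taken; exists only to make the demo observable. */
static int demo_captures;

#if DEMO_CAPTURE_ERROR
/* Full implementation: honours the runtime switch before doing any work. */
static void demo_capture_error_state(const char *error_msg)
{
	assert(error_msg != NULL);

	if (!demo_error_capture)
		return;

	demo_captures++;
	printf("captured: %s\n", error_msg);
}
#else
/* Empty stub: callers compile unchanged and the call does nothing. */
static inline void demo_capture_error_state(const char *error_msg)
{
	(void)error_msg;
}
#endif

/* A caller (think: the hang handler) needs no #ifdef of its own. */
static void demo_handle_hang(const char *error_msg)
{
	demo_capture_error_state(error_msg);
}
```

Building with -DDEMO_CAPTURE_ERROR=0 selects the stub, so demo_handle_hang() still compiles but does nothing. With the default build, clearing demo_error_capture suppresses the capture at runtime, which mirrors the early return added to i915_capture_error_state() above.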