From patchwork Mon Sep 23 03:58:23 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13809279
From: Raag Jadav
To: airlied@gmail.com, simona@ffwll.ch, lucas.demarchi@intel.com,
 thomas.hellstrom@linux.intel.com, rodrigo.vivi@intel.com,
 jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
 tursulin@ursulin.net, lina@asahilina.net
Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org, himal.prasad.ghimiray@intel.com,
 francois.dugast@intel.com, aravind.iddamsetty@linux.intel.com,
 anshuman.gupta@intel.com, andi.shyti@linux.intel.com,
 andriy.shevchenko@linux.intel.com, matthew.d.roper@intel.com,
 Raag Jadav
Subject: [PATCH v6 1/4] drm: Introduce device wedged event
Date: Mon, 23 Sep 2024 09:28:23 +0530
Message-Id: <20240923035826.624196-2-raag.jadav@intel.com>
In-Reply-To: <20240923035826.624196-1-raag.jadav@intel.com>
References: <20240923035826.624196-1-raag.jadav@intel.com>
List-Id: Intel graphics driver community testing & development

Introduce a device wedged event, which notifies userspace of the wedged
(hung/unusable) state of the DRM device through a uevent. This is
especially useful when the device is no longer operating as expected and
cannot be recovered from driver context. The purpose of this
implementation is to give drivers a way to recover through userspace
intervention.

Different drivers may have different ideas of a "wedged device" depending
on their hardware implementation, hence the vendor-agnostic nature of the
event. It is up to the drivers to decide when they see the need for
recovery and which of the available methods they want to recover with.

The current implementation defines three recovery methods, of which drivers
can choose to support one or more. The preferred recovery method is sent
in the uevent environment as WEDGED=<method>. Userspace consumers
(sysadmin) can define udev rules to parse this event and take the
corresponding action to recover the device.

Method     | Consumer expectations
-----------|-----------------------------------
rebind     | unbind + rebind driver
bus-reset  | unbind + reset bus device + rebind
reboot     | reboot system

v4: s/drm_dev_wedged/drm_dev_wedged_event
    Use drm_info() (Jani)
    Kernel doc adjustment (Aravind)
v5: Send recovery method with uevent (Lina)
v6: Access wedge_recovery_opts[] using helper function (Jani)
    Use snprintf() (Jani)

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/drm_drv.c | 41 +++++++++++++++++++++++++++++++++++++++
 include/drm/drm_device.h  | 24 +++++++++++++++++++++++
 include/drm/drm_drv.h     | 18 +++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index ac30b0ec9d93..03a5d9009689 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -70,6 +70,18 @@ static struct dentry *drm_debugfs_root;
 
 DEFINE_STATIC_SRCU(drm_unplug_srcu);
 
+/*
+ * Available recovery methods for wedged device. To be sent along with device
+ * wedged uevent.
+ */
+#define WEDGE_LEN	32	/* Need 16+ */
+
+const char *const wedge_recovery_opts[] = {
+	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
+	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
+	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
+};
+
 /*
  * DRM Minors
  * A DRM device can provide several char-dev interfaces on the DRM-Major. Each
@@ -497,6 +509,35 @@ void drm_dev_unplug(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_dev_unplug);
 
+/**
+ * drm_dev_wedged_event - generate a device wedged uevent
+ * @dev: DRM device
+ * @method: method to be used for recovery
+ *
+ * This generates a device wedged uevent for the DRM device specified by @dev.
+ * Recovery @method from wedge_recovery_opts[] (if supported by the device) is
+ * sent in the uevent environment as WEDGED=<method>, on the basis of which,
+ * userspace may take respective action to recover the device.
+ *
+ * Returns: 0 on success, or negative error code otherwise.
+ */
+int drm_dev_wedged_event(struct drm_device *dev, enum wedge_recovery_method method)
+{
+	char event_string[WEDGE_LEN] = {};
+	char *envp[] = { event_string, NULL };
+
+	if (!test_bit(method, &dev->wedge_recovery)) {
+		drm_err(dev, "device wedged, recovery method not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	snprintf(event_string, sizeof(event_string), "WEDGED=%s", recovery_method_name(method));
+
+	drm_info(dev, "device wedged, generating uevent\n");
+	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(drm_dev_wedged_event);
+
 /*
  * DRM internal mount
  * We want to be able to allocate our own "struct address_space" to control
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index c91f87b5242d..f1a71763c22a 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -40,6 +40,27 @@ enum switch_power_state {
 	DRM_SWITCH_POWER_DYNAMIC_OFF = 3,
 };
 
+/**
+ * enum wedge_recovery_method - Recovery method for wedged device in order
+ * of severity. To be set as bit fields in drm_device.wedge_recovery variable.
+ * Drivers can choose to support any one or multiple of them depending on their
+ * needs.
+ */
+
+enum wedge_recovery_method {
+	/** @DRM_WEDGE_RECOVERY_REBIND: unbind + rebind driver */
+	DRM_WEDGE_RECOVERY_REBIND,
+
+	/** @DRM_WEDGE_RECOVERY_BUS_RESET: unbind + reset bus device + rebind */
+	DRM_WEDGE_RECOVERY_BUS_RESET,
+
+	/** @DRM_WEDGE_RECOVERY_REBOOT: reboot system */
+	DRM_WEDGE_RECOVERY_REBOOT,
+
+	/** @DRM_WEDGE_RECOVERY_MAX: for bounds checking, do not use */
+	DRM_WEDGE_RECOVERY_MAX
+};
+
 /**
  * struct drm_device - DRM device structure
  *
@@ -317,6 +338,9 @@ struct drm_device {
 	 * Root directory for debugfs files.
 	 */
 	struct dentry *debugfs_root;
+
+	/** @wedge_recovery: Supported recovery methods for wedged device */
+	unsigned long wedge_recovery;
 };
 
 #endif
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 02ea4e3248fd..83d44e153557 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -45,6 +45,8 @@ struct drm_mode_create_dumb;
 struct drm_printer;
 struct sg_table;
 
+extern const char *const wedge_recovery_opts[];
+
 /**
  * enum drm_driver_feature - feature flags
  *
@@ -461,6 +463,7 @@ void drm_put_dev(struct drm_device *dev);
 bool drm_dev_enter(struct drm_device *dev, int *idx);
 void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
+int drm_dev_wedged_event(struct drm_device *dev, enum wedge_recovery_method method);
 
 /**
  * drm_dev_is_unplugged - is a DRM device unplugged
@@ -551,4 +554,19 @@ static inline void drm_debugfs_dev_init(struct drm_device *dev, struct dentry *r
 }
 #endif
 
+static inline bool recovery_method_is_valid(enum wedge_recovery_method method)
+{
+	if (method >= DRM_WEDGE_RECOVERY_REBIND && method < DRM_WEDGE_RECOVERY_MAX)
+		return true;
+
+	return false;
+}
+
+static inline const char *recovery_method_name(enum wedge_recovery_method method)
+{
+	if (recovery_method_is_valid(method))
+		return wedge_recovery_opts[method];
+
+	return NULL;
+}
 #endif

From patchwork Mon Sep 23 03:58:24 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13809280
From: Raag Jadav
Subject: [PATCH v6 2/4] drm: Expose wedge recovery methods
Date: Mon, 23 Sep 2024 09:28:24 +0530
Message-Id: <20240923035826.624196-3-raag.jadav@intel.com>
In-Reply-To: <20240923035826.624196-1-raag.jadav@intel.com>
References: <20240923035826.624196-1-raag.jadav@intel.com>

Now that we have the device wedged event in place, add a wedge_recovery
sysfs attribute which exposes the recovery methods supported by the DRM
device. This is useful for userspace consumers in cases where the device
supports multiple recovery methods which can be used as fallbacks.
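
For illustration only, here is a minimal userspace sketch in C of how a
consumer might use this attribute to pick a fallback. It assumes the
attribute prints one method name per line, as wedge_recovery_show() in this
patch does. The enum, wedge_parse_supported() and wedge_next_fallback() are
hypothetical names and are not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace bit positions, ordered by severity as in the series. */
enum { WEDGE_REBIND, WEDGE_BUS_RESET, WEDGE_REBOOT, WEDGE_MAX };

static const char *const wedge_names[WEDGE_MAX] = {
	"rebind", "bus-reset", "reboot",
};

/* Turn newline-separated attribute text into a bitmask of supported methods. */
unsigned long wedge_parse_supported(const char *text)
{
	unsigned long mask = 0;

	assert(text != NULL);

	while (*text) {
		size_t len = strcspn(text, "\n");
		int m;

		/* Compare the current line against each known method name. */
		for (m = 0; m < WEDGE_MAX; m++)
			if (strlen(wedge_names[m]) == len &&
			    !strncmp(text, wedge_names[m], len))
				mask |= 1UL << m;

		text += len;
		if (*text == '\n')
			text++;
	}

	return mask;
}

/* Next supported method more severe than @failed, or -1 if none is left. */
int wedge_next_fallback(unsigned long mask, int failed)
{
	int m;

	for (m = failed + 1; m < WEDGE_MAX; m++)
		if (mask & (1UL << m))
			return m;

	return -1;
}
```

A consumer would read the attribute into a buffer, pass it to
wedge_parse_supported(), and, if the method named in the uevent fails, ask
wedge_next_fallback() for the next more severe one. Unknown lines are
ignored rather than treated as errors, which is a design choice of this
sketch.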

$ cat /sys/class/drm/card0/wedge_recovery
rebind
bus-reset
reboot

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/drm_sysfs.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index fb3bbb6adcd1..99767de9e685 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -508,6 +509,26 @@ void drm_sysfs_connector_property_event(struct drm_connector *connector,
 }
 EXPORT_SYMBOL(drm_sysfs_connector_property_event);
 
+static ssize_t wedge_recovery_show(struct device *device,
+				   struct device_attribute *attr, char *buf)
+{
+	struct drm_minor *minor = to_drm_minor(device);
+	struct drm_device *dev = minor->dev;
+	int method, count = 0;
+
+	for_each_set_bit(method, &dev->wedge_recovery, DRM_WEDGE_RECOVERY_MAX)
+		count += sysfs_emit_at(buf, count, "%s\n", recovery_method_name(method));
+
+	return count;
+}
+static DEVICE_ATTR_RO(wedge_recovery);
+
+static struct attribute *minor_dev_attrs[] = {
+	&dev_attr_wedge_recovery.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(minor_dev);
+
 struct device *drm_sysfs_minor_alloc(struct drm_minor *minor)
 {
 	const char *minor_str;
@@ -532,6 +553,7 @@ struct device *drm_sysfs_minor_alloc(struct drm_minor *minor)
 		kdev->devt = MKDEV(DRM_MAJOR, minor->index);
 		kdev->class = drm_class;
 		kdev->type = &drm_sysfs_device_minor;
+		kdev->groups = minor_dev_groups;
 	}
 
 	kdev->parent = minor->dev->dev;

From patchwork Mon Sep 23 03:58:25 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13809281
From: Raag Jadav
Subject: [PATCH v6 3/4] drm/xe: Use device wedged event
Date: Mon, 23 Sep 2024 09:28:25 +0530
Message-Id: <20240923035826.624196-4-raag.jadav@intel.com>
In-Reply-To: <20240923035826.624196-1-raag.jadav@intel.com>
References: <20240923035826.624196-1-raag.jadav@intel.com>

This was previously attempted as an xe-specific reset uevent, but was
dropped in commit 77a0d4d1cea2 ("drm/xe/uapi: Remove reset uevent for now")
as part of refactoring. Now that the device wedged event is supported by
DRM core, make use of it. With this in place, userspace is notified when
the device is wedged and can take the appropriate action to recover it.
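
A consumer could decode the WEDGED value along these lines. This is only a
sketch in userspace C, assuming the event properties have already been
collected into a NULL-terminated array of KEY=value strings (for example by
a udev monitor). The enum and wedge_method_from_env() are hypothetical
names, not something this series provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace mirror of the recovery methods named in this series. */
enum wedge_method {
	WEDGE_METHOD_REBIND,
	WEDGE_METHOD_BUS_RESET,
	WEDGE_METHOD_REBOOT,
	WEDGE_METHOD_UNKNOWN,
};

static const char *const wedge_method_names[] = {
	[WEDGE_METHOD_REBIND] = "rebind",
	[WEDGE_METHOD_BUS_RESET] = "bus-reset",
	[WEDGE_METHOD_REBOOT] = "reboot",
};

/*
 * Look for a WEDGED=<method> entry in a NULL-terminated list of
 * "KEY=value" strings and translate its value to an enum.
 */
enum wedge_method wedge_method_from_env(const char *const *env)
{
	static const char key[] = "WEDGED=";
	const size_t key_len = sizeof(key) - 1;
	size_t i;
	int m;

	assert(env != NULL);

	for (i = 0; env[i]; i++) {
		if (strncmp(env[i], key, key_len))
			continue;

		/* Found the property, match its value against known names. */
		for (m = 0; m < WEDGE_METHOD_UNKNOWN; m++)
			if (!strcmp(env[i] + key_len, wedge_method_names[m]))
				return (enum wedge_method)m;

		return WEDGE_METHOD_UNKNOWN;
	}

	return WEDGE_METHOD_UNKNOWN;
}
```

Returning WEDGE_METHOD_UNKNOWN for both a missing property and an
unrecognised value keeps the caller simple; a real consumer may want to
distinguish the two cases.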

$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[265.802982] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=bus-reset
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5208
MAJOR=226
MINOR=0

v2: Change authorship to Himal (Aravind)
    Add uevent for all device wedged cases (Aravind)
v3: Generic re-implementation in DRM subsystem (Lucas)
v4: Change authorship to Raag (Aravind)

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/xe/xe_device.c | 17 +++++++++++++++--
 drivers/gpu/drm/xe/xe_device.h |  1 +
 drivers/gpu/drm/xe/xe_pci.c    |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index cb5a9fd820cf..8edafea18f45 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -783,6 +783,15 @@ int xe_device_probe(struct xe_device *xe)
 	return err;
 }
 
+void xe_setup_wedge_recovery(struct xe_device *xe)
+{
+	struct drm_device *dev = &xe->drm;
+
+	/* Support both driver rebind and bus reset based recovery. */
+	set_bit(DRM_WEDGE_RECOVERY_REBIND, &dev->wedge_recovery);
+	set_bit(DRM_WEDGE_RECOVERY_BUS_RESET, &dev->wedge_recovery);
+}
+
 static void xe_device_remove_display(struct xe_device *xe)
 {
 	xe_display_unregister(xe);
@@ -989,11 +998,12 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
  * xe_device_declare_wedged - Declare device wedged
  * @xe: xe device instance
  *
- * This is a final state that can only be cleared with a mudule
+ * This is a final state that can only be cleared with a module
  * re-probe (unbind + bind).
  * In this state every IOCTL will be blocked so the GT cannot be used.
  * In general it will be called upon any critical error such as gt reset
- * failure or guc loading failure.
+ * failure or guc loading failure. Userspace will be notified of this state
+ * by a DRM uevent.
  * If xe.wedged module parameter is set to 2, this function will be called
  * on every single execution timeout (a.k.a. GPU hang) right after devcoredump
  * snapshot capture. In this mode, GT reset won't be attempted so the state of
@@ -1023,6 +1033,9 @@ void xe_device_declare_wedged(struct xe_device *xe)
 			"IOCTLs and executions are blocked. Only a rebind may clear the failure\n"
 			"Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new\n",
 			dev_name(xe->drm.dev));
+
+		/* Notify userspace of wedged device */
+		drm_dev_wedged_event(&xe->drm, DRM_WEDGE_RECOVERY_BUS_RESET);
 	}
 
 	for_each_gt(gt, xe, id)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 4c3f0ebe78a9..ca4b3935a982 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -186,6 +186,7 @@ static inline bool xe_device_wedged(struct xe_device *xe)
 	return atomic_read(&xe->wedged.flag);
 }
 
+void xe_setup_wedge_recovery(struct xe_device *xe);
 void xe_device_declare_wedged(struct xe_device *xe);
 
 struct xe_file *xe_file_get(struct xe_file *xef);
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index edaeefd2d648..e7a1d59c40a9 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -860,6 +860,8 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto err_driver_cleanup;
 
+	xe_setup_wedge_recovery(xe);
+
 	drm_dbg(&xe->drm, "d3cold: capable=%s\n",
 		str_yes_no(xe->d3cold.capable));
 

From patchwork Mon Sep 23 03:58:26 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13809282
From: Raag Jadav
Subject: [PATCH v6 4/4] drm/i915: Use device wedged event
Date: Mon, 23 Sep 2024 09:28:26 +0530
Message-Id: <20240923035826.624196-5-raag.jadav@intel.com>
In-Reply-To: <20240923035826.624196-1-raag.jadav@intel.com>
References: <20240923035826.624196-1-raag.jadav@intel.com>

Now that the device wedged event is supported by DRM core, make use of it.
With this in place, userspace will be notified of a wedged device on GT
reset failure.
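
The driver-side contract used here (advertise the supported methods as bits,
then emit an event only for an advertised method) can be modelled in plain
userspace C as below. This is a rough sketch for illustration, not the kernel
code: model_wedged_event() and the MODEL_* names are hypothetical, and the
string is written to a caller-supplied buffer instead of being sent as a
uevent.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the recovery method bits set by the driver. */
enum { MODEL_REBIND, MODEL_BUS_RESET, MODEL_REBOOT, MODEL_MAX };

static const char *const model_names[MODEL_MAX] = {
	"rebind", "bus-reset", "reboot",
};

/*
 * Userspace model of the check-then-format step: refuse methods that were
 * not advertised in @supported, otherwise build "WEDGED=<method>" in @buf.
 * Returns 0 on success or a negative errno value.
 */
int model_wedged_event(unsigned long supported, int method, char *buf, size_t len)
{
	assert(buf != NULL);

	/* Out-of-range or unadvertised methods are rejected up front. */
	if (method < 0 || method >= MODEL_MAX || !(supported & (1UL << method)))
		return -EOPNOTSUPP;

	/* sizeof("WEDGED=") counts the terminating NUL as well. */
	if (strlen(model_names[method]) + sizeof("WEDGED=") > len)
		return -ENOSPC;

	snprintf(buf, len, "WEDGED=%s", model_names[method]);
	return 0;
}
```

With the rebind and bus-reset bits set, as i915_setup_wedge_recovery() does
in this patch, a bus-reset event is formatted while a reboot event is
refused. The explicit range check is an addition made by this sketch.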

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/i915/gt/intel_reset.c |  2 ++
 drivers/gpu/drm/i915/i915_driver.c    | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 8f1ea95471ef..02f357d4e4fb 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1418,6 +1418,8 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 
 	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event);
+	else
+		drm_dev_wedged_event(&gt->i915->drm, DRM_WEDGE_RECOVERY_BUS_RESET);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index fe905d65ddf7..0185fb41eb95 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -711,6 +711,15 @@ static void i915_welcome_messages(struct drm_i915_private *dev_priv)
 			 "DRM_I915_DEBUG_RUNTIME_PM enabled\n");
 }
 
+static void i915_setup_wedge_recovery(struct drm_i915_private *i915)
+{
+	struct drm_device *dev = &i915->drm;
+
+	/* Support both driver rebind and bus reset based recovery. */
+	set_bit(DRM_WEDGE_RECOVERY_REBIND, &dev->wedge_recovery);
+	set_bit(DRM_WEDGE_RECOVERY_BUS_RESET, &dev->wedge_recovery);
+}
+
 static struct drm_i915_private *
 i915_driver_create(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
@@ -812,6 +821,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
 
+	i915_setup_wedge_recovery(i915);
 	i915_welcome_messages(i915);
 
 	i915->do_release = true;