From patchwork Tue Sep 17 04:02:32 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13805884
From: Raag Jadav
To: airlied@gmail.com, simona@ffwll.ch, lucas.demarchi@intel.com,
 thomas.hellstrom@linux.intel.com, rodrigo.vivi@intel.com,
 jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
 tursulin@ursulin.net, lina@asahilina.net
Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org, himal.prasad.ghimiray@intel.com,
 francois.dugast@intel.com, aravind.iddamsetty@linux.intel.com,
 anshuman.gupta@intel.com, andi.shyti@linux.intel.com,
 andriy.shevchenko@linux.intel.com, matthew.d.roper@intel.com,
 Raag Jadav
Subject: [PATCH v5 1/4] drm: Introduce device wedged event
Date: Tue, 17 Sep 2024 09:32:32 +0530
Message-Id: <20240917040235.197019-2-raag.jadav@intel.com>
In-Reply-To: <20240917040235.197019-1-raag.jadav@intel.com>
References: <20240917040235.197019-1-raag.jadav@intel.com>
List-Id: Intel graphics driver community testing & development

Introduce a device wedged event, which notifies userspace of the wedged
(hung/unusable) state of the DRM device through a uevent. This is
especially useful in cases where the device is no longer operating as
expected and has become unrecoverable from driver context. The purpose of
this implementation is to provide drivers a way to recover through
userspace intervention.

Different drivers may have different ideas of a "wedged device" depending
on their hardware implementation, hence the vendor-agnostic nature of the
event. It is up to the drivers to decide when they see the need for
recovery and which of the available methods they want to use. The current
implementation defines three recovery methods, of which drivers can choose
to support one or more. The preferred recovery method is sent in the uevent
environment as WEDGED=<method>. Userspace consumers (sysadmin) can define
udev rules to parse this event and take the respective action to recover
the device.

    Method     | Consumer expectations
    -----------|-----------------------------------
    rebind     | unbind + rebind driver
    bus-reset  | unbind + reset bus device + rebind
    reboot     | reboot system

v4: s/drm_dev_wedged/drm_dev_wedged_event
    Use drm_info() (Jani)
    Kernel doc adjustment (Aravind)
v5: Send recovery method with uevent (Lina)

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/drm_drv.c | 37 +++++++++++++++++++++++++++++++++++++
 include/drm/drm_device.h  | 24 ++++++++++++++++++++++++
 include/drm/drm_drv.h     |  1 +
 3 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index ac30b0ec9d93..1e850a9f608d 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -497,6 +497,43 @@ void drm_dev_unplug(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_dev_unplug);
 
+const char *const wedge_recovery_opts[] = {
+	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
+	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
+	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
+};
+
+/**
+ * drm_dev_wedged_event - generate a device wedged uevent
+ * @dev: DRM device
+ * @method: method to be used for recovery
+ *
+ * This generates a device wedged uevent for the DRM device specified by @dev.
+ * Recovery @method from wedge_recovery_opts[] (if supported by the device) is
+ * sent in the uevent environment as WEDGED=<method>, on the basis of which,
+ * userspace may take respective action to recover the device.
+ *
+ * Returns: 0 on success, or negative error code otherwise.
+ */
+int drm_dev_wedged_event(struct drm_device *dev, enum wedge_recovery_method method)
+{
+	char event_string[32] = "WEDGED=";
+	char *envp[] = { event_string, NULL };
+	bool supported;
+
+	supported = test_bit(method, &dev->wedge_recovery);
+	if (unlikely(!supported)) {
+		drm_err(dev, "device wedged, recovery method not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	strcat(event_string, wedge_recovery_opts[method]);
+
+	drm_info(dev, "device wedged, generating uevent\n");
+	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(drm_dev_wedged_event);
+
 /*
  * DRM internal mount
  * We want to be able to allocate our own "struct address_space" to control
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index c91f87b5242d..e4f32967b5ae 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -40,6 +40,27 @@ enum switch_power_state {
 	DRM_SWITCH_POWER_DYNAMIC_OFF = 3,
 };
 
+/**
+ * enum wedge_recovery_method - Recovery method for wedged device in order
+ * of severity. To be set as bit fields in drm_device.wedge_recovery variable.
+ * Drivers can choose to support any one or multiple of them depending on their
+ * needs.
+ */
+
+enum wedge_recovery_method {
+	/** @DRM_WEDGE_RECOVERY_REBIND: unbind + rebind driver */
+	DRM_WEDGE_RECOVERY_REBIND = 0,
+
+	/** @DRM_WEDGE_RECOVERY_BUS_RESET: unbind + reset bus device + rebind */
+	DRM_WEDGE_RECOVERY_BUS_RESET = 1,
+
+	/** @DRM_WEDGE_RECOVERY_REBOOT: reboot system */
+	DRM_WEDGE_RECOVERY_REBOOT = 2,
+
+	/** @DRM_WEDGE_RECOVERY_MAX: for bounds checking, do not use */
+	DRM_WEDGE_RECOVERY_MAX = 3,
+};
+
 /**
  * struct drm_device - DRM device structure
  *
@@ -317,6 +338,9 @@ struct drm_device {
 	 * Root directory for debugfs files.
 	 */
 	struct dentry *debugfs_root;
+
+	/** @wedge_recovery: Supported recovery methods for wedged device */
+	unsigned long wedge_recovery;
 };
 
 #endif
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 02ea4e3248fd..6e02187f1f6c 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -461,6 +461,7 @@ void drm_put_dev(struct drm_device *dev);
 bool drm_dev_enter(struct drm_device *dev, int *idx);
 void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
+int drm_dev_wedged_event(struct drm_device *dev, enum wedge_recovery_method method);
 
 /**
  * drm_dev_is_unplugged - is a DRM device unplugged

From patchwork Tue Sep 17 04:02:33 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13805885
From: Raag Jadav
Subject: [PATCH v5 2/4] drm: Expose wedge recovery methods
Date: Tue, 17 Sep 2024 09:32:33 +0530
Message-Id: <20240917040235.197019-3-raag.jadav@intel.com>
In-Reply-To: <20240917040235.197019-1-raag.jadav@intel.com>
References: <20240917040235.197019-1-raag.jadav@intel.com>

Now that the device wedged event is in place, add a wedge_recovery sysfs
attribute that exposes the recovery methods supported by the DRM device.
This is useful for userspace consumers when the device supports multiple
recovery methods, which can then be used as fallbacks.
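
A userspace consumer could use this attribute to decide what to try next
when a recovery attempt fails. The following is a minimal userspace sketch
of that idea and is not part of this series: parse_wedge_recovery() and
next_wedge_fallback() are hypothetical helpers, and the enum and name table
are copied from patch 1/4 only so that the sketch compiles on its own. It
assumes one method name per line, as emitted by wedge_recovery_show(), and
relies on the enum being ordered by severity.

```c
#include <stddef.h>
#include <string.h>

/* Copied from patch 1/4 so that this userspace sketch is self-contained. */
enum wedge_recovery_method {
	DRM_WEDGE_RECOVERY_REBIND = 0,
	DRM_WEDGE_RECOVERY_BUS_RESET = 1,
	DRM_WEDGE_RECOVERY_REBOOT = 2,
	DRM_WEDGE_RECOVERY_MAX = 3,
};

static const char *const wedge_recovery_opts[] = {
	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
};

/*
 * Hypothetical helper: turn the text read from the wedge_recovery attribute
 * (one method name per line) into a bitmask with bit N set when method N is
 * listed. Lines that match no known method are ignored.
 */
unsigned long parse_wedge_recovery(const char *text)
{
	unsigned long mask = 0;

	while (*text) {
		size_t len = strcspn(text, "\n");
		int opt;

		for (opt = 0; opt < DRM_WEDGE_RECOVERY_MAX; opt++) {
			const char *name = wedge_recovery_opts[opt];

			if (strlen(name) == len && !strncmp(text, name, len))
				mask |= 1UL << opt;
		}

		/* Skip the line and its terminating newline, if any. */
		text += len;
		if (*text == '\n')
			text++;
	}

	return mask;
}

/*
 * Hypothetical helper: return the next, more severe supported method to fall
 * back to after method @failed did not recover the device, or -1 if no
 * further method is available.
 */
int next_wedge_fallback(unsigned long mask, int failed)
{
	int opt;

	for (opt = failed + 1; opt < DRM_WEDGE_RECOVERY_MAX; opt++)
		if (mask & (1UL << opt))
			return opt;

	return -1;
}
```

With a device that lists rebind and bus-reset, a failed rebind would fall
back to bus-reset, and a failed bus-reset would leave nothing further to
try from this list.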

    $ cat /sys/class/drm/card0/wedge_recovery
    rebind
    bus-reset
    reboot

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/drm_sysfs.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index fb3bbb6adcd1..b88cdbfa3b5e 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -36,6 +36,8 @@
 #define to_drm_minor(d) dev_get_drvdata(d)
 #define to_drm_connector(d) dev_get_drvdata(d)
 
+extern const char *const wedge_recovery_opts[];
+
 /**
  * DOC: overview
  *
@@ -508,6 +510,26 @@ void drm_sysfs_connector_property_event(struct drm_connector *connector,
 }
 EXPORT_SYMBOL(drm_sysfs_connector_property_event);
 
+static ssize_t wedge_recovery_show(struct device *device,
+				   struct device_attribute *attr, char *buf)
+{
+	struct drm_minor *minor = to_drm_minor(device);
+	struct drm_device *dev = minor->dev;
+	int opt, count = 0;
+
+	for_each_set_bit(opt, &dev->wedge_recovery, DRM_WEDGE_RECOVERY_MAX)
+		count += sysfs_emit_at(buf, count, "%s\n", wedge_recovery_opts[opt]);
+
+	return count;
+}
+static DEVICE_ATTR_RO(wedge_recovery);
+
+static struct attribute *minor_dev_attrs[] = {
+	&dev_attr_wedge_recovery.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(minor_dev);
+
 struct device *drm_sysfs_minor_alloc(struct drm_minor *minor)
 {
 	const char *minor_str;
@@ -532,6 +554,7 @@ struct device *drm_sysfs_minor_alloc(struct drm_minor *minor)
 		kdev->devt = MKDEV(DRM_MAJOR, minor->index);
 		kdev->class = drm_class;
 		kdev->type = &drm_sysfs_device_minor;
+		kdev->groups = minor_dev_groups;
 	}
 
 	kdev->parent = minor->dev->dev;

From patchwork Tue Sep 17 04:02:34 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13805886
From: Raag Jadav
Subject: [PATCH v5 3/4] drm/xe: Use device wedged event
Date: Tue, 17 Sep 2024 09:32:34 +0530
Message-Id: <20240917040235.197019-4-raag.jadav@intel.com>
In-Reply-To: <20240917040235.197019-1-raag.jadav@intel.com>
References: <20240917040235.197019-1-raag.jadav@intel.com>

This was previously attempted as an xe-specific reset uevent but was
dropped in commit 77a0d4d1cea2 ("drm/xe/uapi: Remove reset uevent for now")
as part of refactoring. Now that the device wedged event is supported by
DRM core, make use of it. With this in place, userspace is notified of a
wedged device and can take the respective action to recover it.
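
On the consumer side, the recovery method arrives as a WEDGED=<method>
entry in the uevent environment. A minimal sketch of how a userspace
listener might pick it out and map it to an action is given here. The names
wedged_method(), wedged_action_for() and enum wedged_action are
hypothetical and used for illustration only; how the environment is
received (udev rule, netlink socket, etc.) is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan a NULL-terminated uevent environment and return
 * the value of the WEDGED key, or NULL if this is not a wedged event.
 */
const char *wedged_method(const char *const envp[])
{
	static const char key[] = "WEDGED=";
	size_t i;

	for (i = 0; envp[i]; i++)
		if (!strncmp(envp[i], key, sizeof(key) - 1))
			return envp[i] + sizeof(key) - 1;

	return NULL;
}

/* Assumed consumer policy, following the method table in patch 1/4. */
enum wedged_action {
	ACTION_NONE,		/* not a wedged event, or unknown method */
	ACTION_REBIND,		/* unbind + rebind driver */
	ACTION_BUS_RESET,	/* unbind + reset bus device + rebind */
	ACTION_REBOOT,		/* reboot system */
};

/* Map the WEDGED value, if present, to the action the consumer would take. */
enum wedged_action wedged_action_for(const char *const envp[])
{
	const char *method = wedged_method(envp);

	if (!method)
		return ACTION_NONE;
	if (!strcmp(method, "rebind"))
		return ACTION_REBIND;
	if (!strcmp(method, "bus-reset"))
		return ACTION_BUS_RESET;
	if (!strcmp(method, "reboot"))
		return ACTION_REBOOT;

	/* Unknown method names are ignored rather than guessed at. */
	return ACTION_NONE;
}
```

Treating unknown values as ACTION_NONE is a choice of this sketch; it keeps
an older consumer from acting on method names it does not understand.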

    $ udevadm monitor --property --kernel
    monitor will print the received events for:
    KERNEL - the kernel uevent

    KERNEL[265.802982] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
    ACTION=change
    DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
    SUBSYSTEM=drm
    WEDGED=bus-reset
    DEVNAME=/dev/dri/card0
    DEVTYPE=drm_minor
    SEQNUM=5208
    MAJOR=226
    MINOR=0

v2: Change authorship to Himal (Aravind)
    Add uevent for all device wedged cases (Aravind)
v3: Generic re-implementation in DRM subsystem (Lucas)
v4: Change authorship to Raag (Aravind)

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/xe/xe_device.c | 17 +++++++++++++++--
 drivers/gpu/drm/xe/xe_device.h |  1 +
 drivers/gpu/drm/xe/xe_pci.c    |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 4d3c794f134c..1b097643aacb 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -780,6 +780,15 @@ int xe_device_probe(struct xe_device *xe)
 	return err;
 }
 
+void xe_setup_wedge_recovery(struct xe_device *xe)
+{
+	struct drm_device *dev = &xe->drm;
+
+	/* Support both driver rebind and bus reset based recovery. */
+	set_bit(DRM_WEDGE_RECOVERY_REBIND, &dev->wedge_recovery);
+	set_bit(DRM_WEDGE_RECOVERY_BUS_RESET, &dev->wedge_recovery);
+}
+
 static void xe_device_remove_display(struct xe_device *xe)
 {
 	xe_display_unregister(xe);
@@ -986,11 +995,12 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
  * xe_device_declare_wedged - Declare device wedged
  * @xe: xe device instance
  *
- * This is a final state that can only be cleared with a mudule
+ * This is a final state that can only be cleared with a module
  * re-probe (unbind + bind).
  * In this state every IOCTL will be blocked so the GT cannot be used.
  * In general it will be called upon any critical error such as gt reset
- * failure or guc loading failure.
+ * failure or guc loading failure. Userspace will be notified of this state
+ * by a DRM uevent.
  * If xe.wedged module parameter is set to 2, this function will be called
  * on every single execution timeout (a.k.a. GPU hang) right after devcoredump
  * snapshot capture. In this mode, GT reset won't be attempted so the state of
@@ -1020,6 +1030,9 @@ void xe_device_declare_wedged(struct xe_device *xe)
 			"IOCTLs and executions are blocked. Only a rebind may clear the failure\n"
 			"Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new\n",
 			dev_name(xe->drm.dev));
+
+		/* Notify userspace of wedged device */
+		drm_dev_wedged_event(&xe->drm, DRM_WEDGE_RECOVERY_BUS_RESET);
 	}
 
 	for_each_gt(gt, xe, id)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index ca8d8ef6342b..77a2332b4b87 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -174,6 +174,7 @@ static inline bool xe_device_wedged(struct xe_device *xe)
 	return atomic_read(&xe->wedged.flag);
 }
 
+void xe_setup_wedge_recovery(struct xe_device *xe);
 void xe_device_declare_wedged(struct xe_device *xe);
 
 struct xe_file *xe_file_get(struct xe_file *xef);
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index a1d08e20cd34..60a8a60f1d9f 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -872,6 +872,8 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto err_driver_cleanup;
 
+	xe_setup_wedge_recovery(xe);
+
 	drm_dbg(&xe->drm, "d3cold: capable=%s\n",
 		str_yes_no(xe->d3cold.capable));
 

From patchwork Tue Sep 17 04:02:35 2024
X-Patchwork-Submitter: Raag Jadav
X-Patchwork-Id: 13805887
From: Raag Jadav
Subject: [PATCH v5 4/4] drm/i915: Use device wedged event
Date: Tue, 17 Sep 2024 09:32:35 +0530
Message-Id: <20240917040235.197019-5-raag.jadav@intel.com>
In-Reply-To: <20240917040235.197019-1-raag.jadav@intel.com>
References: <20240917040235.197019-1-raag.jadav@intel.com>

Now that the device wedged event is supported by DRM core, make use of it.
With this in place, userspace is notified of a wedged device on GT reset
failure.

Signed-off-by: Raag Jadav
---
 drivers/gpu/drm/i915/gt/intel_reset.c |  2 ++
 drivers/gpu/drm/i915/i915_driver.c    | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 8f1ea95471ef..02f357d4e4fb 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1418,6 +1418,8 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 
 	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event);
+	else
+		drm_dev_wedged_event(&gt->i915->drm, DRM_WEDGE_RECOVERY_BUS_RESET);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index fe905d65ddf7..0185fb41eb95 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -711,6 +711,15 @@ static void i915_welcome_messages(struct drm_i915_private *dev_priv)
 			 "DRM_I915_DEBUG_RUNTIME_PM enabled\n");
 }
 
+static void i915_setup_wedge_recovery(struct drm_i915_private *i915)
+{
+	struct drm_device *dev = &i915->drm;
+
+	/* Support both driver rebind and bus reset based recovery. */
+	set_bit(DRM_WEDGE_RECOVERY_REBIND, &dev->wedge_recovery);
+	set_bit(DRM_WEDGE_RECOVERY_BUS_RESET, &dev->wedge_recovery);
+}
+
 static struct drm_i915_private *
 i915_driver_create(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
@@ -812,6 +821,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
 
+	i915_setup_wedge_recovery(i915);
 	i915_welcome_messages(i915);
 
 	i915->do_release = true;
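
Note on the DRM core helper from patch 1/4: drm_dev_wedged_event() indexes
wedge_recovery_opts[] with @method and appends the name with strcat(). A
plain userspace model of that logic is sketched here, under the assumption
that one would additionally want to reject out-of-range methods and bound
the string copy. build_wedged_event() is a hypothetical name, the supported
argument stands in for drm_device.wedge_recovery, and the -EINVAL and
-ENOSPC returns are choices of this sketch, not behaviour of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Copied from patch 1/4 so that this userspace sketch is self-contained. */
enum wedge_recovery_method {
	DRM_WEDGE_RECOVERY_REBIND = 0,
	DRM_WEDGE_RECOVERY_BUS_RESET = 1,
	DRM_WEDGE_RECOVERY_REBOOT = 2,
	DRM_WEDGE_RECOVERY_MAX = 3,
};

static const char *const wedge_recovery_opts[] = {
	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
};

/*
 * Userspace model of the checks in drm_dev_wedged_event(). @supported plays
 * the role of the wedge_recovery bitmask. Returns 0 and fills @buf with
 * "WEDGED=<method>" on success, or a negative errno-style code otherwise.
 */
int build_wedged_event(unsigned long supported, int method,
		       char *buf, size_t len)
{
	int n;

	/* Extra check in this sketch: never index past the name table. */
	if (method < 0 || method >= DRM_WEDGE_RECOVERY_MAX)
		return -EINVAL;

	/* Same policy as the patch: refuse methods the device did not set. */
	if (!(supported & (1UL << method)))
		return -EOPNOTSUPP;

	/* Bounded formatting instead of strcat() into a fixed buffer. */
	n = snprintf(buf, len, "WEDGED=%s", wedge_recovery_opts[method]);
	if (n < 0 || (size_t)n >= len)
		return -ENOSPC;

	return 0;
}
```

With the current three method names the 32-byte buffer in the patch is large
enough, so the bounded copy only matters if longer names are added later.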