[v3,1/3] drm: Introduce device wedged event

Message ID 20240902074859.2992849-2-raag.jadav@intel.com (mailing list archive)
State New, archived
Series Introduce DRM device wedged event

Commit Message

Raag Jadav Sept. 2, 2024, 7:48 a.m. UTC
Introduce a device wedged event, which notifies userspace of the wedged
(hung/unusable) state of a DRM device through a uevent. This is
especially useful when the device is in an unrecoverable state
and requires userspace intervention for recovery.

The purpose of this implementation is to be vendor agnostic. Userspace
consumers (sysadmins) can define udev rules to parse this event and
take appropriate action to recover the device.

Consumer expectations:
----------------------
1) Unbind driver
2) Reset bus device
3) Re-bind driver

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
 drivers/gpu/drm/drm_drv.c | 21 +++++++++++++++++++++
 include/drm/drm_drv.h     |  1 +
 2 files changed, 22 insertions(+)
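
The three consumer steps listed in the commit message (unbind, reset, re-bind) could look roughly like the following from userspace. This is only a sketch and not part of the patch: it assumes the DRM device sits on the PCI bus and that its sysfs "reset" attribute is available. The helper names (sysfs_write, build_path, recover_pci_device), the sysfs_root parameter, and the example driver name and address in the usage note are illustrative assumptions.

```c
#include <stdio.h>

/* Write a string to a sysfs attribute. Returns 0 on success, -1 on failure. */
static int sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* Errors from the attribute's store handler may only surface on close. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Format "<root>/bus/pci/<kind>/<name>/<attr>". Returns 0 if it fits in buf. */
static int build_path(char *buf, size_t size, const char *root,
		      const char *kind, const char *name, const char *attr)
{
	int n = snprintf(buf, size, "%s/bus/pci/%s/%s/%s", root, kind, name, attr);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Unbind the driver, reset the PCI device, then re-bind the driver.
 * sysfs_root is normally "/sys" (assumption: passed in so the sequence can
 * be pointed at a scratch directory). Returns 0 on success, or -1, -2, -3
 * for a failure in step 1, 2 or 3 respectively.
 */
static int recover_pci_device(const char *sysfs_root, const char *driver,
			      const char *bdf)
{
	char path[512];

	/* 1) Unbind driver */
	if (build_path(path, sizeof(path), sysfs_root, "drivers", driver, "unbind") ||
	    sysfs_write(path, bdf))
		return -1;

	/* 2) Reset bus device */
	if (build_path(path, sizeof(path), sysfs_root, "devices", bdf, "reset") ||
	    sysfs_write(path, "1"))
		return -2;

	/* 3) Re-bind driver */
	if (build_path(path, sizeof(path), sysfs_root, "drivers", driver, "bind") ||
	    sysfs_write(path, bdf))
		return -3;

	return 0;
}
```

A consumer would call something like recover_pci_device("/sys", "xe", "0000:03:00.0") with root privileges, where the driver name and PCI address are placeholders for the affected device.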

Comments

Jani Nikula Sept. 2, 2024, 7:51 a.m. UTC | #1
On Mon, 02 Sep 2024, Raag Jadav <raag.jadav@intel.com> wrote:
> Introduce device wedged event, which will notify userspace of wedged
> (hanged/unusable) state of the DRM device through a uevent. This is
> useful especially in cases where the device is in unrecoverable state
> and requires userspace intervention for recovery.
>
> Purpose of this implementation is to be vendor agnostic. Userspace
> consumers (sysadmin) can define udev rules to parse this event and
> take respective action to recover the device.
>
> Consumer expectations:
> ----------------------
> 1) Unbind driver
> 2) Reset bus device
> 3) Re-bind driver
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
>  drivers/gpu/drm/drm_drv.c | 21 +++++++++++++++++++++
>  include/drm/drm_drv.h     |  1 +
>  2 files changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 93543071a500..dc55cc237d89 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -499,6 +499,27 @@ void drm_dev_unplug(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drm_dev_unplug);
>  
> +/**
> + * drm_dev_wedged - declare DRM device as wedged
> + * @dev: DRM device
> + *
> + * This declares a DRM device specified by @dev as wedged (hanged/unusable)
> + * and generates a uevent for it, on the basis of which, userspace may take
> + * respective action to recover the device.
> + * Currently we only set WEDGED=1 in the uevent environment, but this can
> + * be expanded in the future.
> + */
> +void drm_dev_wedged(struct drm_device *dev)
> +{
> +	char *event_string = "WEDGED=1";
> +	char *envp[] = { event_string, NULL };
> +
> +	DRM_INFO("%s: device wedged, generating uevent\n", dev_name(dev->dev));

drm_info() please, and you can drop that handrolled dev_name().

BR,
Jani.

> +
> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_dev_wedged);
> +
>  /*
>   * DRM internal mount
>   * We want to be able to allocate our own "struct address_space" to control
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index cd37936c3926..a0b2d1435b86 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -489,6 +489,7 @@ void drm_put_dev(struct drm_device *dev);
>  bool drm_dev_enter(struct drm_device *dev, int *idx);
>  void drm_dev_exit(int idx);
>  void drm_dev_unplug(struct drm_device *dev);
> +void drm_dev_wedged(struct drm_device *dev);
>  
>  /**
>   * drm_dev_is_unplugged - is a DRM device unplugged

Aravind Iddamsetty Sept. 2, 2024, 9:14 a.m. UTC | #2
On 02/09/24 13:18, Raag Jadav wrote:
> Introduce device wedged event, which will notify userspace of wedged
> (hanged/unusable) state of the DRM device through a uevent. This is
> useful especially in cases where the device is in unrecoverable state
> and requires userspace intervention for recovery.
>
> Purpose of this implementation is to be vendor agnostic. Userspace
> consumers (sysadmin) can define udev rules to parse this event and
> take respective action to recover the device.
>
> Consumer expectations:
> ----------------------
> 1) Unbind driver
> 2) Reset bus device
> 3) Re-bind driver
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
>  drivers/gpu/drm/drm_drv.c | 21 +++++++++++++++++++++
>  include/drm/drm_drv.h     |  1 +
>  2 files changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 93543071a500..dc55cc237d89 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -499,6 +499,27 @@ void drm_dev_unplug(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drm_dev_unplug);
>  
> +/**
> + * drm_dev_wedged - declare DRM device as wedged
> + * @dev: DRM device
> + *
> + * This declares a DRM device specified by @dev as wedged (hanged/unusable)

This doesn't seem to set any DRM state as wedged; it just sends a
uevent. You might need to correct the above statement.

Thanks,
Aravind.
> + * and generates a uevent for it, on the basis of which, userspace may take
> + * respective action to recover the device.
> + * Currently we only set WEDGED=1 in the uevent environment, but this can
> + * be expanded in the future.
> + */
> +void drm_dev_wedged(struct drm_device *dev)
> +{
> +	char *event_string = "WEDGED=1";
> +	char *envp[] = { event_string, NULL };
> +
> +	DRM_INFO("%s: device wedged, generating uevent\n", dev_name(dev->dev));
> +
> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_dev_wedged);
> +
>  /*
>   * DRM internal mount
>   * We want to be able to allocate our own "struct address_space" to control
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index cd37936c3926..a0b2d1435b86 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -489,6 +489,7 @@ void drm_put_dev(struct drm_device *dev);
>  bool drm_dev_enter(struct drm_device *dev, int *idx);
>  void drm_dev_exit(int idx);
>  void drm_dev_unplug(struct drm_device *dev);
> +void drm_dev_wedged(struct drm_device *dev);
>  
>  /**
>   * drm_dev_is_unplugged - is a DRM device unplugged

Raag Jadav Sept. 3, 2024, 7:48 a.m. UTC | #3
On Mon, Sep 02, 2024 at 02:44:21PM +0530, Aravind Iddamsetty wrote:
> 
> On 02/09/24 13:18, Raag Jadav wrote:
> > Introduce device wedged event, which will notify userspace of wedged
> > (hanged/unusable) state of the DRM device through a uevent. This is
> > useful especially in cases where the device is in unrecoverable state
> > and requires userspace intervention for recovery.
> >
> > Purpose of this implementation is to be vendor agnostic. Userspace
> > consumers (sysadmin) can define udev rules to parse this event and
> > take respective action to recover the device.
> >
> > Consumer expectations:
> > ----------------------
> > 1) Unbind driver
> > 2) Reset bus device
> > 3) Re-bind driver
> >
> > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > ---
> >  drivers/gpu/drm/drm_drv.c | 21 +++++++++++++++++++++
> >  include/drm/drm_drv.h     |  1 +
> >  2 files changed, 22 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 93543071a500..dc55cc237d89 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -499,6 +499,27 @@ void drm_dev_unplug(struct drm_device *dev)
> >  }
> >  EXPORT_SYMBOL(drm_dev_unplug);
> >  
> > +/**
> > + * drm_dev_wedged - declare DRM device as wedged
> > + * @dev: DRM device
> > + *
> > + * This declares a DRM device specified by @dev as wedged (hanged/unusable)
> this doesn't seem to set any drm state as wedged, it is just sending an
> uevent. you might need to correct the above statement.

On second thought, perhaps this warrants some action on drm_device?

Raag
Patch

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 93543071a500..dc55cc237d89 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -499,6 +499,27 @@  void drm_dev_unplug(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_dev_unplug);
 
+/**
+ * drm_dev_wedged - declare DRM device as wedged
+ * @dev: DRM device
+ *
+ * This declares a DRM device specified by @dev as wedged (hanged/unusable)
+ * and generates a uevent for it, on the basis of which, userspace may take
+ * respective action to recover the device.
+ * Currently we only set WEDGED=1 in the uevent environment, but this can
+ * be expanded in the future.
+ */
+void drm_dev_wedged(struct drm_device *dev)
+{
+	char *event_string = "WEDGED=1";
+	char *envp[] = { event_string, NULL };
+
+	DRM_INFO("%s: device wedged, generating uevent\n", dev_name(dev->dev));
+
+	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(drm_dev_wedged);
+
 /*
  * DRM internal mount
  * We want to be able to allocate our own "struct address_space" to control
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index cd37936c3926..a0b2d1435b86 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -489,6 +489,7 @@  void drm_put_dev(struct drm_device *dev);
 bool drm_dev_enter(struct drm_device *dev, int *idx);
 void drm_dev_exit(int idx);
 void drm_dev_unplug(struct drm_device *dev);
+void drm_dev_wedged(struct drm_device *dev);
 
 /**
  * drm_dev_is_unplugged - is a DRM device unplugged
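
On the receiving side, a userspace consumer has to spot the WEDGED=1 entry that drm_dev_wedged() places in the uevent environment. The sketch below is not part of the patch. It assumes the consumer reads raw kernel uevents (for example from a NETLINK_KOBJECT_UEVENT socket), where the payload is a sequence of NUL-terminated strings: an "action@devpath" header followed by KEY=VALUE entries. The function name uevent_has_env is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the NUL-separated uevent payload buf[0..len) contains an
 * entry exactly equal to key_value, e.g. "WEDGED=1".
 */
static bool uevent_has_env(const char *buf, size_t len, const char *key_value)
{
	size_t want = strlen(key_value);
	size_t pos = 0;

	while (pos < len) {
		/* Find the end of the current entry without running past len. */
		const char *end = memchr(buf + pos, '\0', len - pos);
		size_t n = end ? (size_t)(end - (buf + pos)) : len - pos;

		/* Exact match only, so "WEDGED=10" does not count as "WEDGED=1". */
		if (n == want && memcmp(buf + pos, key_value, n) == 0)
			return true;

		pos += n + 1; /* skip the entry and its NUL terminator */
	}
	return false;
}
```

A udev rule matching ENV{WEDGED}=="1" on the drm subsystem would be the more usual way to consume the event; the function above only illustrates the payload layout for consumers that read the socket directly.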