diff mbox series

[v4,5/6] drm/i915/dp_link_training: Set all downstream MST ports to BAD before retrying

Message ID 20230824205335.500163-6-gildekel@chromium.org (mailing list archive)
State New, archived
Headers show
Series drm/i915/dp_link_training: Define a final failure state when link training fails | expand

Commit Message

Gil Dekel Aug. 24, 2023, 8:50 p.m. UTC
Before sending a uevent to userspace in order to trigger a corrective
modeset, we change the failing connector's link-status to BAD. However,
the downstream MST branch ports are left in their original GOOD state.

This patch utilizes the drm helper function
drm_dp_set_mst_topology_link_status() to rectify this and set all
downstream MST connectors' link-status to BAD before emitting the uevent
to userspace.

Signed-off-by: Gil Dekel <gildekel@chromium.org>
---
 drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics

Comments

Rodrigo Vivi Sept. 1, 2023, 6:55 p.m. UTC | #1
On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote:
> Before sending a uevent to userspace in order to trigger a corrective
> modeset, we change the failing connector's link-status to BAD. However,
> the downstream MST branch ports are left in their original GOOD state.
> 
> This patch utilizes the drm helper function
> drm_dp_set_mst_topology_link_status() to rectify this and set all
> downstream MST connectors' link-status to BAD before emitting the uevent
> to userspace.
> 
> Signed-off-by: Gil Dekel <gildekel@chromium.org>
> ---
>  drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> index 42353b1ac487..e8b10f59e141 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
>  	struct intel_dp *intel_dp =
>  		container_of(work, typeof(*intel_dp), modeset_retry_work);
>  	struct drm_connector *connector = &intel_dp->attached_connector->base;
> -	drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
> -		    connector->name);
> 
> -	/* Grab the locks before changing connector property*/
> -	mutex_lock(&connector->dev->mode_config.mutex);
> -	/* Set connector link status to BAD and send a Uevent to notify
> -	 * userspace to do a modeset.
> +	/* Set the connector's (and possibly all its downstream MST ports') link
> +	 * status to BAD.
>  	 */
> +	mutex_lock(&connector->dev->mode_config.mutex);
> +	drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
> +		    connector->base.id, connector->name,
> +		    connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
>  	drm_connector_set_link_status_property(connector,
>  					       DRM_MODE_LINK_STATUS_BAD);
> +	if (intel_dp->is_mst) {
> +		drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
> +						    DRM_MODE_LINK_STATUS_BAD);

Something is weird with the locking here.
I noticed that on patch 3 this new function also gets the same
mutex_lock(&connector->dev->mode_config.mutex);

Since you didn't reach the deadlock, I'm clearly missing something
on the flow. But regardless of what I could be missing, I believe
this is totally not future proof and we will for sure hit dead-lock
cases.

> +	}
>  	mutex_unlock(&connector->dev->mode_config.mutex);
>  	/* Send Hotplug uevent so userspace can reprobe */
>  	drm_kms_helper_connector_hotplug_event(connector);
> --
> Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
Gil Dekel Sept. 1, 2023, 9:13 p.m. UTC | #2
On Fri, Sep 1, 2023 at 2:55 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
>
> On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote:
> > Before sending a uevent to userspace in order to trigger a corrective
> > modeset, we change the failing connector's link-status to BAD. However,
> > the downstream MST branch ports are left in their original GOOD state.
> >
> > This patch utilizes the drm helper function
> > drm_dp_set_mst_topology_link_status() to rectify this and set all
> > downstream MST connectors' link-status to BAD before emitting the uevent
> > to userspace.
> >
> > Signed-off-by: Gil Dekel <gildekel@chromium.org>
> > ---
> >  drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> > index 42353b1ac487..e8b10f59e141 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
> >       struct intel_dp *intel_dp =
> >               container_of(work, typeof(*intel_dp), modeset_retry_work);
> >       struct drm_connector *connector = &intel_dp->attached_connector->base;
> > -     drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
> > -                 connector->name);
> >
> > -     /* Grab the locks before changing connector property*/
> > -     mutex_lock(&connector->dev->mode_config.mutex);
> > -     /* Set connector link status to BAD and send a Uevent to notify
> > -      * userspace to do a modeset.
> > +     /* Set the connector's (and possibly all its downstream MST ports') link
> > +      * status to BAD.
> >        */
> > +     mutex_lock(&connector->dev->mode_config.mutex);
> > +     drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
> > +                 connector->base.id, connector->name,
> > +                 connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
> >       drm_connector_set_link_status_property(connector,
> >                                              DRM_MODE_LINK_STATUS_BAD);
> > +     if (intel_dp->is_mst) {
> > +             drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
> > +                                                 DRM_MODE_LINK_STATUS_BAD);
>
> Something is weird with the locking here.
> I noticed that on patch 3 this new function also gets the same
> mutex_lock(&connector->dev->mode_config.mutex);
>
> Since you didn't reach the deadlock, I'm clearly missing something
> on the flow. But regardless of what I could be missing, I believe
> this is totally not future proof and we will for sure hit dead-lock
> cases.
>
You are not wrong.

Something must have been wrong in my workflow, as I was positive I
tested the code with this lock, but I must remember wrong. I tried
testing my current code and it immediately locked, as you expected.
So thank you for catching this.

Lyude's original patch didn't include drm_dp_set_mst_topology_link_status()
as an exposed drm helper function, so when I adjusted it for this series, I
decided to add locks similar to how her other function using
drm_dp_set_mst_topology_link_status() did. However, I failed to use the
right lock, which is:
drm_modeset_lock(&connector->dev->mode_config.connection_mutex, NULL);
drm_modeset_unlock(&connector->dev->mode_config.connection_mutex);
This is similar to how drm_connector_set_link_status_property() locks
before writing to link_status.

I made sure to test my code with the above locks, and it runs well. Here's
an instrumented log excerpt for failing link-training with an MST hub
(I hacked the driver to always fail non eDP connectors and print the
raw pointer addresses of the drm_device and mutex right before locking):
[   43.466329] i915 0000:00:02.0: [drm] *ERROR* Link Training
Unsuccessful via gildekel HACK - (not eDP)
[   43.594950] i915 0000:00:02.0: [drm] *ERROR* Link Training
Unsuccessful via gildekel HACK - (not eDP)
[   43.594979] i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful
[   43.595023] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:273:DP-3]:
[   43.595028] i915 0000:00:02.0: [drm] *ERROR*
connector->dev=00000000d4850450
[   43.595033] i915 0000:00:02.0: [drm] *ERROR*
connector->dev->mode_config.mutex=00000000aac3fe45
[   44.771091] i915 0000:00:02.0: [drm] *ERROR*
[MST-CONNECTOR:300:DP-5]:
[   44.771108] i915 0000:00:02.0: [drm] *ERROR*
connector->dev=000000003fb97435
[   44.771115] i915 0000:00:02.0: [drm] *ERROR*
&connector->dev->mode_config.connection_mutex=000000009aece20e
[   44.771127] i915 0000:00:02.0: [drm] *ERROR*
[MST-CONNECTOR:303:DP-6]:
[   44.771132] i915 0000:00:02.0: [drm] *ERROR*
connector->dev=0000000075236b75
[   44.771137] i915 0000:00:02.0: [drm] *ERROR*
&connector->dev->mode_config.connection_mutex=000000009aece20e

Also, I was under the assumption that all connectors in an MST topology
should reference the same drm_device object, but it seems like that's
not the case. Is my assumption wrong?

> > +     }
> >       mutex_unlock(&connector->dev->mode_config.mutex);
> >       /* Send Hotplug uevent so userspace can reprobe */
> >       drm_kms_helper_connector_hotplug_event(connector);
> > --
> > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


Thanks for your time and comments!
--
Best,
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
Gil Dekel Sept. 1, 2023, 11:24 p.m. UTC | #3
On Fri, Sep 1, 2023 at 5:13 PM Gil Dekel <gildekel@chromium.org> wrote:
>
> On Fri, Sep 1, 2023 at 2:55 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> >
> > On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote:
> > > Before sending a uevent to userspace in order to trigger a corrective
> > > modeset, we change the failing connector's link-status to BAD. However,
> > > the downstream MST branch ports are left in their original GOOD state.
> > >
> > > This patch utilizes the drm helper function
> > > drm_dp_set_mst_topology_link_status() to rectify this and set all
> > > downstream MST connectors' link-status to BAD before emitting the uevent
> > > to userspace.
> > >
> > > Signed-off-by: Gil Dekel <gildekel@chromium.org>
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> > > index 42353b1ac487..e8b10f59e141 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_dp.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> > > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
> > >       struct intel_dp *intel_dp =
> > >               container_of(work, typeof(*intel_dp), modeset_retry_work);
> > >       struct drm_connector *connector = &intel_dp->attached_connector->base;
> > > -     drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
> > > -                 connector->name);
> > >
> > > -     /* Grab the locks before changing connector property*/
> > > -     mutex_lock(&connector->dev->mode_config.mutex);
> > > -     /* Set connector link status to BAD and send a Uevent to notify
> > > -      * userspace to do a modeset.
> > > +     /* Set the connector's (and possibly all its downstream MST ports') link
> > > +      * status to BAD.
> > >        */
> > > +     mutex_lock(&connector->dev->mode_config.mutex);
> > > +     drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
> > > +                 connector->base.id, connector->name,
> > > +                 connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
> > >       drm_connector_set_link_status_property(connector,
> > >                                              DRM_MODE_LINK_STATUS_BAD);
> > > +     if (intel_dp->is_mst) {
> > > +             drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
> > > +                                                 DRM_MODE_LINK_STATUS_BAD);
> >
> > Something is weird with the locking here.
> > I noticed that on patch 3 this new function also gets the same
> > mutex_lock(&connector->dev->mode_config.mutex);
> >
> > Since you didn't reach the deadlock, I'm clearly missing something
> > on the flow. But regardless of what I could be missing, I believe
> > this is totally not future proof and we will for sure hit dead-lock
> > cases.
> >
> You are not wrong.
>
> Something must have been wrong in my workflow, as I was positive I
> tested the code with this lock, but I must remember wrong. I tried
> testing my current code and it immediately locked, as you expected.
> So thank you for catching this.
>
> Lyude's original patch didn't include drm_dp_set_mst_topology_link_status()
> as an exposed drm helper function, so when I adjusted it for this series, I
> decided to add locks similar to how her other function using
> drm_dp_set_mst_topology_link_status() did. However, I failed to use the
> right lock, which is:
> drm_modeset_lock(&connector->dev->mode_config.connection_mutex, NULL);
> drm_modeset_unlock(&connector->dev->mode_config.connection_mutex);
> This is similar to how drm_connector_set_link_status_property() locks
> before writing to link_status.
>
> I made sure to test my code with the above locks, and it runs well. Here's
> an instrumented log excerpt for failing link-training with an MST hub
> (I hacked the driver to always fail non eDP connectors and print the
> raw pointer addresses of the drm_device and mutex right before locking):
> [   43.466329] i915 0000:00:02.0: [drm] *ERROR* Link Training
> Unsuccessful via gildekel HACK - (not eDP)
> [   43.594950] i915 0000:00:02.0: [drm] *ERROR* Link Training
> Unsuccessful via gildekel HACK - (not eDP)
> [   43.594979] i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful
> [   43.595023] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:273:DP-3]:
> [   43.595028] i915 0000:00:02.0: [drm] *ERROR*
> connector->dev=00000000d4850450
> [   43.595033] i915 0000:00:02.0: [drm] *ERROR*
> connector->dev->mode_config.mutex=00000000aac3fe45
> [   44.771091] i915 0000:00:02.0: [drm] *ERROR*
> [MST-CONNECTOR:300:DP-5]:
> [   44.771108] i915 0000:00:02.0: [drm] *ERROR*
> connector->dev=000000003fb97435
> [   44.771115] i915 0000:00:02.0: [drm] *ERROR*
> &connector->dev->mode_config.connection_mutex=000000009aece20e
> [   44.771127] i915 0000:00:02.0: [drm] *ERROR*
> [MST-CONNECTOR:303:DP-6]:
> [   44.771132] i915 0000:00:02.0: [drm] *ERROR*
> connector->dev=0000000075236b75
> [   44.771137] i915 0000:00:02.0: [drm] *ERROR*
> &connector->dev->mode_config.connection_mutex=000000009aece20e
>
> Also, I was under the assumption that all connectors in an MST topology
> should reference the same drm_device object, but it seems like that's
> not the case. Is my assumption wrong?
>
Sorry, please disregard. I was printing the pointers' address instead
of the value's address. Same drm_device is shared:

Link Training Unsuccessful via gildekel HACK - (not eDP)
Link Training Unsuccessful via gildekel HACK - (not eDP)
Link Training Unsuccessful
[CONNECTOR:273:DP-3]:
  connector->dev=00000000b88b882c
  connector->dev->mode_config.mutex=00000000d64b22db
    [MST-CONNECTOR:297:DP-5]:
      connector->dev=00000000b88b882c
      &connector->dev->mode_config.connection_mutex=000000002d876227
    [MST-CONNECTOR:301:DP-6]:
      connector->dev=00000000b88b882c
      &connector->dev->mode_config.connection_mutex=000000002d876227

Sorry for the noise.

> > > +     }
> > >       mutex_unlock(&connector->dev->mode_config.mutex);
> > >       /* Send Hotplug uevent so userspace can reprobe */
> > >       drm_kms_helper_connector_hotplug_event(connector);
> > > --
> > > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
>
>
> Thanks for your time and comments!
> --
> Best,
> Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 42353b1ac487..e8b10f59e141 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5995,16 +5995,20 @@  static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
 	struct intel_dp *intel_dp =
 		container_of(work, typeof(*intel_dp), modeset_retry_work);
 	struct drm_connector *connector = &intel_dp->attached_connector->base;
-	drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
-		    connector->name);

-	/* Grab the locks before changing connector property*/
-	mutex_lock(&connector->dev->mode_config.mutex);
-	/* Set connector link status to BAD and send a Uevent to notify
-	 * userspace to do a modeset.
+	/* Set the connector's (and possibly all its downstream MST ports') link
+	 * status to BAD.
 	 */
+	mutex_lock(&connector->dev->mode_config.mutex);
+	drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
+		    connector->base.id, connector->name,
+		    connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
 	drm_connector_set_link_status_property(connector,
 					       DRM_MODE_LINK_STATUS_BAD);
+	if (intel_dp->is_mst) {
+		drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
+						    DRM_MODE_LINK_STATUS_BAD);
+	}
 	mutex_unlock(&connector->dev->mode_config.mutex);
 	/* Send Hotplug uevent so userspace can reprobe */
 	drm_kms_helper_connector_hotplug_event(connector);