
[v10] drm/i915: Implement Link Rate fallback on Link training failure

Message ID 1489529511-7856-1-git-send-email-manasi.d.navare@intel.com (mailing list archive)
State New, archived

Commit Message

Navare, Manasi March 14, 2017, 10:11 p.m. UTC
If link training fails during the atomic commit phase of a modeset at the
link rate that is optimal for a particular mode, we let the modeset
complete and then retry. We save the link rate at which link training
failed, set the link status property to "BAD", and use a lower link rate
to prune the modes. The modeset is then redone on the current mode at the
lower link rate or, if the current mode gets pruned because of the lower
link constraints, a hotplug uevent is sent for userspace to handle it.
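
As a rough illustration of the fallback described above, the following
standalone C sketch steps a link configuration down after a training
failure and checks whether a mode still fits on the link. It is not the
driver code: struct link_config, link_fallback(), mode_fits() and the rate
table are hypothetical names and values chosen for the example, and the
order (lower the link rate first, then halve the lane count and return to
the highest rate) is an assumption about how the fallback helper behaves.
The bandwidth check assumes 8b/10b channel coding, so each lane carries
8 data bits per link symbol clock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a link configuration; not the i915 structures. */
struct link_config {
	int rate_khz;   /* link symbol clock: 162000 (RBR), 270000 (HBR), 540000 (HBR2) */
	int lane_count; /* 1, 2 or 4 */
};

/* Rates assumed common to source and sink, in ascending order. */
static const int common_rates[] = { 162000, 270000, 540000 };
#define NUM_RATES (sizeof(common_rates) / sizeof(common_rates[0]))

/* Return the index of rate_khz in common_rates, or -1 if it is not listed. */
static int rate_index(int rate_khz)
{
	for (size_t i = 0; i < NUM_RATES; i++)
		if (common_rates[i] == rate_khz)
			return (int)i;
	return -1;
}

/*
 * Pick the next configuration to try after training failed at *cfg.
 * Assumed order: step the link rate down first; once at the lowest rate,
 * halve the lane count and go back to the highest rate. Returns false
 * when already at the lowest rate with one lane (nothing left to try).
 * Note: this sketch returns true on success, unlike a 0-on-success helper.
 */
static bool link_fallback(struct link_config *cfg)
{
	int idx = rate_index(cfg->rate_khz);

	if (idx > 0) {
		cfg->rate_khz = common_rates[idx - 1];
		return true;
	}
	if (idx == 0 && cfg->lane_count > 1) {
		cfg->rate_khz = common_rates[NUM_RATES - 1];
		cfg->lane_count /= 2;
		return true;
	}
	return false;
}

/*
 * Mode pruning check: a mode fits if its payload rate (pixel clock times
 * bits per pixel, in kbit/s) does not exceed the link's data rate
 * (8 data bits per symbol clock per lane with 8b/10b coding).
 */
static bool mode_fits(int pixel_clock_khz, int bpp, const struct link_config *cfg)
{
	long long required_kbps = (long long)pixel_clock_khz * bpp;
	long long available_kbps = (long long)cfg->rate_khz * 8 * cfg->lane_count;

	return required_kbps <= available_kbps;
}
```

Starting from 540000 kHz and 4 lanes, repeated calls to link_fallback()
visit 270000 x 4, 162000 x 4, 540000 x 2 and so on, and return false once
the configuration is 162000 x 1. A mode that no longer passes mode_fits()
for the reduced configuration is the kind of mode that would be pruned.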

This is also required to pass DP CTS tests 4.3.1.3, 4.3.1.4,
4.3.1.6.

This patch is a resend of commit 233ce881dd91fb ("drm/i915: Implement
Link Rate fallback on Link training failure"), which was reverted by
commit afc1ebf4562a14 ("Revert "drm/i915: Implement Link Rate fallback
on Link training failure"") due to CI failures.

Investigation showed that these failures had always been present but were
hidden: link failures used to be reported with DRM_DEBUG_KMS, so CI never
caught them. This patch now throws DRM_ERROR if link training fails at RBR
and 1 lane, which is how CI caught these link training failures.

There were two failures:
1. On SKL 6700k: the machine in the CI lab is a SKL desktop without eDP on
Port A, but the VBT initialization code in the driver writes the VBT
defaults such that the DP flag is always set on Port A, and it is not
cleared after parsing the VBT outputs. This has been fixed in commit
bb1d132935c2f8 ("drm/i915/vbt: split out defaults that are set when there
is no VBT") and commit 665788572c6410b ("drm/i915/vbt: don't propagate
errors from intel_bios_init()").

2. On ILK-650 desktop: this was caused by a bad monitor/desktop
combination. I switched the monitor in the CI lab, which got rid of the
link failures on the ILK system.

v10:
* Rebase on drm-tip and resend after revert
v9:
* Use the trimmed max values of link rate/lane count based on
link train fallback (Daniel Vetter)
v8:
* Set link_status to BAD first and then call mode_valid (Jani Nikula)
v7:
* Remove the redundant variable in the previous patch itself
v6:
* Obtain link rate index from fallback_link_rate using
the helper intel_dp_link_rate_index (Jani Nikula)
* Include fallback within intel_dp_start_link_train (Jani Nikula)
v5:
* Move set link status to drm core (Daniel Vetter, Jani Nikula)
v4:
* Add fallback support for non DDI platforms too
* Set connector->link status inside set_link_status function
(Jani Nikula)
v3:
* Set link status property to BAD unconditionally (Jani Nikula)
* Don't use two separate variables link_train_failed and link_status
to indicate the same thing (Jani Nikula)
v2:
* Squashed a few patches (Jani Nikula)

Acked-by: Tony Cheng <tony.cheng@amd.com>
Acked-by: Harry Wentland <Harry.wentland@amd.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
---
 drivers/gpu/drm/i915/intel_dp.c               | 27 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_dp_link_training.c | 22 ++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_drv.h              |  3 +++
 3 files changed, 50 insertions(+), 2 deletions(-)

Comments

Navare, Manasi March 22, 2017, 3:44 p.m. UTC | #1
Hi Jani/Ville,

I need to add another quick fix that is required for the fallback
to happen as expected. Should I respin this patch to add that fix, or
should I wait for this one to land?

I have described the suggested fix inline below; please let me know your thoughts on it.


On Tue, Mar 14, 2017 at 03:11:51PM -0700, Manasi Navare wrote:
> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> index fd96a6c..895f934 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -5924,6 +5924,29 @@ intel_dp_init_connector_port_info(struct intel_digital_port *intel_dig_port)
>  	}
>  }
>  
> +static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
> +{
> +	struct intel_connector *intel_connector;
> +	struct drm_connector *connector;
> +
> +	intel_connector = container_of(work, typeof(*intel_connector),
> +				       modeset_retry_work);
> +	connector = &intel_connector->base;
> +	DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n", connector->base.id,
> +		      connector->name);
> +
> +	/* Grab the locks before changing connector property*/
> +	mutex_lock(&connector->dev->mode_config.mutex);
> +	/* Set connector link status to BAD and send a Uevent to notify
> +	 * userspace to do a modeset.
> +	 */
> +	drm_mode_connector_set_link_status_property(connector,
> +						    DRM_MODE_LINK_STATUS_BAD);
> +	mutex_unlock(&connector->dev->mode_config.mutex);
> +	/* Send Hotplug uevent so userspace can reprobe */
> +	drm_kms_helper_hotplug_event(connector->dev);
> +}
> +
>  bool
>  intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
>  			struct intel_connector *intel_connector)
> @@ -5936,6 +5959,10 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
>  	enum port port = intel_dig_port->port;
>  	int type;
>  
> +	/* Initialize the work for modeset in case of link train failure */
> +	INIT_WORK(&intel_connector->modeset_retry_work,
> +		  intel_dp_modeset_retry_work_fn);
> +
>  	if (WARN(intel_dig_port->max_lanes < 1,
>  		 "Not enough lanes (%d) for DP on port %c\n",
>  		 intel_dig_port->max_lanes, port_name(port)))
> diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> index 0048b52..955b239 100644
> --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> @@ -313,6 +313,24 @@ void intel_dp_stop_link_train(struct intel_dp *intel_dp)
>  void
>  intel_dp_start_link_train(struct intel_dp *intel_dp)
>  {
> -	intel_dp_link_training_clock_recovery(intel_dp);
> -	intel_dp_link_training_channel_equalization(intel_dp);
> +	struct intel_connector *intel_connector = intel_dp->attached_connector;
> +
> +	if (!intel_dp_link_training_clock_recovery(intel_dp))
> +		goto failure_handling;
> +	if (!intel_dp_link_training_channel_equalization(intel_dp))
> +		goto failure_handling;
> +
> +	DRM_DEBUG_KMS("Link Training Passed at Link Rate = %d, Lane count = %d",
> +		      intel_dp->link_rate, intel_dp->lane_count);
> +	return;
> +
> + failure_handling:
> +	DRM_DEBUG_KMS("Link Training failed at link rate = %d, lane count = %d",
> +		      intel_dp->link_rate, intel_dp->lane_count);
> +	if (!intel_dp_get_link_train_fallback_values(intel_dp,
> +						     intel_dp->link_rate,
> +						     intel_dp->lane_count))
> +		/* Schedule a Hotplug Uevent to userspace to start modeset */
> +		schedule_work(&intel_connector->modeset_retry_work);

This is where the new boolean intel_dp->train_set_valid will have to be set to false on failure.
This ensures that we don't end up retraining at the same parameters in intel_dp_check_link_status().

So in intel_dp_check_link_status(), just before retraining, the fix will add a check:

	if (!intel_dp->train_set_valid)
		return;
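
A minimal standalone sketch of that guard, under stated assumptions:
struct dp_state, on_link_train_failure() and check_link_status() are
hypothetical stand-ins for the i915 structures and functions, and
retrain_count exists only so the effect can be observed. When the flag
would be set back to true is not covered here, so the sketch leaves it out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct intel_dp. */
struct dp_state {
	bool train_set_valid; /* false once training at the current parameters failed */
	int retrain_count;    /* retrains attempted; for illustration only */
};

/* Failure path of link training: mark the current parameters as bad. */
static void on_link_train_failure(struct dp_state *dp)
{
	dp->train_set_valid = false;
}

/*
 * Models the proposed check: when the link is reported bad, retrain only
 * if the current training parameters are still considered valid.
 */
static void check_link_status(struct dp_state *dp, bool link_ok)
{
	if (link_ok)
		return;

	/* Do not retrain at parameters that already failed. */
	if (!dp->train_set_valid)
		return;

	dp->retrain_count++; /* a real driver would retrain the link here */
}
```

With train_set_valid still true, a bad link status triggers one retrain;
after on_link_train_failure() has run, the same call returns early and the
retry is left to the modeset that userspace performs after the uevent.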

Ville/Jani what are your thoughts on this?

Regards
Manasi

> +	return;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 51228fe..0fe1ac8 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -321,6 +321,9 @@ struct intel_connector {
>  	void *port; /* store this opaque as its illegal to dereference it */
>  
>  	struct intel_dp *mst_port;
> +
> +	/* Work struct to schedule a uevent on link train failure */
> +	struct work_struct modeset_retry_work;
>  };
>  
>  struct dpll {
> -- 
> 2.1.4
>
Navare, Manasi March 30, 2017, 10:54 p.m. UTC | #2
Hi Jani,

I have reviewed your recent DP link rate / lane count
refactoring patch series and I believe that it will soon get merged.

Are you holding off on merging this patch so that your refactoring
series gets merged first? Or can we merge this patch, since
it already has ACKs and R-bs?

Please advise on next steps for this patch, since some
of the PSR related patches are pending merge because of this.

Regards
Manasi


Jani Nikula March 31, 2017, 10:33 a.m. UTC | #3
On Fri, 31 Mar 2017, Manasi Navare <manasi.d.navare@intel.com> wrote:
> Hi Jani,
>
> I have reviewed your recent DP link rate / lane count
> refactoring patch series and I believe that it will soon get merged.

I haven't received reviews on all of the patches in my series yet, in
particular the first patches are missing reviews so I can't even get
started pushing them.

BR,
Jani.


>
> Are you waiting on merging this patch so that your refactoring
> series gets merged first? Or can we merge this patch since
> it already has ACKs and R-bs.
>
> Please advise on next steps for this patch, since some
> of the PSR related patches are pending merge because of this.
>
> Regards
> Manasi
>
>
Navare, Manasi March 31, 2017, 6:17 p.m. UTC | #4
On Fri, Mar 31, 2017 at 01:33:04PM +0300, Jani Nikula wrote:
> On Fri, 31 Mar 2017, Manasi Navare <manasi.d.navare@intel.com> wrote:
> > Hi Jani,
> >
> > I have reviewed your recent DP link rate / lane count
> > refactoring patch series and I believe that it will soon get merged.
> 
> I haven't received reviews on all of the patches in my series yet, in
> particular the first patches are missing reviews so I can't even get
> started pushing them.
> 
> BR,
> Jani.
>

OK, I will spend some time reviewing these patches today so that you
can push them.
Is that why you are waiting to merge this patch?

Regards
Manasi

 
> 
> >
> > Are you waiting on merging this patch so that your refactoring
> > series gets merged first? Or can we merge this patch since
> > it already has ACKs and R-bs.
> >
> > Please advise on next steps for this patch, since some
> > of the PSR related patches are pending merge because of this.
> >
> > Regards
> > Manasi
> >
> >
> > On Wed, Mar 22, 2017 at 08:44:36AM -0700, Manasi Navare wrote:
> >> Hi Jani/Ville,
> >> 
> >> I need to add another quick fix which would be required for the fallback
> >> to happen as expected. Should I respin this patch to add that fix or
> >> should I wait for this to get landed?
> >> 
> >> I have mentioned the fix suggested below, please let me know your thoughts on that.
> >> 
> >> 
> >> On Tue, Mar 14, 2017 at 03:11:51PM -0700, Manasi Navare wrote:
> >> > If link training at a link rate optimal for a particular
> >> > mode fails during modeset's atomic commit phase, then we
> >> > let the modeset complete and then retry. We save the link rate
> >> > value at which link training failed, update the link status property
> >> > to "BAD" and use a lower link rate to prune the modes. It will redo
> >> > the modeset on the current mode at lower link rate or if the current
> >> > mode gets pruned due to lower link constraints then, it will send a
> >> > hotplug uevent for userspace to handle it.
> >> > 
> >> > This is also required to pass DP CTS tests 4.3.1.3, 4.3.1.4,
> >> > 4.3.1.6.
> >> > 
> >> > This patch is a resend of the original commit id (233ce881dd91fb
> >> > "drm/i915: Implement Link Rate fallback on Link training failure")
> >> > which got reverted in this commit id (afc1ebf4562a14 Revert
> >> > "drm/i915: Implement Link Rate fallback on Link training failure")
> >> > due to CI failures.
> >> > 
> >> > After investigating the CI failures it was found that these
> >> > were essentially the failures which were always there but hidden because
> >> > they used to be DRM_DEBUG_KMS messages for link failures so never got
> >> > caught by CI. But now this patch actually throws DRM_ERROR if the link
> >> > training fails at RBR and 1 lane. So it caught these link train failures.
> >> > 
> >> > There were two failures:
> >> > 1. On SKL 6700k this was because the machine in CI lab is a SKL desktop
> >> > without eDP on Port A. But our VBT initialization code in the driver writes
> >> > VBT defaults in a way that it always sets DP flag on Port A and this does
> >> > not get cleared after parsing the VBT outputs. This has been fixed in
> >> > commit id (bb1d132935c2f8 "drm/i915/vbt: split out defaults that are set
> >> > when there is no VBT") and (665788572c6410b "drm/i915/vbt: don't propagate
> >> > errors from intel_bios_init()")
> >> > 
> >> > 2. On ILK-650 desktop - This was happening because of a bad monitor desktop
> >> > combination. I switched the monitor in the CI lab and that helped get rid
> >> > of the link failures on ILK system.
> >> > 
> >> > v10:
> >> > * Rebase on drm-tip and resend after revert
> >> > v9:
> >> > * Use the trimmed max values of link rate/lane count based on
> >> > link train fallback (Daniel Vetter)
> >> > v8:
> >> > * Set link_status to BAD first and then call mode_valid (Jani Nikula)
> >> > v7:
> >> > Remove the redundant variable in previous patch itself
> >> > v6:
> >> > * Obtain link rate index from fallback_link_rate using
> >> > the helper intel_dp_link_rate_index (Jani Nikula)
> >> > * Include fallback within intel_dp_start_link_train (Jani Nikula)
> >> > v5:
> >> > * Move set link status to drm core (Daniel Vetter, Jani Nikula)
> >> > v4:
> >> > * Add fallback support for non DDI platforms too
> >> > * Set connector->link status inside set_link_status function
> >> > (Jani Nikula)
> >> > v3:
> >> > * Set link status property to BAD unconditionally (Jani Nikula)
> >> > * Don't use two separate variables link_train_failed and link_status
> >> > to indicate same thing (Jani Nikula)
> >> > v2:
> >> > * Squashed a few patches (Jani Nikula)
> >> > 
> >> > Acked-by: Tony Cheng <tony.cheng@amd.com>
> >> > Acked-by: Harry Wentland <Harry.wentland@amd.com>
> >> > Cc: Jani Nikula <jani.nikula@linux.intel.com>
> >> > Cc: Daniel Vetter <daniel.vetter@intel.com>
> >> > Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> >> > Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
> >> > Reviewed-by: Jani Nikula <jani.nikula@intel.com>
> >> > Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/intel_dp.c               | 27 +++++++++++++++++++++++++++
> >> >  drivers/gpu/drm/i915/intel_dp_link_training.c | 22 ++++++++++++++++++++--
> >> >  drivers/gpu/drm/i915/intel_drv.h              |  3 +++
> >> >  3 files changed, 50 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> >> > index fd96a6c..895f934 100644
> >> > --- a/drivers/gpu/drm/i915/intel_dp.c
> >> > +++ b/drivers/gpu/drm/i915/intel_dp.c
> >> > @@ -5924,6 +5924,29 @@ intel_dp_init_connector_port_info(struct intel_digital_port *intel_dig_port)
> >> >  	}
> >> >  }
> >> >  
> >> > +static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
> >> > +{
> >> > +	struct intel_connector *intel_connector;
> >> > +	struct drm_connector *connector;
> >> > +
> >> > +	intel_connector = container_of(work, typeof(*intel_connector),
> >> > +				       modeset_retry_work);
> >> > +	connector = &intel_connector->base;
> >> > +	DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n", connector->base.id,
> >> > +		      connector->name);
> >> > +
> >> > +	/* Grab the locks before changing connector property*/
> >> > +	mutex_lock(&connector->dev->mode_config.mutex);
> >> > +	/* Set connector link status to BAD and send a Uevent to notify
> >> > +	 * userspace to do a modeset.
> >> > +	 */
> >> > +	drm_mode_connector_set_link_status_property(connector,
> >> > +						    DRM_MODE_LINK_STATUS_BAD);
> >> > +	mutex_unlock(&connector->dev->mode_config.mutex);
> >> > +	/* Send Hotplug uevent so userspace can reprobe */
> >> > +	drm_kms_helper_hotplug_event(connector->dev);
> >> > +}
> >> > +
> >> >  bool
> >> >  intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
> >> >  			struct intel_connector *intel_connector)
> >> > @@ -5936,6 +5959,10 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
> >> >  	enum port port = intel_dig_port->port;
> >> >  	int type;
> >> >  
> >> > +	/* Initialize the work for modeset in case of link train failure */
> >> > +	INIT_WORK(&intel_connector->modeset_retry_work,
> >> > +		  intel_dp_modeset_retry_work_fn);
> >> > +
> >> >  	if (WARN(intel_dig_port->max_lanes < 1,
> >> >  		 "Not enough lanes (%d) for DP on port %c\n",
> >> >  		 intel_dig_port->max_lanes, port_name(port)))
> >> > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> >> > index 0048b52..955b239 100644
> >> > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> >> > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> >> > @@ -313,6 +313,24 @@ void intel_dp_stop_link_train(struct intel_dp *intel_dp)
> >> >  void
> >> >  intel_dp_start_link_train(struct intel_dp *intel_dp)
> >> >  {
> >> > -	intel_dp_link_training_clock_recovery(intel_dp);
> >> > -	intel_dp_link_training_channel_equalization(intel_dp);
> >> > +	struct intel_connector *intel_connector = intel_dp->attached_connector;
> >> > +
> >> > +	if (!intel_dp_link_training_clock_recovery(intel_dp))
> >> > +		goto failure_handling;
> >> > +	if (!intel_dp_link_training_channel_equalization(intel_dp))
> >> > +		goto failure_handling;
> >> > +
> >> > +	DRM_DEBUG_KMS("Link Training Passed at Link Rate = %d, Lane count = %d",
> >> > +		      intel_dp->link_rate, intel_dp->lane_count);
> >> > +	return;
> >> > +
> >> > + failure_handling:
> >> > +	DRM_DEBUG_KMS("Link Training failed at link rate = %d, lane count = %d",
> >> > +		      intel_dp->link_rate, intel_dp->lane_count);
> >> > +	if (!intel_dp_get_link_train_fallback_values(intel_dp,
> >> > +						     intel_dp->link_rate,
> >> > +						     intel_dp->lane_count))
> >> > +		/* Schedule a Hotplug Uevent to userspace to start modeset */
> >> > +		schedule_work(&intel_connector->modeset_retry_work);
> >> 
> >> This is where the new boolean intel_dp->train_set_valid will have to be set to false on failure.
> >> This ensures that we don't end up retraining with the same parameters in intel_dp_check_link_status().
> >> 
> >> So in intel_dp_check_link_status(), just before retraining, the fix would add a check:
> >> if (!intel_dp->train_set_valid)
> >> 	return;
> >> 
> >> Ville/Jani what are your thoughts on this?
> >> 
> >> Regards
> >> Manasi
> >> 
> >> > +	return;
> >> >  }
> >> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> >> > index 51228fe..0fe1ac8 100644
> >> > --- a/drivers/gpu/drm/i915/intel_drv.h
> >> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> >> > @@ -321,6 +321,9 @@ struct intel_connector {
> >> >  	void *port; /* store this opaque as its illegal to dereference it */
> >> >  
> >> >  	struct intel_dp *mst_port;
> >> > +
> >> > +	/* Work struct to schedule a uevent on link train failure */
> >> > +	struct work_struct modeset_retry_work;
> >> >  };
> >> >  
> >> >  struct dpll {
> >> > -- 
> >> > 2.1.4
> >> > 
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
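
The train_set_valid check proposed in the quoted reply can be illustrated with a small standalone model. This is only a sketch, not driver code: struct model_dp, model_start_link_train() and model_check_link_status() are hypothetical stand-ins, and setting the flag back to true when training passes is an assumption (the thread only says it is set to false on failure).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct intel_dp that matter here */
struct model_dp {
	bool train_set_valid;	/* flag proposed in the thread */
	int retrain_count;	/* counts retrain attempts, for illustration only */
};

/*
 * Models the end of intel_dp_start_link_train(): remember whether the
 * current link parameters trained successfully. Setting the flag to true
 * on success is an assumption of this sketch.
 */
static void model_start_link_train(struct model_dp *dp, bool training_passed)
{
	dp->train_set_valid = training_passed;
}

/* Models the retrain path of intel_dp_check_link_status() */
static void model_check_link_status(struct model_dp *dp, bool link_ok)
{
	if (link_ok)
		return;

	/* Proposed guard: do not retrain with parameters that already failed */
	if (!dp->train_set_valid)
		return;

	dp->retrain_count++;
}
```

With this guard, a link that drops after a successful training is still retrained, while a link whose last training failed is left alone until the fallback modeset picks new parameters.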

Patch

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index fd96a6c..895f934 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -5924,6 +5924,29 @@  intel_dp_init_connector_port_info(struct intel_digital_port *intel_dig_port)
 	}
 }
 
+static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
+{
+	struct intel_connector *intel_connector;
+	struct drm_connector *connector;
+
+	intel_connector = container_of(work, typeof(*intel_connector),
+				       modeset_retry_work);
+	connector = &intel_connector->base;
+	DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n", connector->base.id,
+		      connector->name);
+
+	/* Grab the locks before changing connector property */
+	mutex_lock(&connector->dev->mode_config.mutex);
+	/* Set connector link status to BAD and send a Uevent to notify
+	 * userspace to do a modeset.
+	 */
+	drm_mode_connector_set_link_status_property(connector,
+						    DRM_MODE_LINK_STATUS_BAD);
+	mutex_unlock(&connector->dev->mode_config.mutex);
+	/* Send Hotplug uevent so userspace can reprobe */
+	drm_kms_helper_hotplug_event(connector->dev);
+}
+
 bool
 intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 			struct intel_connector *intel_connector)
@@ -5936,6 +5959,10 @@  intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 	enum port port = intel_dig_port->port;
 	int type;
 
+	/* Initialize the work for modeset in case of link train failure */
+	INIT_WORK(&intel_connector->modeset_retry_work,
+		  intel_dp_modeset_retry_work_fn);
+
 	if (WARN(intel_dig_port->max_lanes < 1,
 		 "Not enough lanes (%d) for DP on port %c\n",
 		 intel_dig_port->max_lanes, port_name(port)))
diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
index 0048b52..955b239 100644
--- a/drivers/gpu/drm/i915/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
@@ -313,6 +313,24 @@  void intel_dp_stop_link_train(struct intel_dp *intel_dp)
 void
 intel_dp_start_link_train(struct intel_dp *intel_dp)
 {
-	intel_dp_link_training_clock_recovery(intel_dp);
-	intel_dp_link_training_channel_equalization(intel_dp);
+	struct intel_connector *intel_connector = intel_dp->attached_connector;
+
+	if (!intel_dp_link_training_clock_recovery(intel_dp))
+		goto failure_handling;
+	if (!intel_dp_link_training_channel_equalization(intel_dp))
+		goto failure_handling;
+
+	DRM_DEBUG_KMS("Link Training Passed at Link Rate = %d, Lane count = %d\n",
+		      intel_dp->link_rate, intel_dp->lane_count);
+	return;
+
+ failure_handling:
+	DRM_DEBUG_KMS("Link Training failed at link rate = %d, lane count = %d\n",
+		      intel_dp->link_rate, intel_dp->lane_count);
+	if (!intel_dp_get_link_train_fallback_values(intel_dp,
+						     intel_dp->link_rate,
+						     intel_dp->lane_count))
+		/* Schedule a Hotplug Uevent to userspace to start modeset */
+		schedule_work(&intel_connector->modeset_retry_work);
+	return;
 }
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 51228fe..0fe1ac8 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -321,6 +321,9 @@  struct intel_connector {
 	void *port; /* store this opaque as its illegal to dereference it */
 
 	struct intel_dp *mst_port;
+
+	/* Work struct to schedule a uevent on link train failure */
+	struct work_struct modeset_retry_work;
 };
 
 struct dpll {
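
The patch relies on intel_dp_get_link_train_fallback_values(), which is not part of this diff. As a rough illustration of how such a fallback could pick the next parameters, here is a standalone sketch. It assumes the helper first steps down to the next lower link rate and, once already at the lowest rate (RBR), halves the lane count and returns to the highest rate; that ordering, the model_* names and the three-entry rate table are assumptions of this sketch, not taken from the patch. The return convention (0 when a fallback exists, nonzero otherwise) matches how the patch tests the result before scheduling modeset_retry_work.

```c
#include <assert.h>
#include <stddef.h>

/* Standard DP link rates in kHz: RBR, HBR, HBR2 (illustrative table) */
static const int model_rates[] = { 162000, 270000, 540000 };
#define MODEL_NUM_RATES (sizeof(model_rates) / sizeof(model_rates[0]))

/* Hypothetical container for the trimmed maximum values used for pruning */
struct model_link {
	int max_link_rate;
	int max_lane_count;
};

/* Returns the index of rate in model_rates, or -1 if it is not listed */
static int model_rate_index(int rate)
{
	for (size_t i = 0; i < MODEL_NUM_RATES; i++)
		if (model_rates[i] == rate)
			return (int)i;
	return -1;
}

/*
 * Returns 0 and updates *link when a lower configuration exists,
 * -1 when the rate is unknown or training already failed at RBR, 1 lane.
 */
static int model_get_fallback(struct model_link *link,
			      int link_rate, int lane_count)
{
	int idx = model_rate_index(link_rate);

	if (idx < 0)
		return -1;

	if (idx > 0) {
		/* Step down one link rate, keep the lane count */
		link->max_link_rate = model_rates[idx - 1];
		link->max_lane_count = lane_count;
	} else if (lane_count > 1) {
		/* Already at RBR: halve the lanes, restart from the top rate */
		link->max_link_rate = model_rates[MODEL_NUM_RATES - 1];
		link->max_lane_count = lane_count >> 1;
	} else {
		/* RBR with 1 lane failed: nothing left to try */
		return -1;
	}

	return 0;
}
```

In this model a failure at 540000 kHz with 4 lanes falls back to 270000 kHz with 4 lanes, and only a failure at RBR with a single lane reports that no fallback remains, which corresponds to the case where the commit message says DRM_ERROR is raised.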