diff mbox

[v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts

Message ID 1531764885-18185-1-git-send-email-nathan.d.ciobanu@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nathan Ciobanu July 16, 2018, 6:14 p.m. UTC
Limit the link training clock recovery loop to 10 failed attempts at
LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2. Some USB-C MST hubs
cause us to get stuck in this loop indefinitely requesting

    voltage swing: 0, pre-emphasis level: 2
    voltage swing: 1, pre-emphasis level: 2
    voltage swing: 0, pre-emphasis level: 3

over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
never returns true and voltage_tries always gets reset to 1. The driver
sends those values to the hub but the hub keeps requesting new values
every time in the pattern shown above.

Changes in v2:
    - updated commit message (DK, Manasi)
    - defined DP_DP14_MAX_CR_TRIES (Marc)
    - made the loop iterate for max 10 times (Rodrigo, Marc)

Changes in v3:
    - changed error message to use DP_DP14_MAX_CR_TRIES

Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
 include/drm/drm_dp_helper.h                   | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

Comments

Marc Herbert July 16, 2018, 7:22 p.m. UTC | #1
While the shortness of v3 is really nice, I think it's rather a good
opportunity to make much clearer (as a separate, no functional change
patch?)  its existing terminating condition(s) and what the existing loop
iterates on. I mean it's painful and risky enough to _combine multiple
counters in the same loop_, so at least these different counters should be
_invidually_ crystal-clear with enough comments. For instance let's pretend
there are 4 possible voltage values and that each value is tried 4 times -
then the last value will never be tried! Can this ever happen? If not then
why not? Not even with a buggy sink?  If it can happen then is it fine to
give up before trying some values? Is it compliant with the spec? Etc.

This should incidentally help clarify why and how the current logic allows
infinite loops.

BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
remove this confusing "increment from false to true":

- if (max_vswing_tries == 1)
+ if (max_vswing_tries)

- max_vswing_tries++;
+ max_vwing_tries = true;

Marc


On 16/07/2018 11:14, Nathan Ciobanu wrote:
> Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
>  include/drm/drm_dp_helper.h                   | 1 +
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> index 4da6e33c7fa1..4bfba65c431c 100644
> --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
>  {
>  	uint8_t voltage;
> -	int voltage_tries, max_vswing_tries;
> +	int voltage_tries, max_vswing_tries, cr_tries;
>  	uint8_t link_config[2];
>  	uint8_t link_bw, rate_select;
>  
> @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  
>  	voltage_tries = 1;
>  	max_vswing_tries = 0;
> -	for (;;) {
> +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
>  		uint8_t link_status[DP_LINK_STATUS_SIZE];
>  
>  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  			++max_vswing_tries;
>  
>  	}
> +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> +		  DP_DP14_MAX_CR_TRIES);
> +	return false;
>  }
>
Rodrigo Vivi July 16, 2018, 10:30 p.m. UTC | #2
On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote:
> 
> While the shortness of v3 is really nice, I think it's rather a good
> opportunity to make much clearer (as a separate, no functional change
> patch?)  its existing terminating condition(s) and what the existing loop
> iterates on. I mean it's painful and risky enough to _combine multiple
> counters in the same loop_, so at least these different counters should be
> _invidually_ crystal-clear with enough comments. For instance let's pretend
> there are 4 possible voltage values and that each value is tried 4 times -
> then the last value will never be tried! Can this ever happen? If not then
> why not? Not even with a buggy sink?  If it can happen then is it fine to
> give up before trying some values? Is it compliant with the spec? Etc.

hm.. it seems that that code has room for improvements to make it more
clear and easier to map with the spec...
but yeap, separated patches.

> 
> This should incidentally help clarify why and how the current logic allows
> infinite loops.
> 
> BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
> remove this confusing "increment from false to true":
> 
> - if (max_vswing_tries == 1)
> + if (max_vswing_tries)
> 
> - max_vswing_tries++;
> + max_vwing_tries = true;

if we change to boolean it is better to really change the type
and remove "_tries" from the name....

> 
> Marc
> 
> 
> On 16/07/2018 11:14, Nathan Ciobanu wrote:
> > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> >  include/drm/drm_dp_helper.h                   | 1 +
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > index 4da6e33c7fa1..4bfba65c431c 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> >  {
> >  	uint8_t voltage;
> > -	int voltage_tries, max_vswing_tries;
> > +	int voltage_tries, max_vswing_tries, cr_tries;
> >  	uint8_t link_config[2];
> >  	uint8_t link_bw, rate_select;
> >  
> > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  
> >  	voltage_tries = 1;
> >  	max_vswing_tries = 0;
> > -	for (;;) {
> > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> >  
> >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  			++max_vswing_tries;
> >  
> >  	}
> > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > +		  DP_DP14_MAX_CR_TRIES);
> > +	return false;
> >  }
> >
Rodrigo Vivi July 16, 2018, 10:34 p.m. UTC | #3
On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote:
> Limit the link training clock recovery loop to 10 failed attempts at
> LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2.

Thanks... so this made me look to DP 1.3 and DP 1.4 differences here.
Are we already addressing all new conditions?

I wonder if that should be at drm level.

I noticed that one difference, 100us wait difference is already
addressed there at drm level...

> Some USB-C MST hubs
> cause us to get stuck in this loop indefinitely requesting
> 
>     voltage swing: 0, pre-emphasis level: 2
>     voltage swing: 1, pre-emphasis level: 2
>     voltage swing: 0, pre-emphasis level: 3
> 
> over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
> never returns true and voltage_tries always gets reset to 1. The driver
> sends those values to the hub but the hub keeps requesting new values
> every time in the pattern shown above.
> 
> Changes in v2:
>     - updated commit message (DK, Manasi)
>     - defined DP_DP14_MAX_CR_TRIES (Marc)
>     - made the loop iterate for max 10 times (Rodrigo, Marc)
> 
> Changes in v3:
>     - changed error message to use DP_DP14_MAX_CR_TRIES
> 
> Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
>  include/drm/drm_dp_helper.h                   | 1 +
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> index 4da6e33c7fa1..4bfba65c431c 100644
> --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
>  {
>  	uint8_t voltage;
> -	int voltage_tries, max_vswing_tries;
> +	int voltage_tries, max_vswing_tries, cr_tries;
>  	uint8_t link_config[2];
>  	uint8_t link_bw, rate_select;
>  
> @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  
>  	voltage_tries = 1;
>  	max_vswing_tries = 0;
> -	for (;;) {
> +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {

I know that I was the on that asked you to make this global here,
but just now with better spec pointers and definition is that I noticed
that this is only for DP 1.4 spec while this code here runs for all
DP...  So we need to change in a way that we check the spec version somehow...

and possibly at drm level?!

>  		uint8_t link_status[DP_LINK_STATUS_SIZE];
>  
>  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  			++max_vswing_tries;
>  
>  	}
> +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> +		  DP_DP14_MAX_CR_TRIES);
> +	return false;
>  }
>  
>  /*
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index c01564991a9f..2303ad8ed24e 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -820,6 +820,7 @@
>  
>  #define DP_DP13_DPCD_REV                    0x2200
>  #define DP_DP13_MAX_LINK_RATE               0x2201
> +#define DP_DP14_MAX_CR_TRIES                10
>  
>  #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
>  # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
> -- 
> 1.9.1
>
Nathan Ciobanu July 16, 2018, 11:23 p.m. UTC | #4
On Mon, Jul 16, 2018 at 03:34:58PM -0700, Rodrigo Vivi wrote:
> On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote:
> > Limit the link training clock recovery loop to 10 failed attempts at
> > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2.
> 
> Thanks... so this made me look to DP 1.3 and DP 1.4 differences here.
> Are we already addressing all new conditions?
> 
> I wonder if that should be at drm level.
> 
> I noticed that one difference, 100us wait difference is already
> addressed there at drm level...
> 
> > Some USB-C MST hubs
> > cause us to get stuck in this loop indefinitely requesting
> > 
> >     voltage swing: 0, pre-emphasis level: 2
> >     voltage swing: 1, pre-emphasis level: 2
> >     voltage swing: 0, pre-emphasis level: 3
> > 
> > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
> > never returns true and voltage_tries always gets reset to 1. The driver
> > sends those values to the hub but the hub keeps requesting new values
> > every time in the pattern shown above.
> > 
> > Changes in v2:
> >     - updated commit message (DK, Manasi)
> >     - defined DP_DP14_MAX_CR_TRIES (Marc)
> >     - made the loop iterate for max 10 times (Rodrigo, Marc)
> > 
> > Changes in v3:
> >     - changed error message to use DP_DP14_MAX_CR_TRIES
> > 
> > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> >  include/drm/drm_dp_helper.h                   | 1 +
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > index 4da6e33c7fa1..4bfba65c431c 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> >  {
> >  	uint8_t voltage;
> > -	int voltage_tries, max_vswing_tries;
> > +	int voltage_tries, max_vswing_tries, cr_tries;
> >  	uint8_t link_config[2];
> >  	uint8_t link_bw, rate_select;
> >  
> > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  
> >  	voltage_tries = 1;
> >  	max_vswing_tries = 0;
> > -	for (;;) {
> > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> 
> I know that I was the on that asked you to make this global here,
> but just now with better spec pointers and definition is that I noticed
> that this is only for DP 1.4 spec while this code here runs for all
> DP...  So we need to change in a way that we check the spec version somehow...
I think the bug is with this infinite loop which is at the mercy of an external device
and in my case I have this MST hub which appears to be DP 1.2 that triggers the
infinite loop case. We have to limit the number of iterations and
DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case
shouldn't we apply the same counter to all <= "DP 1.4" devices?

-Nathan 
> 
> and possibly at drm level?!
> 
> >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> >  
> >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  			++max_vswing_tries;
> >  
> >  	}
> > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > +		  DP_DP14_MAX_CR_TRIES);
> > +	return false;
> >  }
> >  
> >  /*
> > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> > index c01564991a9f..2303ad8ed24e 100644
> > --- a/include/drm/drm_dp_helper.h
> > +++ b/include/drm/drm_dp_helper.h
> > @@ -820,6 +820,7 @@
> >  
> >  #define DP_DP13_DPCD_REV                    0x2200
> >  #define DP_DP13_MAX_LINK_RATE               0x2201
> > +#define DP_DP14_MAX_CR_TRIES                10
> >  
> >  #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
> >  # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
> > -- 
> > 1.9.1
> > 
>
Nathan Ciobanu July 16, 2018, 11:27 p.m. UTC | #5
On Mon, Jul 16, 2018 at 03:30:49PM -0700, Rodrigo Vivi wrote:
> On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote:
> > 
> > While the shortness of v3 is really nice, I think it's rather a good
> > opportunity to make much clearer (as a separate, no functional change
> > patch?)  its existing terminating condition(s) and what the existing loop
> > iterates on. I mean it's painful and risky enough to _combine multiple
> > counters in the same loop_, so at least these different counters should be
> > _invidually_ crystal-clear with enough comments. For instance let's pretend
> > there are 4 possible voltage values and that each value is tried 4 times -
> > then the last value will never be tried! Can this ever happen? If not then
> > why not? Not even with a buggy sink?  If it can happen then is it fine to
> > give up before trying some values? Is it compliant with the spec? Etc.
> 
> hm.. it seems that that code has room for improvements to make it more
> clear and easier to map with the spec...
> but yeap, separated patches.
Do we like to add macros for the other counters? I can try to clarify this a bit.

> 
> > 
> > This should incidentally help clarify why and how the current logic allows
> > infinite loops.
> > 
> > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
> > remove this confusing "increment from false to true":
> > 
> > - if (max_vswing_tries == 1)
> > + if (max_vswing_tries)
> > 
> > - max_vswing_tries++;
> > + max_vwing_tries = true;
> 
> if we change to boolean it is better to really change the type
> and remove "_tries" from the name....
Yup, will do!

> 
> > 
> > Marc
> > 
> > 
> > On 16/07/2018 11:14, Nathan Ciobanu wrote:
> > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> > >  include/drm/drm_dp_helper.h                   | 1 +
> > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > index 4da6e33c7fa1..4bfba65c431c 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> > >  {
> > >  	uint8_t voltage;
> > > -	int voltage_tries, max_vswing_tries;
> > > +	int voltage_tries, max_vswing_tries, cr_tries;
> > >  	uint8_t link_config[2];
> > >  	uint8_t link_bw, rate_select;
> > >  
> > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  
> > >  	voltage_tries = 1;
> > >  	max_vswing_tries = 0;
> > > -	for (;;) {
> > > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> > >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> > >  
> > >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  			++max_vswing_tries;
> > >  
> > >  	}
> > > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > > +		  DP_DP14_MAX_CR_TRIES);
> > > +	return false;
> > >  }
> > >  
>
Rodrigo Vivi July 16, 2018, 11:39 p.m. UTC | #6
On Mon, Jul 16, 2018 at 04:27:47PM -0700, Nathan Ciobanu wrote:
> On Mon, Jul 16, 2018 at 03:30:49PM -0700, Rodrigo Vivi wrote:
> > On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote:
> > > 
> > > While the shortness of v3 is really nice, I think it's rather a good
> > > opportunity to make much clearer (as a separate, no functional change
> > > patch?)  its existing terminating condition(s) and what the existing loop
> > > iterates on. I mean it's painful and risky enough to _combine multiple
> > > counters in the same loop_, so at least these different counters should be
> > > _invidually_ crystal-clear with enough comments. For instance let's pretend
> > > there are 4 possible voltage values and that each value is tried 4 times -
> > > then the last value will never be tried! Can this ever happen? If not then
> > > why not? Not even with a buggy sink?  If it can happen then is it fine to
> > > give up before trying some values? Is it compliant with the spec? Etc.
> > 
> > hm.. it seems that that code has room for improvements to make it more
> > clear and easier to map with the spec...
> > but yeap, separated patches.
> Do we like to add macros for the other counters? I can try to clarify this a bit.
> 
> > 
> > > 
> > > This should incidentally help clarify why and how the current logic allows
> > > infinite loops.
> > > 
> > > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
> > > remove this confusing "increment from false to true":
> > > 
> > > - if (max_vswing_tries == 1)
> > > + if (max_vswing_tries)
> > > 
> > > - max_vswing_tries++;
> > > + max_vwing_tries = true;
> > 
> > if we change to boolean it is better to really change the type
> > and remove "_tries" from the name....
> Yup, will do!

please note I'm not telling that we "should do"... but only "if we do" ;)

> 
> > 
> > > 
> > > Marc
> > > 
> > > 
> > > On 16/07/2018 11:14, Nathan Ciobanu wrote:
> > > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> > > >  include/drm/drm_dp_helper.h                   | 1 +
> > > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > > index 4da6e33c7fa1..4bfba65c431c 100644
> > > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > > >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> > > >  {
> > > >  	uint8_t voltage;
> > > > -	int voltage_tries, max_vswing_tries;
> > > > +	int voltage_tries, max_vswing_tries, cr_tries;
> > > >  	uint8_t link_config[2];
> > > >  	uint8_t link_bw, rate_select;
> > > >  
> > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > > >  
> > > >  	voltage_tries = 1;
> > > >  	max_vswing_tries = 0;
> > > > -	for (;;) {
> > > > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> > > >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> > > >  
> > > >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > > >  			++max_vswing_tries;
> > > >  
> > > >  	}
> > > > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > > > +		  DP_DP14_MAX_CR_TRIES);
> > > > +	return false;
> > > >  }
> > > >  
> >
Rodrigo Vivi July 16, 2018, 11:40 p.m. UTC | #7
On Mon, Jul 16, 2018 at 04:23:23PM -0700, Nathan Ciobanu wrote:
> On Mon, Jul 16, 2018 at 03:34:58PM -0700, Rodrigo Vivi wrote:
> > On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote:
> > > Limit the link training clock recovery loop to 10 failed attempts at
> > > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2.
> > 
> > Thanks... so this made me look to DP 1.3 and DP 1.4 differences here.
> > Are we already addressing all new conditions?
> > 
> > I wonder if that should be at drm level.
> > 
> > I noticed that one difference, 100us wait difference is already
> > addressed there at drm level...
> > 
> > > Some USB-C MST hubs
> > > cause us to get stuck in this loop indefinitely requesting
> > > 
> > >     voltage swing: 0, pre-emphasis level: 2
> > >     voltage swing: 1, pre-emphasis level: 2
> > >     voltage swing: 0, pre-emphasis level: 3
> > > 
> > > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
> > > never returns true and voltage_tries always gets reset to 1. The driver
> > > sends those values to the hub but the hub keeps requesting new values
> > > every time in the pattern shown above.
> > > 
> > > Changes in v2:
> > >     - updated commit message (DK, Manasi)
> > >     - defined DP_DP14_MAX_CR_TRIES (Marc)
> > >     - made the loop iterate for max 10 times (Rodrigo, Marc)
> > > 
> > > Changes in v3:
> > >     - changed error message to use DP_DP14_MAX_CR_TRIES
> > > 
> > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> > >  include/drm/drm_dp_helper.h                   | 1 +
> > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > index 4da6e33c7fa1..4bfba65c431c 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> > >  {
> > >  	uint8_t voltage;
> > > -	int voltage_tries, max_vswing_tries;
> > > +	int voltage_tries, max_vswing_tries, cr_tries;
> > >  	uint8_t link_config[2];
> > >  	uint8_t link_bw, rate_select;
> > >  
> > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  
> > >  	voltage_tries = 1;
> > >  	max_vswing_tries = 0;
> > > -	for (;;) {
> > > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> > 
> > I know that I was the on that asked you to make this global here,
> > but just now with better spec pointers and definition is that I noticed
> > that this is only for DP 1.4 spec while this code here runs for all
> > DP...  So we need to change in a way that we check the spec version somehow...
> I think the bug is with this infinite loop which is at the mercy of an external device
> and in my case I have this MST hub which appears to be DP 1.2 that triggers the
> infinite loop case. We have to limit the number of iterations and
> DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case
> shouldn't we apply the same counter to all <= "DP 1.4" devices?

ok, the infinite loop situation is really bad... but I don't believe
we are going with the right fix...
and a good indication is that your fix is for DP-1.4 while your bug is seeing on DP-1.2

> 
> -Nathan 
> > 
> > and possibly at drm level?!
> > 
> > >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> > >  
> > >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  			++max_vswing_tries;
> > >  
> > >  	}
> > > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > > +		  DP_DP14_MAX_CR_TRIES);
> > > +	return false;
> > >  }
> > >  
> > >  /*
> > > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> > > index c01564991a9f..2303ad8ed24e 100644
> > > --- a/include/drm/drm_dp_helper.h
> > > +++ b/include/drm/drm_dp_helper.h
> > > @@ -820,6 +820,7 @@
> > >  
> > >  #define DP_DP13_DPCD_REV                    0x2200
> > >  #define DP_DP13_MAX_LINK_RATE               0x2201
> > > +#define DP_DP14_MAX_CR_TRIES                10
> > >  
> > >  #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
> > >  # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
> > > -- 
> > > 1.9.1
> > > 
> >
Marc Herbert July 16, 2018, 11:51 p.m. UTC | #8
>> I think the bug is with this infinite loop which is at the mercy of an external device
>> and in my case I have this MST hub which appears to be DP 1.2 that triggers the
>> infinite loop case. We have to limit the number of iterations and
>> DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case
>> shouldn't we apply the same counter to all <= "DP 1.4" devices?
> 
> ok, the infinite loop situation is really bad... but I don't believe
> we are going with the right fix...
> and a good indication is that your fix is for DP-1.4 while your bug is seeing on DP-1.2

I thought the only reason the infinite loop fix isn't in 1.2 is just because there's
no 1.2.1 spec... (plus the naive assumption that buggy sinks don't exist)
Dhinakaran Pandiyan July 17, 2018, 10:21 p.m. UTC | #9
On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > 
> > > 
> > > I think the bug is with this infinite loop which is at the mercy
> > > of an external device
> > > and in my case I have this MST hub which appears to be DP 1.2
> > > that triggers the
> > > infinite loop case. We have to limit the number of iterations and
> > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > but in this case
> > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > ok, the infinite loop situation is really bad... but I don't
> > believe
> > we are going with the right fix...
> > and a good indication is that your fix is for DP-1.4 while your bug
> > is seeing on DP-1.2
> I thought the only reason the infinite loop fix isn't in 1.2 is just
> because there's
> no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> exist)
> 
Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
out there that keep toggling between two values there should be an
overall limit to how many times this loop gets executed. Even if this
isn't right fix for the issue at hand, the loop has to break.

Then there's the question of what the limit should be. We could use the
DP1.4 limit as a reference and apply it widely. But there's a problem
here, we have 4 vswing values and 4 pre-emphasis values. If the sink
progressively changes one variable at a time, we'll need at least 16
iterations. Nathan's patch here will prematurely error out and that
doesn't sound reasonable to impose on non DP1.4 sinks.

-DK


> 
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Nathan Ciobanu July 18, 2018, 1:05 a.m. UTC | #10
On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > 
> > > > 
> > > > I think the bug is with this infinite loop which is at the mercy
> > > > of an external device
> > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > that triggers the
> > > > infinite loop case. We have to limit the number of iterations and
> > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > > but in this case
> > > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > > ok, the infinite loop situation is really bad... but I don't
> > > believe
> > > we are going with the right fix...
> > > and a good indication is that your fix is for DP-1.4 while your bug
> > > is seeing on DP-1.2
> > I thought the only reason the infinite loop fix isn't in 1.2 is just
> > because there's
> > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> > exist)
> > 
> Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
> out there that keep toggling between two values there should be an
> overall limit to how many times this loop gets executed. Even if this
> isn't right fix for the issue at hand, the loop has to break.
> 
> Then there's the question of what the limit should be. We could use the
> DP1.4 limit as a reference and apply it widely. But there's a problem
> here, we have 4 vswing values and 4 pre-emphasis values. If the sink
> progressively changes one variable at a time, we'll need at least 16
> iterations. Nathan's patch here will prematurely error out and that
> doesn't sound reasonable to impose on non DP1.4 sinks.

I think it would be a max of 13 iterations since one of the vswing
values will be max_vswing and the loop will terminate at that point
without trying the other 3 preemph values for that voltage, correct?

-Nathan
> 
> -DK
> 
> 
> > 
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
Dhinakaran Pandiyan July 18, 2018, 3 a.m. UTC | #11
On Tue, 2018-07-17 at 18:05 -0700, Nathan Ciobanu wrote:
> On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> > 
> > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > I think the bug is with this infinite loop which is at the
> > > > > mercy
> > > > > of an external device
> > > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > > that triggers the
> > > > > infinite loop case. We have to limit the number of iterations
> > > > > and
> > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this
> > > > > area
> > > > > but in this case
> > > > > shouldn't we apply the same counter to all <= "DP 1.4"
> > > > > devices?
> > > > ok, the infinite loop situation is really bad... but I don't
> > > > believe
> > > > we are going with the right fix...
> > > > and a good indication is that your fix is for DP-1.4 while your
> > > > bug
> > > > is seeing on DP-1.2
> > > I thought the only reason the infinite loop fix isn't in 1.2 is
> > > just
> > > because there's
> > > no 1.2.1 spec... (plus the naive assumption that buggy sinks
> > > don't
> > > exist)
> > > 
> > Irrespective of whether the sink is DP1.2 or 1.4, if there are
> > sinks
> > out there that keep toggling between two values there should be an
> > overall limit to how many times this loop gets executed. Even if
> > this
> > isn't right fix for the issue at hand, the loop has to break.
> > 
> > Then there's the question of what the limit should be. We could use
> > the
> > DP1.4 limit as a reference and apply it widely. But there's a
> > problem
> > here, we have 4 vswing values and 4 pre-emphasis values. If the
> > sink
> > progressively changes one variable at a time, we'll need at least
> > 16
> > iterations. Nathan's patch here will prematurely error out and that
> > doesn't sound reasonable to impose on non DP1.4 sinks.
> I think it would be a max of 13 iterations since one of the vswing
> values will be max_vswing and the loop will terminate at that point
> without trying the other 3 preemph values for that voltage, correct?
> 
The spec defines maximum voltage swing level as the "sum of the
VOLTAGE_SWING_LANEx and PREEMPHASIS_LEVEL_LANEx value". So, we should
be looping through all values.

Can you check if https://patchwork.freedesktop.org/patch/239520/ helps?
Also, please file a fdo bug with dmesg.

-DK



> -Nathan
> > 
> > 
> > -DK
> > 
> > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Rodrigo Vivi July 19, 2018, 5:01 p.m. UTC | #12
On Tue, Jul 17, 2018 at 06:05:51PM -0700, Nathan Ciobanu wrote:
> On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > > 
> > > > > 
> > > > > I think the bug is with this infinite loop which is at the mercy
> > > > > of an external device
> > > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > > that triggers the
> > > > > infinite loop case. We have to limit the number of iterations and
> > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > > > but in this case
> > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > > > ok, the infinite loop situation is really bad... but I don't
> > > > believe
> > > > we are going with the right fix...
> > > > and a good indication is that your fix is for DP-1.4 while your bug
> > > > is seeing on DP-1.2
> > > I thought the only reason the infinite loop fix isn't in 1.2 is just
> > > because there's
> > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> > > exist)
> > > 
> > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
> > out there that keep toggling between two values there should be an
> > overall limit to how many times this loop gets executed. Even if this
> > isn't right fix for the issue at hand, the loop has to break.
> > 
> > Then there's the question of what the limit should be. We could use the
> > DP1.4 limit as a reference and apply it widely. But there's a problem
> > here, we have 4 vswing values and 4 pre-emphasis values. If the sink
> > progressively changes one variable at a time, we'll need at least 16
> > iterations. Nathan's patch here will prematurely error out and that
> > doesn't sound reasonable to impose on non DP1.4 sinks.
> 
> I think it would be a max of 13 iterations since one of the vswing
> values will be max_vswing and the loop will terminate at that point
> without trying the other 3 preemph values for that voltage, correct?

I was talking to DK yesterday and he suggested that we should go with
a huge number for DP_1.2 and with the spec for DP_1.4. According to DK
80 was covering all combinations few times.

So you get your patch and create some max_cr_tries = dp_1.4 ? spec : 80;

or something like that.

I think I like that because infinite loop situation here is so bad
and we should avoid even if it was something else that got really wrong.

Thanks,
Rodrigo.

> 
> -Nathan
> > 
> > -DK
> > 
> > 
> > > 
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Nathan Ciobanu July 19, 2018, 6:42 p.m. UTC | #13
On Thu, Jul 19, 2018 at 10:01:41AM -0700, Rodrigo Vivi wrote:
> On Tue, Jul 17, 2018 at 06:05:51PM -0700, Nathan Ciobanu wrote:
> > On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> > > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > > > 
> > > > > > 
> > > > > > I think the bug is with this infinite loop which is at the mercy
> > > > > > of an external device
> > > > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > > > that triggers the
> > > > > > infinite loop case. We have to limit the number of iterations and
> > > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > > > > but in this case
> > > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > > > > ok, the infinite loop situation is really bad... but I don't
> > > > > believe
> > > > > we are going with the right fix...
> > > > > and a good indication is that your fix is for DP-1.4 while your bug
> > > > > is seeing on DP-1.2
> > > > I thought the only reason the infinite loop fix isn't in 1.2 is just
> > > > because there's
> > > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> > > > exist)
> > > > 
> > > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
> > > out there that keep toggling between two values there should be an
> > > overall limit to how many times this loop gets executed. Even if this
> > > isn't right fix for the issue at hand, the loop has to break.
> > > 
> > > Then there's the question of what the limit should be. We could use the
> > > DP1.4 limit as a reference and apply it widely. But there's a problem
> > > here, we have 4 vswing values and 4 pre-emphasis values. If the sink
> > > progressively changes one variable at a time, we'll need at least 16
> > > iterations. Nathan's patch here will prematurely error out and that
> > > doesn't sound reasonable to impose on non DP1.4 sinks.
> > 
> > I think it would be a max of 13 iterations since one of the vswing
> > values will be max_vswing and the loop will terminate at that point
> > without trying the other 3 preemph values for that voltage, correct?
> 
> I was talking to DK yesterday and he suggested that we should go with
> a huge number for DP_1.2 and with the spec for DP_1.4. According to DK
> 80 was covering all combinations few times.
> 
> So you get your patch and create some max_cr_tries = dp_1.4 ? spec : 80;
> 
> or something like that.
> 
> I think I like that because infinite loop situation here is so bad
> and we should avoid even if it was something else that got really wrong.
I just sent v4 to do that. Thanks!
-Nathan
> 
> Thanks,
> Rodrigo.
> 
> > 
> > -Nathan
> > > 
> > > -DK
> > > 
> > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
index 4da6e33c7fa1..4bfba65c431c 100644
--- a/drivers/gpu/drm/i915/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
@@ -129,7 +129,7 @@  static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
 intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
 {
 	uint8_t voltage;
-	int voltage_tries, max_vswing_tries;
+	int voltage_tries, max_vswing_tries, cr_tries;
 	uint8_t link_config[2];
 	uint8_t link_bw, rate_select;
 
@@ -172,7 +172,7 @@  static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
 
 	voltage_tries = 1;
 	max_vswing_tries = 0;
-	for (;;) {
+	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
 		uint8_t link_status[DP_LINK_STATUS_SIZE];
 
 		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
@@ -216,6 +216,9 @@  static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
 			++max_vswing_tries;
 
 	}
+	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
+		  DP_DP14_MAX_CR_TRIES);
+	return false;
 }
 
 /*
diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index c01564991a9f..2303ad8ed24e 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -820,6 +820,7 @@ 
 
 #define DP_DP13_DPCD_REV                    0x2200
 #define DP_DP13_MAX_LINK_RATE               0x2201
+#define DP_DP14_MAX_CR_TRIES                10
 
 #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
 # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */