Message ID | 1531764885-18185-1-git-send-email-nathan.d.ciobanu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
While the shortness of v3 is really nice, I think it's rather a good opportunity to make much clearer (as a separate, no functional change patch?) its existing terminating condition(s) and what the existing loop iterates on. I mean it's painful and risky enough to _combine multiple counters in the same loop_, so at least these different counters should be _invidually_ crystal-clear with enough comments. For instance let's pretend there are 4 possible voltage values and that each value is tried 4 times - then the last value will never be tried! Can this ever happen? If not then why not? Not even with a buggy sink? If it can happen then is it fine to give up before trying some values? Is it compliant with the spec? Etc. This should incidentally help clarify why and how the current logic allows infinite loops. BTW "max_vswing_tries" looks like a boolean, correct? If correct then please remove this confusing "increment from false to true": - if (max_vswing_tries == 1) + if (max_vswing_tries) - max_vswing_tries++; + max_vwing_tries = true; Marc On 16/07/2018 11:14, Nathan Ciobanu wrote: > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > --- > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > include/drm/drm_dp_helper.h | 1 + > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > index 4da6e33c7fa1..4bfba65c431c 100644 > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > { > uint8_t voltage; > - int voltage_tries, max_vswing_tries; > + int voltage_tries, max_vswing_tries, cr_tries; > uint8_t link_config[2]; > uint8_t link_bw, rate_select; > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > voltage_tries = 1; > max_vswing_tries = 0; > - for (;;) { > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > ++max_vswing_tries; > > } > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > + DP_DP14_MAX_CR_TRIES); > + return false; > } >
On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote: > > While the shortness of v3 is really nice, I think it's rather a good > opportunity to make much clearer (as a separate, no functional change > patch?) its existing terminating condition(s) and what the existing loop > iterates on. I mean it's painful and risky enough to _combine multiple > counters in the same loop_, so at least these different counters should be > _invidually_ crystal-clear with enough comments. For instance let's pretend > there are 4 possible voltage values and that each value is tried 4 times - > then the last value will never be tried! Can this ever happen? If not then > why not? Not even with a buggy sink? If it can happen then is it fine to > give up before trying some values? Is it compliant with the spec? Etc. hm.. it seems that that code has room for improvements to make it more clear and easier to map with the spec... but yeap, separated patches. > > This should incidentally help clarify why and how the current logic allows > infinite loops. > > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please > remove this confusing "increment from false to true": > > - if (max_vswing_tries == 1) > + if (max_vswing_tries) > > - max_vswing_tries++; > + max_vwing_tries = true; if we change to boolean it is better to really change the type and remove "_tries" from the name.... > > Marc > > > On 16/07/2018 11:14, Nathan Ciobanu wrote: > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > > --- > > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > > include/drm/drm_dp_helper.h | 1 + > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > > index 4da6e33c7fa1..4bfba65c431c 100644 > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > > { > > uint8_t voltage; > > - int voltage_tries, max_vswing_tries; > > + int voltage_tries, max_vswing_tries, cr_tries; > > uint8_t link_config[2]; > > uint8_t link_bw, rate_select; > > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > voltage_tries = 1; > > max_vswing_tries = 0; > > - for (;;) { > > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { > > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > ++max_vswing_tries; > > > > } > > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > > + DP_DP14_MAX_CR_TRIES); > > + return false; > > } > >
On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote: > Limit the link training clock recovery loop to 10 failed attempts at > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2. Thanks... so this made me look to DP 1.3 and DP 1.4 differences here. Are we already addressing all new conditions? I wonder if that should be at drm level. I noticed that one difference, 100us wait difference is already addressed there at drm level... > Some USB-C MST hubs > cause us to get stuck in this loop indefinitely requesting > > voltage swing: 0, pre-emphasis level: 2 > voltage swing: 1, pre-emphasis level: 2 > voltage swing: 0, pre-emphasis level: 3 > > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok() > never returns true and voltage_tries always gets reset to 1. The driver > sends those values to the hub but the hub keeps requesting new values > every time in the pattern shown above. > > Changes in v2: > - updated commit message (DK, Manasi) > - defined DP_DP14_MAX_CR_TRIES (Marc) > - made the loop iterate for max 10 times (Rodrigo, Marc) > > Changes in v3: > - changed error message to use DP_DP14_MAX_CR_TRIES > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > --- > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > include/drm/drm_dp_helper.h | 1 + > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > index 4da6e33c7fa1..4bfba65c431c 100644 > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > { > uint8_t voltage; > - int voltage_tries, max_vswing_tries; > + int voltage_tries, max_vswing_tries, cr_tries; > uint8_t link_config[2]; > uint8_t link_bw, rate_select; > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > voltage_tries = 1; > max_vswing_tries = 0; > - for (;;) { > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { I know that I was the on that asked you to make this global here, but just now with better spec pointers and definition is that I noticed that this is only for DP 1.4 spec while this code here runs for all DP... So we need to change in a way that we check the spec version somehow... and possibly at drm level?! > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > ++max_vswing_tries; > > } > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > + DP_DP14_MAX_CR_TRIES); > + return false; > } > > /* > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h > index c01564991a9f..2303ad8ed24e 100644 > --- a/include/drm/drm_dp_helper.h > +++ b/include/drm/drm_dp_helper.h > @@ -820,6 +820,7 @@ > > #define DP_DP13_DPCD_REV 0x2200 > #define DP_DP13_MAX_LINK_RATE 0x2201 > +#define DP_DP14_MAX_CR_TRIES 10 > > #define DP_DPRX_FEATURE_ENUMERATION_LIST 0x2210 /* DP 1.3 */ > # define DP_GTC_CAP (1 << 0) /* DP 1.3 */ > -- > 1.9.1 >
On Mon, Jul 16, 2018 at 03:34:58PM -0700, Rodrigo Vivi wrote: > On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote: > > Limit the link training clock recovery loop to 10 failed attempts at > > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2. > > Thanks... so this made me look to DP 1.3 and DP 1.4 differences here. > Are we already addressing all new conditions? > > I wonder if that should be at drm level. > > I noticed that one difference, 100us wait difference is already > addressed there at drm level... > > > Some USB-C MST hubs > > cause us to get stuck in this loop indefinitely requesting > > > > voltage swing: 0, pre-emphasis level: 2 > > voltage swing: 1, pre-emphasis level: 2 > > voltage swing: 0, pre-emphasis level: 3 > > > > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok() > > never returns true and voltage_tries always gets reset to 1. The driver > > sends those values to the hub but the hub keeps requesting new values > > every time in the pattern shown above. > > > > Changes in v2: > > - updated commit message (DK, Manasi) > > - defined DP_DP14_MAX_CR_TRIES (Marc) > > - made the loop iterate for max 10 times (Rodrigo, Marc) > > > > Changes in v3: > > - changed error message to use DP_DP14_MAX_CR_TRIES > > > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > > --- > > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > > include/drm/drm_dp_helper.h | 1 + > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > > index 4da6e33c7fa1..4bfba65c431c 100644 > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > > { > > uint8_t voltage; > > - int voltage_tries, max_vswing_tries; > > + int voltage_tries, max_vswing_tries, cr_tries; > > uint8_t link_config[2]; > > uint8_t link_bw, rate_select; > > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > voltage_tries = 1; > > max_vswing_tries = 0; > > - for (;;) { > > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { > > I know that I was the on that asked you to make this global here, > but just now with better spec pointers and definition is that I noticed > that this is only for DP 1.4 spec while this code here runs for all > DP... So we need to change in a way that we check the spec version somehow... I think the bug is with this infinite loop which is at the mercy of an external device and in my case I have this MST hub which appears to be DP 1.2 that triggers the infinite loop case. We have to limit the number of iterations and DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case shouldn't we apply the same counter to all <= "DP 1.4" devices? -Nathan > > and possibly at drm level?! > > > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > ++max_vswing_tries; > > > > } > > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > > + DP_DP14_MAX_CR_TRIES); > > + return false; > > } > > > > /* > > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h > > index c01564991a9f..2303ad8ed24e 100644 > > --- a/include/drm/drm_dp_helper.h > > +++ b/include/drm/drm_dp_helper.h > > @@ -820,6 +820,7 @@ > > > > #define DP_DP13_DPCD_REV 0x2200 > > #define DP_DP13_MAX_LINK_RATE 0x2201 > > +#define DP_DP14_MAX_CR_TRIES 10 > > > > #define DP_DPRX_FEATURE_ENUMERATION_LIST 0x2210 /* DP 1.3 */ > > # define DP_GTC_CAP (1 << 0) /* DP 1.3 */ > > -- > > 1.9.1 > > >
On Mon, Jul 16, 2018 at 03:30:49PM -0700, Rodrigo Vivi wrote: > On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote: > > > > While the shortness of v3 is really nice, I think it's rather a good > > opportunity to make much clearer (as a separate, no functional change > > patch?) its existing terminating condition(s) and what the existing loop > > iterates on. I mean it's painful and risky enough to _combine multiple > > counters in the same loop_, so at least these different counters should be > > _invidually_ crystal-clear with enough comments. For instance let's pretend > > there are 4 possible voltage values and that each value is tried 4 times - > > then the last value will never be tried! Can this ever happen? If not then > > why not? Not even with a buggy sink? If it can happen then is it fine to > > give up before trying some values? Is it compliant with the spec? Etc. > > hm.. it seems that that code has room for improvements to make it more > clear and easier to map with the spec... > but yeap, separated patches. Do we like to add macros for the other counters? I can try to clarify this a bit. > > > > > This should incidentally help clarify why and how the current logic allows > > infinite loops. > > > > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please > > remove this confusing "increment from false to true": > > > > - if (max_vswing_tries == 1) > > + if (max_vswing_tries) > > > > - max_vswing_tries++; > > + max_vwing_tries = true; > > if we change to boolean it is better to really change the type > and remove "_tries" from the name.... Yup, will do! > > > > > Marc > > > > > > On 16/07/2018 11:14, Nathan Ciobanu wrote: > > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > > > include/drm/drm_dp_helper.h | 1 + > > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > > > index 4da6e33c7fa1..4bfba65c431c 100644 > > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > > > { > > > uint8_t voltage; > > > - int voltage_tries, max_vswing_tries; > > > + int voltage_tries, max_vswing_tries, cr_tries; > > > uint8_t link_config[2]; > > > uint8_t link_bw, rate_select; > > > > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > > > voltage_tries = 1; > > > max_vswing_tries = 0; > > > - for (;;) { > > > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { > > > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > > > > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > ++max_vswing_tries; > > > > > > } > > > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > > > + DP_DP14_MAX_CR_TRIES); > > > + return false; > > > } > > > >
On Mon, Jul 16, 2018 at 04:27:47PM -0700, Nathan Ciobanu wrote: > On Mon, Jul 16, 2018 at 03:30:49PM -0700, Rodrigo Vivi wrote: > > On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote: > > > > > > While the shortness of v3 is really nice, I think it's rather a good > > > opportunity to make much clearer (as a separate, no functional change > > > patch?) its existing terminating condition(s) and what the existing loop > > > iterates on. I mean it's painful and risky enough to _combine multiple > > > counters in the same loop_, so at least these different counters should be > > > _invidually_ crystal-clear with enough comments. For instance let's pretend > > > there are 4 possible voltage values and that each value is tried 4 times - > > > then the last value will never be tried! Can this ever happen? If not then > > > why not? Not even with a buggy sink? If it can happen then is it fine to > > > give up before trying some values? Is it compliant with the spec? Etc. > > > > hm.. it seems that that code has room for improvements to make it more > > clear and easier to map with the spec... > > but yeap, separated patches. > Do we like to add macros for the other counters? I can try to clarify this a bit. > > > > > > > > > This should incidentally help clarify why and how the current logic allows > > > infinite loops. > > > > > > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please > > > remove this confusing "increment from false to true": > > > > > > - if (max_vswing_tries == 1) > > > + if (max_vswing_tries) > > > > > > - max_vswing_tries++; > > > + max_vwing_tries = true; > > > > if we change to boolean it is better to really change the type > > and remove "_tries" from the name.... > Yup, will do! please note I'm not telling that we "should do"... but only "if we do" ;) > > > > > > > > > Marc > > > > > > > > > On 16/07/2018 11:14, Nathan Ciobanu wrote: > > > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > > > > --- > > > > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > > > > include/drm/drm_dp_helper.h | 1 + > > > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > > > > index 4da6e33c7fa1..4bfba65c431c 100644 > > > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > > > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > > > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > > > > { > > > > uint8_t voltage; > > > > - int voltage_tries, max_vswing_tries; > > > > + int voltage_tries, max_vswing_tries, cr_tries; > > > > uint8_t link_config[2]; > > > > uint8_t link_bw, rate_select; > > > > > > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > > > > > voltage_tries = 1; > > > > max_vswing_tries = 0; > > > > - for (;;) { > > > > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { > > > > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > > > > > > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > > > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > ++max_vswing_tries; > > > > > > > > } > > > > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > > > > + DP_DP14_MAX_CR_TRIES); > > > > + return false; > > > > } > > > > > >
On Mon, Jul 16, 2018 at 04:23:23PM -0700, Nathan Ciobanu wrote: > On Mon, Jul 16, 2018 at 03:34:58PM -0700, Rodrigo Vivi wrote: > > On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote: > > > Limit the link training clock recovery loop to 10 failed attempts at > > > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2. > > > > Thanks... so this made me look to DP 1.3 and DP 1.4 differences here. > > Are we already addressing all new conditions? > > > > I wonder if that should be at drm level. > > > > I noticed that one difference, 100us wait difference is already > > addressed there at drm level... > > > > > Some USB-C MST hubs > > > cause us to get stuck in this loop indefinitely requesting > > > > > > voltage swing: 0, pre-emphasis level: 2 > > > voltage swing: 1, pre-emphasis level: 2 > > > voltage swing: 0, pre-emphasis level: 3 > > > > > > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok() > > > never returns true and voltage_tries always gets reset to 1. The driver > > > sends those values to the hub but the hub keeps requesting new values > > > every time in the pattern shown above. > > > > > > Changes in v2: > > > - updated commit message (DK, Manasi) > > > - defined DP_DP14_MAX_CR_TRIES (Marc) > > > - made the loop iterate for max 10 times (Rodrigo, Marc) > > > > > > Changes in v3: > > > - changed error message to use DP_DP14_MAX_CR_TRIES > > > > > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- > > > include/drm/drm_dp_helper.h | 1 + > > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c > > > index 4da6e33c7fa1..4bfba65c431c 100644 > > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c > > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c > > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) > > > { > > > uint8_t voltage; > > > - int voltage_tries, max_vswing_tries; > > > + int voltage_tries, max_vswing_tries, cr_tries; > > > uint8_t link_config[2]; > > > uint8_t link_bw, rate_select; > > > > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > > > > voltage_tries = 1; > > > max_vswing_tries = 0; > > > - for (;;) { > > > + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { > > > > I know that I was the on that asked you to make this global here, > > but just now with better spec pointers and definition is that I noticed > > that this is only for DP 1.4 spec while this code here runs for all > > DP... So we need to change in a way that we check the spec version somehow... > I think the bug is with this infinite loop which is at the mercy of an external device > and in my case I have this MST hub which appears to be DP 1.2 that triggers the > infinite loop case. We have to limit the number of iterations and > DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case > shouldn't we apply the same counter to all <= "DP 1.4" devices? ok, the infinite loop situation is really bad... but I don't believe we are going with the right fix... and a good indication is that your fix is for DP-1.4 while your bug is seeing on DP-1.2 > > -Nathan > > > > and possibly at drm level?! > > > > > uint8_t link_status[DP_LINK_STATUS_SIZE]; > > > > > > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) > > > ++max_vswing_tries; > > > > > > } > > > + DRM_ERROR("Failed clock recovery %d times, giving up!\n", > > > + DP_DP14_MAX_CR_TRIES); > > > + return false; > > > } > > > > > > /* > > > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h > > > index c01564991a9f..2303ad8ed24e 100644 > > > --- a/include/drm/drm_dp_helper.h > > > +++ b/include/drm/drm_dp_helper.h > > > @@ -820,6 +820,7 @@ > > > > > > #define DP_DP13_DPCD_REV 0x2200 > > > #define DP_DP13_MAX_LINK_RATE 0x2201 > > > +#define DP_DP14_MAX_CR_TRIES 10 > > > > > > #define DP_DPRX_FEATURE_ENUMERATION_LIST 0x2210 /* DP 1.3 */ > > > # define DP_GTC_CAP (1 << 0) /* DP 1.3 */ > > > -- > > > 1.9.1 > > > > >
>> I think the bug is with this infinite loop which is at the mercy of an external device >> and in my case I have this MST hub which appears to be DP 1.2 that triggers the >> infinite loop case. We have to limit the number of iterations and >> DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case >> shouldn't we apply the same counter to all <= "DP 1.4" devices? > > ok, the infinite loop situation is really bad... but I don't believe > we are going with the right fix... > and a good indication is that your fix is for DP-1.4 while your bug is seeing on DP-1.2 I thought the only reason the infinite loop fix isn't in 1.2 is just because there's no 1.2.1 spec... (plus the naive assumption that buggy sinks don't exist)
On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote: > > > > > > > > I think the bug is with this infinite loop which is at the mercy > > > of an external device > > > and in my case I have this MST hub which appears to be DP 1.2 > > > that triggers the > > > infinite loop case. We have to limit the number of iterations and > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area > > > but in this case > > > shouldn't we apply the same counter to all <= "DP 1.4" devices? > > ok, the infinite loop situation is really bad... but I don't > > believe > > we are going with the right fix... > > and a good indication is that your fix is for DP-1.4 while your bug > > is seeing on DP-1.2 > I thought the only reason the infinite loop fix isn't in 1.2 is just > because there's > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't > exist) > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks out there that keep toggling between two values there should be an overall limit to how many times this loop gets executed. Even if this isn't right fix for the issue at hand, the loop has to break. Then there's the question of what the limit should be. We could use the DP1.4 limit as a reference and apply it widely. But there's a problem here, we have 4 vswing values and 4 pre-emphasis values. If the sink progressively changes one variable at a time, we'll need at least 16 iterations. Nathan's patch here will prematurely error out and that doesn't sound reasonable to impose on non DP1.4 sinks. -DK > > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote: > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote: > > > > > > > > > > > I think the bug is with this infinite loop which is at the mercy > > > > of an external device > > > > and in my case I have this MST hub which appears to be DP 1.2 > > > > that triggers the > > > > infinite loop case. We have to limit the number of iterations and > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area > > > > but in this case > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices? > > > ok, the infinite loop situation is really bad... but I don't > > > believe > > > we are going with the right fix... > > > and a good indication is that your fix is for DP-1.4 while your bug > > > is seeing on DP-1.2 > > I thought the only reason the infinite loop fix isn't in 1.2 is just > > because there's > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't > > exist) > > > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks > out there that keep toggling between two values there should be an > overall limit to how many times this loop gets executed. Even if this > isn't right fix for the issue at hand, the loop has to break. > > Then there's the question of what the limit should be. We could use the > DP1.4 limit as a reference and apply it widely. But there's a problem > here, we have 4 vswing values and 4 pre-emphasis values. If the sink > progressively changes one variable at a time, we'll need at least 16 > iterations. Nathan's patch here will prematurely error out and that > doesn't sound reasonable to impose on non DP1.4 sinks. I think it would be a max of 13 iterations since one of the vswing values will be max_vswing and the loop will terminate at that point without trying the other 3 preemph values for that voltage, correct? -Nathan > > -DK > > > > > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx >
On Tue, 2018-07-17 at 18:05 -0700, Nathan Ciobanu wrote: > On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote: > > > > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > I think the bug is with this infinite loop which is at the > > > > > mercy > > > > > of an external device > > > > > and in my case I have this MST hub which appears to be DP 1.2 > > > > > that triggers the > > > > > infinite loop case. We have to limit the number of iterations > > > > > and > > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this > > > > > area > > > > > but in this case > > > > > shouldn't we apply the same counter to all <= "DP 1.4" > > > > > devices? > > > > ok, the infinite loop situation is really bad... but I don't > > > > believe > > > > we are going with the right fix... > > > > and a good indication is that your fix is for DP-1.4 while your > > > > bug > > > > is seeing on DP-1.2 > > > I thought the only reason the infinite loop fix isn't in 1.2 is > > > just > > > because there's > > > no 1.2.1 spec... (plus the naive assumption that buggy sinks > > > don't > > > exist) > > > > > Irrespective of whether the sink is DP1.2 or 1.4, if there are > > sinks > > out there that keep toggling between two values there should be an > > overall limit to how many times this loop gets executed. Even if > > this > > isn't right fix for the issue at hand, the loop has to break. > > > > Then there's the question of what the limit should be. We could use > > the > > DP1.4 limit as a reference and apply it widely. But there's a > > problem > > here, we have 4 vswing values and 4 pre-emphasis values. If the > > sink > > progressively changes one variable at a time, we'll need at least > > 16 > > iterations. Nathan's patch here will prematurely error out and that > > doesn't sound reasonable to impose on non DP1.4 sinks. > I think it would be a max of 13 iterations since one of the vswing > values will be max_vswing and the loop will terminate at that point > without trying the other 3 preemph values for that voltage, correct? > The spec defines maximum voltage swing level as the "sum of the VOLTAGE_SWING_LANEx and PREEMPHASIS_LEVEL_LANEx value". So, we should be looping through all values. Can you check if https://patchwork.freedesktop.org/patch/239520/ helps? Also, please file a fdo bug with dmesg. -DK > -Nathan > > > > > > -DK > > > > > > > > > > > > > > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Tue, Jul 17, 2018 at 06:05:51PM -0700, Nathan Ciobanu wrote: > On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote: > > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote: > > > > > > > > > > > > > > I think the bug is with this infinite loop which is at the mercy > > > > > of an external device > > > > > and in my case I have this MST hub which appears to be DP 1.2 > > > > > that triggers the > > > > > infinite loop case. We have to limit the number of iterations and > > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area > > > > > but in this case > > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices? > > > > ok, the infinite loop situation is really bad... but I don't > > > > believe > > > > we are going with the right fix... > > > > and a good indication is that your fix is for DP-1.4 while your bug > > > > is seeing on DP-1.2 > > > I thought the only reason the infinite loop fix isn't in 1.2 is just > > > because there's > > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't > > > exist) > > > > > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks > > out there that keep toggling between two values there should be an > > overall limit to how many times this loop gets executed. Even if this > > isn't right fix for the issue at hand, the loop has to break. > > > > Then there's the question of what the limit should be. We could use the > > DP1.4 limit as a reference and apply it widely. But there's a problem > > here, we have 4 vswing values and 4 pre-emphasis values. If the sink > > progressively changes one variable at a time, we'll need at least 16 > > iterations. Nathan's patch here will prematurely error out and that > > doesn't sound reasonable to impose on non DP1.4 sinks. > > I think it would be a max of 13 iterations since one of the vswing > values will be max_vswing and the loop will terminate at that point > without trying the other 3 preemph values for that voltage, correct? I was talking to DK yesterday and he suggested that we should go with a huge number for DP_1.2 and with the spec for DP_1.4. According to DK 80 was covering all combinations few times. So you get your patch and create some max_cr_tries = dp_1.4 ? spec : 80; or something like that. I think I like that because infinite loop situation here is so bad and we should avoid even if it was something else that got really wrong. Thanks, Rodrigo. > > -Nathan > > > > -DK > > > > > > > > > > > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Thu, Jul 19, 2018 at 10:01:41AM -0700, Rodrigo Vivi wrote: > On Tue, Jul 17, 2018 at 06:05:51PM -0700, Nathan Ciobanu wrote: > > On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote: > > > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote: > > > > > > > > > > > > > > > > > I think the bug is with this infinite loop which is at the mercy > > > > > > of an external device > > > > > > and in my case I have this MST hub which appears to be DP 1.2 > > > > > > that triggers the > > > > > > infinite loop case. We have to limit the number of iterations and > > > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area > > > > > > but in this case > > > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices? > > > > > ok, the infinite loop situation is really bad... but I don't > > > > > believe > > > > > we are going with the right fix... > > > > > and a good indication is that your fix is for DP-1.4 while your bug > > > > > is seeing on DP-1.2 > > > > I thought the only reason the infinite loop fix isn't in 1.2 is just > > > > because there's > > > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't > > > > exist) > > > > > > > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks > > > out there that keep toggling between two values there should be an > > > overall limit to how many times this loop gets executed. Even if this > > > isn't right fix for the issue at hand, the loop has to break. > > > > > > Then there's the question of what the limit should be. We could use the > > > DP1.4 limit as a reference and apply it widely. But there's a problem > > > here, we have 4 vswing values and 4 pre-emphasis values. If the sink > > > progressively changes one variable at a time, we'll need at least 16 > > > iterations. Nathan's patch here will prematurely error out and that > > > doesn't sound reasonable to impose on non DP1.4 sinks. > > > > I think it would be a max of 13 iterations since one of the vswing > > values will be max_vswing and the loop will terminate at that point > > without trying the other 3 preemph values for that voltage, correct? > > I was talking to DK yesterday and he suggested that we should go with > a huge number for DP_1.2 and with the spec for DP_1.4. According to DK > 80 was covering all combinations few times. > > So you get your patch and create some max_cr_tries = dp_1.4 ? spec : 80; > > or something like that. > > I think I like that because infinite loop situation here is so bad > and we should avoid even if it was something else that got really wrong. I just sent v4 to do that. Thanks! -Nathan > > Thanks, > Rodrigo. > > > > > -Nathan > > > > > > -DK > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Intel-gfx mailing list > > > > Intel-gfx@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx >
diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c index 4da6e33c7fa1..4bfba65c431c 100644 --- a/drivers/gpu/drm/i915/intel_dp_link_training.c +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp) { uint8_t voltage; - int voltage_tries, max_vswing_tries; + int voltage_tries, max_vswing_tries, cr_tries; uint8_t link_config[2]; uint8_t link_bw, rate_select; @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) voltage_tries = 1; max_vswing_tries = 0; - for (;;) { + for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) { uint8_t link_status[DP_LINK_STATUS_SIZE]; drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp) ++max_vswing_tries; } + DRM_ERROR("Failed clock recovery %d times, giving up!\n", + DP_DP14_MAX_CR_TRIES); + return false; } /* diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h index c01564991a9f..2303ad8ed24e 100644 --- a/include/drm/drm_dp_helper.h +++ b/include/drm/drm_dp_helper.h @@ -820,6 +820,7 @@ #define DP_DP13_DPCD_REV 0x2200 #define DP_DP13_MAX_LINK_RATE 0x2201 +#define DP_DP14_MAX_CR_TRIES 10 #define DP_DPRX_FEATURE_ENUMERATION_LIST 0x2210 /* DP 1.3 */ # define DP_GTC_CAP (1 << 0) /* DP 1.3 */
Limit the link training clock recovery loop to 10 failed attempts at LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2. Some USB-C MST hubs cause us to get stuck in this loop indefinitely requesting voltage swing: 0, pre-emphasis level: 2 voltage swing: 1, pre-emphasis level: 2 voltage swing: 0, pre-emphasis level: 3 over and over: max_vswing is never reached, drm_dp_clock_recovery_ok() never returns true and voltage_tries always gets reset to 1. The driver sends those values to the hub but the hub keeps requesting new values every time in the pattern shown above. Changes in v2: - updated commit message (DK, Manasi) - defined DP_DP14_MAX_CR_TRIES (Marc) - made the loop iterate for max 10 times (Rodrigo, Marc) Changes in v3: - changed error message to use DP_DP14_MAX_CR_TRIES Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com> --- drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++-- include/drm/drm_dp_helper.h | 1 + 2 files changed, 6 insertions(+), 2 deletions(-)