diff mbox

drm/dp: Power cycle display if LINK_ADDRESS fails.

Message ID 20171221063624.2309-1-dhinakaran.pandiyan@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dhinakaran Pandiyan Dec. 21, 2017, 6:36 a.m. UTC
Occasionally there are LINK_ADDRESS sideband messages timing out with the
Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These
failures lead to the display not coming up on boot. Power cycling the port
corresponding to the MST monitor's branch device and resending the message
fixes the issue. I am not entirely sure if this is specific to my setup.
However, as the power state is toggled conditionally on LINK_ADDRESS
timeouts, this should not affect the working cases.

Cc: Lyude <lyude@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Jani Nikula Dec. 21, 2017, 6:53 a.m. UTC | #1
On Wed, 20 Dec 2017, Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> wrote:
> Occasionally there are LINK_ADDRESS sideband messages timing out with the
> Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These
> failures lead to the display not coming up on boot. Power cycling the port
> corresponding to the MST monitor's branch device and resending the message
> fixes the issue. I am not entirely sure if this is specific to my setup.
> However, as the power state is toggled conditionally on LINK_ADDRESS
> timeouts, this should not affect the working cases.

With stuff like this, I always wonder if catering for a failing setup
blocks us from improving working setups, because once we add this, we
can't regress it. For example, is there a valid scenario where we'd want
to fail fast here, instead of retrying?

Some nits below.

> Cc: Lyude <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 70dcfa58d3c2..e06defcdcf18 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
>  	int len;
>  	struct drm_dp_sideband_msg_tx *txmsg;
>  	int ret;
> +	int attempts = 5;
>  
> -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
>  	if (!txmsg)
>  		return;
>  
> @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
>  			}
>  			(*mgr->cbs->hotplug)(mgr);
>  		}
> +	} else if (attempts--) {

You'll end up doing (attempts + 1) attempts, including the first one.

> +		kfree(txmsg);

How about memset(txmsg, 0, sizoef(*txmsg)); here and move the goto label
down to avoid repeated allocations?

> +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> +					     false);
> +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> +					     true);
> +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);

Maybe do the debug message before you power down/up?

BR,
Jani.

> +		goto retry;
>  	} else {
>  		mstb->link_address_sent = false;
> -		DRM_DEBUG_KMS("link address failed %d\n", ret);
> +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);
>  	}
>  
>  	kfree(txmsg);
Navare, Manasi Dec. 21, 2017, 6:52 p.m. UTC | #2
On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote:
> Occasionally there are LINK_ADDRESS sideband messages timing out with the
> Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These
> failures lead to the display not coming up on boot. Power cycling the port
> corresponding to the MST monitor's branch device and resending the message
> fixes the issue. I am not entirely sure if this is specific to my setup.
> However, as the power state is toggled conditionally on LINK_ADDRESS
> timeouts, this should not affect the working cases.
> 

> Cc: Lyude <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 70dcfa58d3c2..e06defcdcf18 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
>  	int len;
>  	struct drm_dp_sideband_msg_tx *txmsg;
>  	int ret;
> +	int attempts = 5;
>

Does the spec say this should be retried 5 times or is this just an
experimental number?
We have such magical numbers for retries all over the DP code and that makes debugging
harder later, so atleast a comment of why its 5 would help.
  
> -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
>  	if (!txmsg)
>  		return;
>  
> @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
>  			}
>  			(*mgr->cbs->hotplug)(mgr);
>  		}
> +	} else if (attempts--) {

This should be --attempts if you want the total attempts to be 5

Manasi

> +		kfree(txmsg);
> +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> +					     false);
> +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> +					     true);
> +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);
> +		goto retry;
>  	} else {
>  		mstb->link_address_sent = false;
> -		DRM_DEBUG_KMS("link address failed %d\n", ret);
> +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);
>  	}
>  
>  	kfree(txmsg);
> -- 
> 2.11.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Dhinakaran Pandiyan Dec. 22, 2017, 12:48 a.m. UTC | #3
On Thu, 2017-12-21 at 08:53 +0200, Jani Nikula wrote:
> On Wed, 20 Dec 2017, Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> wrote:

> > Occasionally there are LINK_ADDRESS sideband messages timing out with the

> > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These

> > failures lead to the display not coming up on boot. Power cycling the port

> > corresponding to the MST monitor's branch device and resending the message

> > fixes the issue. I am not entirely sure if this is specific to my setup.

> > However, as the power state is toggled conditionally on LINK_ADDRESS

> > timeouts, this should not affect the working cases.

> 

> With stuff like this, I always wonder if catering for a failing setup

> blocks us from improving working setups, because once we add this, we

> can't regress it. For example, is there a valid scenario where we'd want

> to fail fast here, instead of retrying?


Link address failure would result in not probing a branch device that
already has been detected. I guess the fail fast case will be applicable
if the said device was not really present but the parent branch told us
otherwise.
 
> 

> Some nits below.

> 

> > Cc: Lyude <lyude@redhat.com>

> > Cc: Dave Airlie <airlied@redhat.com>

> > Cc: Jani Nikula <jani.nikula@intel.com>

> > Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>

> > ---

> >  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--

> >  1 file changed, 11 insertions(+), 2 deletions(-)

> >

> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c

> > index 70dcfa58d3c2..e06defcdcf18 100644

> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c

> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c

> > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,

> >  	int len;

> >  	struct drm_dp_sideband_msg_tx *txmsg;

> >  	int ret;

> > +	int attempts = 5;

> >  

> > -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);

> > +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);

> >  	if (!txmsg)

> >  		return;

> >  

> > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,

> >  			}

> >  			(*mgr->cbs->hotplug)(mgr);

> >  		}

> > +	} else if (attempts--) {

> 

> You'll end up doing (attempts + 1) attempts, including the first one.

Yeah, that is what I intended to do :) I renamed it from 'retry' to
'attempt' at the last moment, which made it a bit confusing I suppose.


I am stressing testing my setup more to see how well this recovery works
and update this patch.

 

> 

> > +		kfree(txmsg);

> 

> How about memset(txmsg, 0, sizoef(*txmsg)); here and move the goto label

> down to avoid repeated allocations?

Absolutely.

> 

> > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,

> > +					     false);

> > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,

> > +					     true);

> > +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);

> 

> Maybe do the debug message before you power down/up?

Ok.
> 

> BR,

> Jani.

> 

> > +		goto retry;

> >  	} else {

> >  		mstb->link_address_sent = false;

> > -		DRM_DEBUG_KMS("link address failed %d\n", ret);

> > +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);

> >  	}

> >  

> >  	kfree(txmsg);

>
Dhinakaran Pandiyan Dec. 22, 2017, 1:06 a.m. UTC | #4
On Thu, 2017-12-21 at 10:52 -0800, Manasi Navare wrote:
> On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote:

> > Occasionally there are LINK_ADDRESS sideband messages timing out with the

> > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These

> > failures lead to the display not coming up on boot. Power cycling the port

> > corresponding to the MST monitor's branch device and resending the message

> > fixes the issue. I am not entirely sure if this is specific to my setup.

> > However, as the power state is toggled conditionally on LINK_ADDRESS

> > timeouts, this should not affect the working cases.

> > 

> 

> > Cc: Lyude <lyude@redhat.com>

> > Cc: Dave Airlie <airlied@redhat.com>

> > Cc: Jani Nikula <jani.nikula@intel.com>

> > Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>

> > ---

> >  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--

> >  1 file changed, 11 insertions(+), 2 deletions(-)

> > 

> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c

> > index 70dcfa58d3c2..e06defcdcf18 100644

> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c

> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c

> > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,

> >  	int len;

> >  	struct drm_dp_sideband_msg_tx *txmsg;

> >  	int ret;

> > +	int attempts = 5;

> >

> 

> Does the spec say this should be retried 5 times or is this just an

> experimental number?


The spec. does not say how many times to retry, but it does say the
source is responsible for retrying. 

> We have such magical numbers for retries all over the DP code and that makes debugging

> harder later, so atleast a comment of why its 5 would help.


Takes about 22 seconds from boot to complete 5 retries on my SKL, I
think that's enough trying from the kernel side before pulling out the
DP cable makes sense :)

>   

> > -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);

> > +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);

> >  	if (!txmsg)

> >  		return;

> >  

> > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,

> >  			}

> >  			(*mgr->cbs->hotplug)(mgr);

> >  		}

> > +	} else if (attempts--) {

> 

> This should be --attempts if you want the total attempts to be 5

I don't.
> 

> Manasi

> 

> > +		kfree(txmsg);

> > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,

> > +					     false);

> > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,

> > +					     true);

> > +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);

> > +		goto retry;

> >  	} else {

> >  		mstb->link_address_sent = false;

> > -		DRM_DEBUG_KMS("link address failed %d\n", ret);

> > +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);

> >  	}

> >  

> >  	kfree(txmsg);

> > -- 

> > 2.11.0

> > 

> > _______________________________________________

> > dri-devel mailing list

> > dri-devel@lists.freedesktop.org

> > https://lists.freedesktop.org/mailman/listinfo/dri-devel

> _______________________________________________

> Intel-gfx mailing list

> Intel-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Navare, Manasi Dec. 22, 2017, 1:32 a.m. UTC | #5
On Thu, Dec 21, 2017 at 05:06:22PM -0800, Pandiyan, Dhinakaran wrote:
> On Thu, 2017-12-21 at 10:52 -0800, Manasi Navare wrote:
> > On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote:
> > > Occasionally there are LINK_ADDRESS sideband messages timing out with the
> > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These
> > > failures lead to the display not coming up on boot. Power cycling the port
> > > corresponding to the MST monitor's branch device and resending the message
> > > fixes the issue. I am not entirely sure if this is specific to my setup.
> > > However, as the power state is toggled conditionally on LINK_ADDRESS
> > > timeouts, this should not affect the working cases.
> > > 
> > 
> > > Cc: Lyude <lyude@redhat.com>
> > > Cc: Dave Airlie <airlied@redhat.com>
> > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
> > > ---
> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 70dcfa58d3c2..e06defcdcf18 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
> > >  	int len;
> > >  	struct drm_dp_sideband_msg_tx *txmsg;
> > >  	int ret;
> > > +	int attempts = 5;
> > >
> > 
> > Does the spec say this should be retried 5 times or is this just an
> > experimental number?
> 
> The spec. does not say how many times to retry, but it does say the
> source is responsible for retrying. 
> 
> > We have such magical numbers for retries all over the DP code and that makes debugging
> > harder later, so atleast a comment of why its 5 would help.
>
 
> Takes about 22 seconds from boot to complete 5 retries on my SKL, I
> think that's enough trying from the kernel side before pulling out the
> DP cable makes sense :)
>

It still appears to be a magical number so better to comment it properly.
Helps in the debug
 
> >   
> > > -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > > +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > >  	if (!txmsg)
> > >  		return;
> > >  
> > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
> > >  			}
> > >  			(*mgr->cbs->hotplug)(mgr);
> > >  		}
> > > +	} else if (attempts--) {
> > 
> > This should be --attempts if you want the total attempts to be 5
> I don't.

Yes the variable attempts is misleading in that case. Probably call it "tries"

Manasi

> > 
> > Manasi
> > 
> > > +		kfree(txmsg);
> > > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> > > +					     false);
> > > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> > > +					     true);
> > > +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);
> > > +		goto retry;
> > >  	} else {
> > >  		mstb->link_address_sent = false;
> > > -		DRM_DEBUG_KMS("link address failed %d\n", ret);
> > > +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);
> > >  	}
> > >  
> > >  	kfree(txmsg);
> > > -- 
> > > 2.11.0
> > > 
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Navare, Manasi Dec. 22, 2017, 1:37 a.m. UTC | #6
On Thu, Dec 21, 2017 at 05:32:43PM -0800, Manasi Navare wrote:
> On Thu, Dec 21, 2017 at 05:06:22PM -0800, Pandiyan, Dhinakaran wrote:
> > On Thu, 2017-12-21 at 10:52 -0800, Manasi Navare wrote:
> > > On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote:
> > > > Occasionally there are LINK_ADDRESS sideband messages timing out with the
> > > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These
> > > > failures lead to the display not coming up on boot. Power cycling the port
> > > > corresponding to the MST monitor's branch device and resending the message
> > > > fixes the issue. I am not entirely sure if this is specific to my setup.
> > > > However, as the power state is toggled conditionally on LINK_ADDRESS
> > > > timeouts, this should not affect the working cases.
> > > > 
> > > 
> > > > Cc: Lyude <lyude@redhat.com>
> > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--
> > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > index 70dcfa58d3c2..e06defcdcf18 100644
> > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
> > > >  	int len;
> > > >  	struct drm_dp_sideband_msg_tx *txmsg;
> > > >  	int ret;
> > > > +	int attempts = 5;
> > > >
> > > 
> > > Does the spec say this should be retried 5 times or is this just an
> > > experimental number?
> > 
> > The spec. does not say how many times to retry, but it does say the
> > source is responsible for retrying. 
> > 
> > > We have such magical numbers for retries all over the DP code and that makes debugging
> > > harder later, so atleast a comment of why its 5 would help.
> >
>  
> > Takes about 22 seconds from boot to complete 5 retries on my SKL, I
> > think that's enough trying from the kernel side before pulling out the
> > DP cable makes sense :)
> >
> 
> It still appears to be a magical number so better to comment it properly.
> Helps in the debug
>  
> > >   
> > > > -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > > > +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> > > >  	if (!txmsg)
> > > >  		return;
> > > >  
> > > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
> > > >  			}
> > > >  			(*mgr->cbs->hotplug)(mgr);
> > > >  		}
> > > > +	} else if (attempts--) {
> > > 
> > > This should be --attempts if you want the total attempts to be 5
> > I don't.
> 
> Yes the variable attempts is misleading in that case. Probably call it "tries"
>
I meant retries..

Manasi 
> Manasi
> 
> > > 
> > > Manasi
> > > 
> > > > +		kfree(txmsg);
> > > > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> > > > +					     false);
> > > > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> > > > +					     true);
> > > > +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);
> > > > +		goto retry;
> > > >  	} else {
> > > >  		mstb->link_address_sent = false;
> > > > -		DRM_DEBUG_KMS("link address failed %d\n", ret);
> > > > +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);
> > > >  	}
> > > >  
> > > >  	kfree(txmsg);
> > > > -- 
> > > > 2.11.0
> > > > 
> > > > _______________________________________________
> > > > dri-devel mailing list
> > > > dri-devel@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Dhinakaran Pandiyan Dec. 22, 2017, 6:24 a.m. UTC | #7
On Fri, 2017-12-22 at 00:48 +0000, Pandiyan, Dhinakaran wrote:
> On Thu, 2017-12-21 at 08:53 +0200, Jani Nikula wrote:

> > On Wed, 20 Dec 2017, Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> wrote:

> > > Occasionally there are LINK_ADDRESS sideband messages timing out with the

> > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These

> > > failures lead to the display not coming up on boot. Power cycling the port

> > > corresponding to the MST monitor's branch device and resending the message

> > > fixes the issue. I am not entirely sure if this is specific to my setup.

> > > However, as the power state is toggled conditionally on LINK_ADDRESS

> > > timeouts, this should not affect the working cases.

> > 

> > With stuff like this, I always wonder if catering for a failing setup

> > blocks us from improving working setups, because once we add this, we

> > can't regress it. For example, is there a valid scenario where we'd want

> > to fail fast here, instead of retrying?

> 

> Link address failure would result in not probing a branch device that

> already has been detected. I guess the fail fast case will be applicable

> if the said device was not really present but the parent branch told us

> otherwise.

>  


The other option is we could check the device ID of the dock and
implement this as a quirk.

Btw I found some relevant information in the spec.

"The Message Transaction originator must perform the reply timeout
check. If an error to a request causes the system to be in an invalid
state (e.g., all nodes failed to delete a virtual channel, it is the
responsibility of the Message Transaction originator to return the
system to a valid state). The Message Transaction originator is
responsible for any retries."

and

"SET_POWER_STATE
Must be programmed to 001 (binary) upon power-on reset or an upstream
device disconnect.
001 = Set local Sink device and all downstream Sink devices to D0
(normal operation mode)."

I wonder if the dock is missing this step because the monitor seems to
work well with a discrete MST hub.


> > 

> > Some nits below.

> > 

> > > Cc: Lyude <lyude@redhat.com>

> > > Cc: Dave Airlie <airlied@redhat.com>

> > > Cc: Jani Nikula <jani.nikula@intel.com>

> > > Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>

> > > ---

> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--

> > >  1 file changed, 11 insertions(+), 2 deletions(-)

> > >

> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c

> > > index 70dcfa58d3c2..e06defcdcf18 100644

> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c

> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c

> > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,

> > >  	int len;

> > >  	struct drm_dp_sideband_msg_tx *txmsg;

> > >  	int ret;

> > > +	int attempts = 5;

> > >  

> > > -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);

> > > +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);

> > >  	if (!txmsg)

> > >  		return;

> > >  

> > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,

> > >  			}

> > >  			(*mgr->cbs->hotplug)(mgr);

> > >  		}

> > > +	} else if (attempts--) {

> > 

> > You'll end up doing (attempts + 1) attempts, including the first one.

> Yeah, that is what I intended to do :) I renamed it from 'retry' to

> 'attempt' at the last moment, which made it a bit confusing I suppose.

> 

> 

> I am stressing testing my setup more to see how well this recovery works

> and update this patch.

> 


Here's what I got with 266 rounds of reboots. 
Correct link address at
 1st try		180
 Retry 1		45
 Retry 2		32
 Retry 3		0
 Retry 4		0
 Retry 5		2
Giving up		3
Incorrect link address	4

The retries help about 30% of the cases. 

>  

> 

> > 

> > > +		kfree(txmsg);

> > 

> > How about memset(txmsg, 0, sizoef(*txmsg)); here and move the goto label

> > down to avoid repeated allocations?

> Absolutely.

> 

> > 

> > > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,

> > > +					     false);

> > > +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,

> > > +					     true);

> > > +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);

> > 

> > Maybe do the debug message before you power down/up?

> Ok.

> > 

> > BR,

> > Jani.

> > 

> > > +		goto retry;

> > >  	} else {

> > >  		mstb->link_address_sent = false;

> > > -		DRM_DEBUG_KMS("link address failed %d\n", ret);

> > > +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);

> > >  	}

> > >  

> > >  	kfree(txmsg);

> > 

> _______________________________________________

> Intel-gfx mailing list

> Intel-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Lyude Paul Jan. 4, 2018, 11:21 p.m. UTC | #8
Sorry for the late reply, I've been having very similar issues on my own MST hub
and I wanted to make sure that they were the same issue, although it seems like
they aren't.

So; I've been doing a lot of MST debugging this week and last and something I've
discovered needs to be taken into account sometimes with MST hubs is the actual
state that they're in at the point that the DRM driver detects them. I've
managed to on multiple occasions, get my hub into a weird state by:

- Plugging in MST displays into the hub
- Turning on the machine
- Unplugging MST displays from the hub (while still in the BIOS)
- Booting into linux
- Plugging MST displays into the hub
- Everything times out, the world explodes, the economy collapses, etc.

I think maybe, especially since this should be perfectly valid behavior and not
break well or poor behaving hubs, we should do a power cycle with the display
like this when the DP port initially detects an MST hub and before we start
doing any serious communication with it. Could you see if this fixes your issue
instead of the patch you've got here?

As well, mind attaching your full dmesg with drm.debug=0x6?

On Wed, 2017-12-20 at 22:36 -0800, Dhinakaran Pandiyan wrote:
> Occasionally there are LINK_ADDRESS sideband messages timing out with the
> Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These
> failures lead to the display not coming up on boot. Power cycling the port
> corresponding to the MST monitor's branch device and resending the message
> fixes the issue. I am not entirely sure if this is specific to my setup.
> However, as the power state is toggled conditionally on LINK_ADDRESS
> timeouts, this should not affect the working cases.
> 
> Cc: Lyude <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 70dcfa58d3c2..e06defcdcf18 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct
> drm_dp_mst_topology_mgr *mgr,
>  	int len;
>  	struct drm_dp_sideband_msg_tx *txmsg;
>  	int ret;
> +	int attempts = 5;
>  
> -	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> +retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
>  	if (!txmsg)
>  		return;
>  
> @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct
> drm_dp_mst_topology_mgr *mgr,
>  			}
>  			(*mgr->cbs->hotplug)(mgr);
>  		}
> +	} else if (attempts--) {
> +		kfree(txmsg);
> +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> +					     false);
> +		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
> +					     true);
> +		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);
> +		goto retry;
>  	} else {
>  		mstb->link_address_sent = false;
> -		DRM_DEBUG_KMS("link address failed %d\n", ret);
> +		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);
>  	}
>  
>  	kfree(txmsg);
Dhinakaran Pandiyan Jan. 4, 2018, 11:46 p.m. UTC | #9
On Thu, 2018-01-04 at 18:21 -0500, Lyude Paul wrote:
> Sorry for the late reply, I've been having very similar issues on my own MST hub

> and I wanted to make sure that they were the same issue, although it seems like

> they aren't.

> 

> So; I've been doing a lot of MST debugging this week and last and something I've

> discovered needs to be taken into account sometimes with MST hubs is the actual

> state that they're in at the point that the DRM driver detects them. I've

> managed to on multiple occasions, get my hub into a weird state by:

> 

> - Plugging in MST displays into the hub

> - Turning on the machine

> - Unplugging MST displays from the hub (while still in the BIOS)

> - Booting into linux

> - Plugging MST displays into the hub

> - Everything times out, the world explodes, the economy collapses, etc.

> 

> I think maybe, especially since this should be perfectly valid behavior and not

> break well or poor behaving hubs, we should do a power cycle with the display

> like this when the DP port initially detects an MST hub and before we start

> doing any serious communication with it. Could you see if this fixes your issue

> instead of the patch you've got here?

> 


Well, my reasoning to power cycle after a failure was to not affect hubs
that work. Besides that I also saw an odd cycle of timeouts and late
replies when I did this.

1. Detect hub
2. Toggle power state and send LINK_ADDRESS req.
3. LINK_ADDRESS req times out.
4. Toggle power state and send LINK_ADDRESS req.
5. Get late response for the first (and expired) LINK_ADDRESS req.
6. Go to step 3

It seems like toggling the power states flushes out some message buffers
in the MST hub.

But, in the retry approach, power cycling resulted in getting the
response for the current LINK_ADDRESS request along with the previous
one that timed out.

I could not come up with an explanation for all the behavior. So, I
decided to get back to this later.


> As well, mind attaching your full dmesg with drm.debug=0x6?


I don't have the old logs anymore, will try to get something for you.

-DK
Dhinakaran Pandiyan Jan. 5, 2018, 12:44 a.m. UTC | #10
On Thu, 2018-01-04 at 23:46 +0000, Pandiyan, Dhinakaran wrote:
> On Thu, 2018-01-04 at 18:21 -0500, Lyude Paul wrote:

> > Sorry for the late reply, I've been having very similar issues on my own MST hub

> > and I wanted to make sure that they were the same issue, although it seems like

> > they aren't.

> > 

> > So; I've been doing a lot of MST debugging this week and last and something I've

> > discovered needs to be taken into account sometimes with MST hubs is the actual

> > state that they're in at the point that the DRM driver detects them. I've

> > managed to on multiple occasions, get my hub into a weird state by:

> > 

> > - Plugging in MST displays into the hub

> > - Turning on the machine

> > - Unplugging MST displays from the hub (while still in the BIOS)

> > - Booting into linux

> > - Plugging MST displays into the hub

> > - Everything times out, the world explodes, the economy collapses, etc.

> > 

> > I think maybe, especially since this should be perfectly valid behavior and not

> > break well or poor behaving hubs, we should do a power cycle with the display

> > like this when the DP port initially detects an MST hub and before we start

> > doing any serious communication with it. Could you see if this fixes your issue

> > instead of the patch you've got here?

> > 

> 

> Well, my reasoning to power cycle after a failure was to not affect hubs

> that work. Besides that I also saw an odd cycle of timeouts and late

> replies when I did this.

> 

> 1. Detect hub

> 2. Toggle power state and send LINK_ADDRESS req.

> 3. LINK_ADDRESS req times out.

> 4. Toggle power state and send LINK_ADDRESS req.

> 5. Get late response for the first (and expired) LINK_ADDRESS req.

> 6. Go to step 3

> 

> It seems like toggling the power states flushes out some message buffers

> in the MST hub.

> 

> But, in the retry approach, power cycling resulted in getting the

> response for the current LINK_ADDRESS request along with the previous

> one that timed out.

> 

> I could not come up with an explanation for all the behavior. So, I

> decided to get back to this later.

> 

> 

> > As well, mind attaching your full dmesg with drm.debug=0x6?

> 

> I don't have the old logs anymore, will try to get something for you.

Here you go

Setup: Laptop -> ThinkPad dock -> Dell MST monitor -> Dell monitor
Hotplugged display to dock[passed] - https://pastebin.com/CemRGsCb
Connected boot[failed] - https://pastebin.com/jjXP6HzB

> 

> -DK

> _______________________________________________

> Intel-gfx mailing list

> Intel-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 70dcfa58d3c2..e06defcdcf18 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1596,8 +1596,9 @@  static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
 	int len;
 	struct drm_dp_sideband_msg_tx *txmsg;
 	int ret;
+	int attempts = 5;
 
-	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
+retry:	txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
 	if (!txmsg)
 		return;
 
@@ -1635,9 +1636,17 @@  static void drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
 			}
 			(*mgr->cbs->hotplug)(mgr);
 		}
+	} else if (attempts--) {
+		kfree(txmsg);
+		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
+					     false);
+		drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent,
+					     true);
+		DRM_DEBUG_KMS("link address failed %d, retrying\n", ret);
+		goto retry;
 	} else {
 		mstb->link_address_sent = false;
-		DRM_DEBUG_KMS("link address failed %d\n", ret);
+		DRM_DEBUG_KMS("link address failed %d, giving up\n", ret);
 	}
 
 	kfree(txmsg);