diff mbox

[4/4] drm/i915: Retry for live status

Message ID 1440504093-19370-5-git-send-email-sonika.jindal@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

sonika.jindal@intel.com Aug. 25, 2015, 12:01 p.m. UTC
Some monitors take time in setting the live status.
So retry for few times if this is a connect HPD

Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
---
 drivers/gpu/drm/i915/intel_hdmi.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Daniel Vetter Aug. 26, 2015, 9:39 a.m. UTC | #1
On Tue, Aug 25, 2015 at 05:31:33PM +0530, Sonika Jindal wrote:
> Some monitors take time in setting the live status.
> So retry for few times if this is a connect HPD
> 
> Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>

Why was this bugfix not part of the original series? Now I have to retest
on my ivb to figure out whether maybe this one here is the issue ...

Also how exactly does this work? I thought the hpd bits control whether we
get an interrupt, not the other way round? Why exactly does this help?
Definitely needs a lot more explanation.

Also this seems to break bisect, since before the preceeding patch to
check hpd status we just retried edid reading for a while.

This kind of hacking doesn't really convince me that hpd status is
working, just that our own testing isn't good enough to catch all
real-world issues.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_hdmi.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
> index 59518b4..239d70d 100644
> --- a/drivers/gpu/drm/i915/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/intel_hdmi.c
> @@ -1415,6 +1415,7 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
>  	struct intel_connector *intel_connector =
>  				intel_hdmi->attached_connector;
>  	bool live_status = false;
> +	unsigned int retry = 3;
>  
>  	/*
>  	 * Sometimes DDI ports are enumerated as DP as well as HDMI and
> @@ -1425,6 +1426,20 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
>  		return;
>  
>  	live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
> +	if (!intel_connector->detect_edid && live_status == false) {
> +		/*
> +		 * Hotplug had occurred and old status was disconnected,
> +		 * so it might be possible that live status is not set,
> +		 * so retry for few times
> +		 */
> +		do {
> +			mdelay(10);
> +			live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
> +			if (live_status)
> +				break;
> +		} while (retry--);
> +	}
> +
>  	/*
>  	 * We are here, means there is a hotplug or a force
>  	 * detection. Clear the cached EDID and probe the
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
sonika.jindal@intel.com Aug. 26, 2015, 10:05 a.m. UTC | #2
HPD bits control the interrupt but the live status (with some monitors) takes time to get set.
We had experienced this with VLV and CHV with few monitors.
So Android code always has this retry for live status.

Yes, this was not added in the previous series because we planned to add the next set of optimization a little while later.
But this seems to be an important one.

It will be great if you can try it with your ivb. But for that you would need to first change the gen check and add a call to check live status for ivb.

Regards,
Sonika
-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Wednesday, August 26, 2015 3:10 PM
To: Jindal, Sonika
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 4/4] drm/i915: Retry for live status

On Tue, Aug 25, 2015 at 05:31:33PM +0530, Sonika Jindal wrote:
> Some monitors take time in setting the live status.
> So retry for few times if this is a connect HPD
> 
> Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>

Why was this bugfix not part of the original series? Now I have to retest on my ivb to figure out whether maybe this one here is the issue ...

Also how exactly does this work? I thought the hpd bits control whether we get an interrupt, not the other way round? Why exactly does this help?
Definitely needs a lot more explanation.

Also this seems to break bisect, since before the preceeding patch to check hpd status we just retried edid reading for a while.

This kind of hacking doesn't really convince me that hpd status is working, just that our own testing isn't good enough to catch all real-world issues.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_hdmi.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hdmi.c 
> b/drivers/gpu/drm/i915/intel_hdmi.c
> index 59518b4..239d70d 100644
> --- a/drivers/gpu/drm/i915/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/intel_hdmi.c
> @@ -1415,6 +1415,7 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
>  	struct intel_connector *intel_connector =
>  				intel_hdmi->attached_connector;
>  	bool live_status = false;
> +	unsigned int retry = 3;
>  
>  	/*
>  	 * Sometimes DDI ports are enumerated as DP as well as HDMI and @@ 
> -1425,6 +1426,20 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
>  		return;
>  
>  	live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
> +	if (!intel_connector->detect_edid && live_status == false) {
> +		/*
> +		 * Hotplug had occurred and old status was disconnected,
> +		 * so it might be possible that live status is not set,
> +		 * so retry for few times
> +		 */
> +		do {
> +			mdelay(10);
> +			live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
> +			if (live_status)
> +				break;
> +		} while (retry--);
> +	}
> +
>  	/*
>  	 * We are here, means there is a hotplug or a force
>  	 * detection. Clear the cached EDID and probe the
> --
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Vetter Aug. 26, 2015, 3:17 p.m. UTC | #3
On Wed, Aug 26, 2015 at 10:05:00AM +0000, Jindal, Sonika wrote:
> HPD bits control the interrupt but the live status (with some monitors) takes time to get set.
> We had experienced this with VLV and CHV with few monitors.
> So Android code always has this retry for live status.
> 
> Yes, this was not added in the previous series because we planned to add the next set of optimization a little while later.
> But this seems to be an important one.
> 
> It will be great if you can try it with your ivb. But for that you would need to first change the gen check and add a call to check live status for ivb.

Done (well I just quickly hacked up the same idea on top of your old
patches). Lessons to be learned from this:
- Make sure that you really include _all_ the bugfixes. This pach here
  isn't just tuning, it's crucial to make it work. And this isn't the
  first time vpg teams upstream something and later on we notice that
  important bugfixes have been forgotten.

  Because this wasn't done both you & me wasted a lot of time arguing
  about these patches and trying to test them.

- Please squash this patch in with patch 3 since otherwise we have a
  regression. Also please try to dig out why exactly this works like this
  since the hpd irq happening _before_ hpd status settles sounds to me
  like we have a little time machine in our silicon which can predict the
  future ...

- Please respin the patch series with the IS_VLV || gen >= 8 checks drop,
  I'm fairly confident that this bugfix here is the bit we've been looking
  for since years. At least it would be good to retest on all platforms
  for maximal test coverage.

Cheers, Daniel
> 
> Regards,
> Sonika
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
> Sent: Wednesday, August 26, 2015 3:10 PM
> To: Jindal, Sonika
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 4/4] drm/i915: Retry for live status
> 
> On Tue, Aug 25, 2015 at 05:31:33PM +0530, Sonika Jindal wrote:
> > Some monitors take time in setting the live status.
> > So retry for few times if this is a connect HPD
> > 
> > Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
> 
> Why was this bugfix not part of the original series? Now I have to retest on my ivb to figure out whether maybe this one here is the issue ...
> 
> Also how exactly does this work? I thought the hpd bits control whether we get an interrupt, not the other way round? Why exactly does this help?
> Definitely needs a lot more explanation.
> 
> Also this seems to break bisect, since before the preceeding patch to check hpd status we just retried edid reading for a while.
> 
> This kind of hacking doesn't really convince me that hpd status is working, just that our own testing isn't good enough to catch all real-world issues.
> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/intel_hdmi.c |   15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_hdmi.c 
> > b/drivers/gpu/drm/i915/intel_hdmi.c
> > index 59518b4..239d70d 100644
> > --- a/drivers/gpu/drm/i915/intel_hdmi.c
> > +++ b/drivers/gpu/drm/i915/intel_hdmi.c
> > @@ -1415,6 +1415,7 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
> >  	struct intel_connector *intel_connector =
> >  				intel_hdmi->attached_connector;
> >  	bool live_status = false;
> > +	unsigned int retry = 3;
> >  
> >  	/*
> >  	 * Sometimes DDI ports are enumerated as DP as well as HDMI and @@ 
> > -1425,6 +1426,20 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
> >  		return;
> >  
> >  	live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
> > +	if (!intel_connector->detect_edid && live_status == false) {
> > +		/*
> > +		 * Hotplug had occurred and old status was disconnected,
> > +		 * so it might be possible that live status is not set,
> > +		 * so retry for few times
> > +		 */
> > +		do {
> > +			mdelay(10);
> > +			live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
> > +			if (live_status)
> > +				break;
> > +		} while (retry--);
> > +	}
> > +
> >  	/*
> >  	 * We are here, means there is a hotplug or a force
> >  	 * detection. Clear the cached EDID and probe the
> > --
> > 1.7.10.4
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Sharma, Shashank Aug. 27, 2015, 9:16 a.m. UTC | #4
Regards
Shashank

On 8/26/2015 8:47 PM, Daniel Vetter wrote:
> On Wed, Aug 26, 2015 at 10:05:00AM +0000, Jindal, Sonika wrote:
>> HPD bits control the interrupt but the live status (with some monitors) takes time to get set.
>> We had experienced this with VLV and CHV with few monitors.
>> So Android code always has this retry for live status.
>>
>> Yes, this was not added in the previous series because we planned to add the next set of optimization a little while later.
>> But this seems to be an important one.
>>
>> It will be great if you can try it with your ivb. But for that you would need to first change the gen check and add a call to check live status for ivb.
>
> Done (well I just quickly hacked up the same idea on top of your old
> patches). Lessons to be learned from this:
> - Make sure that you really include _all_ the bugfixes. This pach here
>    isn't just tuning, it's crucial to make it work. And this isn't the
>    first time vpg teams upstream something and later on we notice that
>    important bugfixes have been forgotten.
>
>    Because this wasn't done both you & me wasted a lot of time arguing
>    about these patches and trying to test them.
>
Agree. We thought once the basic optimization goes in, we will add this 
as fine tuning patch. We were afraid of you guys doubting this approach 
at first itself. It looks like a little hack, but the HW itself is 
screwed up like this, to deal with.
> - Please squash this patch in with patch 3 since otherwise we have a
>    regression. Also please try to dig out why exactly this works like this
>    since the hpd irq happening _before_ hpd status settles sounds to me
>    like we have a little time machine in our silicon which can predict the
>    future ...
>
Actually this depends on the monitor also. Few monitors are slow to 
assert the HPD line, or sometimes they don't provide the right voltage 
on that, causing live status to fluctuate for a while. While VLV/CHV 
beta testing we have done this experiment with a big range of monitors, 
and concluded that 30ms(retry of 10ms * 3) is the optimized time where 
most of the monitors respond well.

We saw that we cant delay further, because HDCP compliance expects us to 
respond to HPD (out) with in 100ms. So after careful testing with many 
monitors, we have concluded this range.
> - Please respin the patch series with the IS_VLV || gen >= 8 checks drop,
>    I'm fairly confident that this bugfix here is the bit we've been looking
>    for since years. At least it would be good to retest on all platforms
>    for maximal test coverage.
>
Sure. Will do that.
> Cheers, Daniel
>>
>> Regards,
>> Sonika
>> -----Original Message-----
>> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
>> Sent: Wednesday, August 26, 2015 3:10 PM
>> To: Jindal, Sonika
>> Cc: intel-gfx@lists.freedesktop.org
>> Subject: Re: [Intel-gfx] [PATCH 4/4] drm/i915: Retry for live status
>>
>> On Tue, Aug 25, 2015 at 05:31:33PM +0530, Sonika Jindal wrote:
>>> Some monitors take time in setting the live status.
>>> So retry for few times if this is a connect HPD
>>>
>>> Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
>>
>> Why was this bugfix not part of the original series? Now I have to retest on my ivb to figure out whether maybe this one here is the issue ...
>>
>> Also how exactly does this work? I thought the hpd bits control whether we get an interrupt, not the other way round? Why exactly does this help?
>> Definitely needs a lot more explanation.
>>
>> Also this seems to break bisect, since before the preceeding patch to check hpd status we just retried edid reading for a while.
>>
>> This kind of hacking doesn't really convince me that hpd status is working, just that our own testing isn't good enough to catch all real-world issues.
>> -Daniel
>>
>>> ---
>>>   drivers/gpu/drm/i915/intel_hdmi.c |   15 +++++++++++++++
>>>   1 file changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_hdmi.c
>>> b/drivers/gpu/drm/i915/intel_hdmi.c
>>> index 59518b4..239d70d 100644
>>> --- a/drivers/gpu/drm/i915/intel_hdmi.c
>>> +++ b/drivers/gpu/drm/i915/intel_hdmi.c
>>> @@ -1415,6 +1415,7 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
>>>   	struct intel_connector *intel_connector =
>>>   				intel_hdmi->attached_connector;
>>>   	bool live_status = false;
>>> +	unsigned int retry = 3;
>>>
>>>   	/*
>>>   	 * Sometimes DDI ports are enumerated as DP as well as HDMI and @@
>>> -1425,6 +1426,20 @@ void intel_hdmi_probe(struct intel_encoder *intel_encoder)
>>>   		return;
>>>
>>>   	live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
>>> +	if (!intel_connector->detect_edid && live_status == false) {
>>> +		/*
>>> +		 * Hotplug had occurred and old status was disconnected,
>>> +		 * so it might be possible that live status is not set,
>>> +		 * so retry for few times
>>> +		 */
>>> +		do {
>>> +			mdelay(10);
>>> +			live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
>>> +			if (live_status)
>>> +				break;
>>> +		} while (retry--);
>>> +	}
>>> +
>>>   	/*
>>>   	 * We are here, means there is a hotplug or a force
>>>   	 * detection. Clear the cached EDID and probe the
>>> --
>>> 1.7.10.4
>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>
Shuang He Aug. 29, 2015, 11:10 a.m. UTC | #5
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 7251
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
ILK                 -2              302/302              300/302
SNB                                  315/315              315/315
IVB                                  336/336              336/336
BYT                 -7              283/283              276/283
HSW                                  378/378              378/378
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*ILK  igt@kms_flip@flip-vs-dpms-interruptible      PASS(1)      DMESG_WARN(1)
*ILK  igt@kms_flip@wf_vblank-vs-modeset-interruptible      PASS(1)      DMESG_WARN(1)
*BYT  igt@drm_read@empty-block      PASS(1)      NSPT(1)
*BYT  igt@drm_read@empty-nonblock      PASS(1)      NSPT(1)
*BYT  igt@drm_read@invalid-buffer      PASS(1)      NSPT(1)
*BYT  igt@drm_read@short-buffer-block      PASS(1)      NSPT(1)
*BYT  igt@drm_read@short-buffer-nonblock      PASS(1)      NSPT(1)
*BYT  igt@gem_partial_pwrite_pread@reads-uncached      PASS(1)      FAIL(1)
*BYT  igt@gem_tiled_partial_pwrite_pread@reads      PASS(1)      FAIL(1)
Note: You need to pay more attention to line start with '*'
Daniel Vetter Sept. 2, 2015, 8:46 a.m. UTC | #6
On Thu, Aug 27, 2015 at 02:46:26PM +0530, Sharma, Shashank wrote:
> Regards
> Shashank
> 
> On 8/26/2015 8:47 PM, Daniel Vetter wrote:
> >On Wed, Aug 26, 2015 at 10:05:00AM +0000, Jindal, Sonika wrote:
> >>HPD bits control the interrupt but the live status (with some monitors) takes time to get set.
> >>We had experienced this with VLV and CHV with few monitors.
> >>So Android code always has this retry for live status.
> >>
> >>Yes, this was not added in the previous series because we planned to add the next set of optimization a little while later.
> >>But this seems to be an important one.
> >>
> >>It will be great if you can try it with your ivb. But for that you would need to first change the gen check and add a call to check live status for ivb.
> >
> >Done (well I just quickly hacked up the same idea on top of your old
> >patches). Lessons to be learned from this:
> >- Make sure that you really include _all_ the bugfixes. This pach here
> >   isn't just tuning, it's crucial to make it work. And this isn't the
> >   first time vpg teams upstream something and later on we notice that
> >   important bugfixes have been forgotten.
> >
> >   Because this wasn't done both you & me wasted a lot of time arguing
> >   about these patches and trying to test them.
> >
> Agree. We thought once the basic optimization goes in, we will add this as
> fine tuning patch. We were afraid of you guys doubting this approach at
> first itself. It looks like a little hack, but the HW itself is screwed up
> like this, to deal with.

Reality always wins, even if it looks really broken at first. The
important bit in those cases is to justify the change with a really good
commit message to make it clear what exactly is going on and why this
change is the correct fix/work-around.

> >- Please squash this patch in with patch 3 since otherwise we have a
> >   regression. Also please try to dig out why exactly this works like this
> >   since the hpd irq happening _before_ hpd status settles sounds to me
> >   like we have a little time machine in our silicon which can predict the
> >   future ...
> >
> Actually this depends on the monitor also. Few monitors are slow to assert
> the HPD line, or sometimes they don't provide the right voltage on that,
> causing live status to fluctuate for a while. While VLV/CHV beta testing we
> have done this experiment with a big range of monitors, and concluded that
> 30ms(retry of 10ms * 3) is the optimized time where most of the monitors
> respond well.
> 
> We saw that we cant delay further, because HDCP compliance expects us to
> respond to HPD (out) with in 100ms. So after careful testing with many
> monitors, we have concluded this range.

So what's happening on the wire is
                               
                              /------------
                ^      /\    /
               / \/\  /  \/\/              <- hpd threshold
----------/\--/     \/

                ^ first early hpd causing irq
                               ^ hpd status stable

If that's what's going on then yeah your patch makes a lot of sense, since
it perfectly explains why we've seen this on all platforms. Have you
confirmed this with oscilloscopes and all that? Please add all that
information (including pretty ascii graphs pls) to the commit message to
make sure we really have this documented correctly.
			  
> >- Please respin the patch series with the IS_VLV || gen >= 8 checks drop,
> >   I'm fairly confident that this bugfix here is the bit we've been looking
> >   for since years. At least it would be good to retest on all platforms
> >   for maximal test coverage.
> >
> Sure. Will do that.

Awesome, really looking forward to finally settling this after years!

Cheers, Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index 59518b4..239d70d 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -1415,6 +1415,7 @@  void intel_hdmi_probe(struct intel_encoder *intel_encoder)
 	struct intel_connector *intel_connector =
 				intel_hdmi->attached_connector;
 	bool live_status = false;
+	unsigned int retry = 3;
 
 	/*
 	 * Sometimes DDI ports are enumerated as DP as well as HDMI and
@@ -1425,6 +1426,20 @@  void intel_hdmi_probe(struct intel_encoder *intel_encoder)
 		return;
 
 	live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
+	if (!intel_connector->detect_edid && live_status == false) {
+		/*
+		 * Hotplug had occurred and old status was disconnected,
+		 * so it might be possible that live status is not set,
+		 * so retry for few times
+		 */
+		do {
+			mdelay(10);
+			live_status = intel_hdmi_live_status(hdmi_to_dig_port(intel_hdmi));
+			if (live_status)
+				break;
+		} while (retry--);
+	}
+
 	/*
 	 * We are here, means there is a hotplug or a force
 	 * detection. Clear the cached EDID and probe the