drm/i915/hotplug: Fixing storm handling for digital ports

Message ID 1435575640-32105-1-git-send-email-sivakumar.thulasimani@intel.com (mailing list archive)
State New, archived

Commit Message

Sivakumar Thulasimani June 29, 2015, 11 a.m. UTC
From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>

An HPD storm is detected in intel_hpd_irq_handler and HPD is disabled for the
respective port immediately, but polling is enabled only in
i915_hotplug_work_func and not in i915_digport_work_func. As a result, the
disabled HPD is never enabled again. Fix this by calling the storm disable
function, which handles the rest of the sequence (enabling polling now and
re-enabling HPD later).
---
 drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
 1 file changed, 4 insertions(+)

Comments

Daniel Vetter June 29, 2015, 4:37 p.m. UTC | #1
On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
> From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
> 
> HPD storm is detected in intel_hpd_irq_handler and disabled for respective
> port immediately but polling is enabled only in i915_hotplug_work_func and
> not in i915_digport_work_func. This will result in disabled hpd never enabled
> back again. This is fixed by calling the appropriate storm disable function
> that will handle the rest of the sequence (both polling enable and reenabling
> of HPD later).
> ---
>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> index 3c53aac..8e18587 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
>  	dev_priv->hotplug.long_port_mask = 0;
>  	short_port_mask = dev_priv->hotplug.short_port_mask;
>  	dev_priv->hotplug.short_port_mask = 0;
> +
> +	/* Disable hotplug on connectors that hit an irq storm. */
> +	intel_hpd_irq_storm_disable(dev_priv);

digport_work_func schedules the hotplug handler for everything not
handled, which should result in this getting called. It really shouldn't
matter when exactly it gets called.

Can you please provide more data and details for your analysis? Like bug
reports, backtraces and dmesg traces showing that the handler is stuck and
similar things.

Also your patch is missing the s-o-b line.
-Daniel

> +
>  	spin_unlock_irq(&dev_priv->irq_lock);
>  
>  	for (i = 0; i < I915_MAX_PORTS; i++) {
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Shuang He June 29, 2015, 10:32 p.m. UTC | #2
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6660
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
ILK                                  302/302              302/302
SNB                                  312/316              312/316
IVB                                  343/343              343/343
BYT                 -1              287/287              286/287
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*BYT  igt@gem_partial_pwrite_pread@reads-display      PASS(1)      FAIL(1)
Note: Pay particular attention to lines starting with '*'.
Sivakumar Thulasimani June 30, 2015, 3:15 a.m. UTC | #3
On 6/29/2015 10:07 PM, Daniel Vetter wrote:
> On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
>> From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
>>
>> HPD storm is detected in intel_hpd_irq_handler and disabled for respective
>> port immediately but polling is enabled only in i915_hotplug_work_func and
>> not in i915_digport_work_func. This will result in disabled hpd never enabled
>> back again. This is fixed by calling the appropriate storm disable function
>> that will handle the rest of the sequence (both polling enable and reenabling
>> of HPD later).
>> ---
>>   drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
>> index 3c53aac..8e18587 100644
>> --- a/drivers/gpu/drm/i915/intel_hotplug.c
>> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
>> @@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
>>   	dev_priv->hotplug.long_port_mask = 0;
>>   	short_port_mask = dev_priv->hotplug.short_port_mask;
>>   	dev_priv->hotplug.short_port_mask = 0;
>> +
>> +	/* Disable hotplug on connectors that hit an irq storm. */
>> +	intel_hpd_irq_storm_disable(dev_priv);
> digport_work_func schedules the hotplug handler for everything not
> handled, which should result in this getting called. It really shouldn't
> matter when exactly it gets called.
>
> Can you please provide more data and details for your analysis? Like bug
> reports, backtraces and dmesg traces showing that the handler is stuck and
> similar things.
>
> Also your patch is missing the s-o-b line.
> -Daniel
>
There is no bug filed for this; it was found through code analysis
(provided below). I'll try to get more information as soon as I have
access to a system.

Short answer:
The issue will be seen during an HPD storm where the last HPD is handled
inside intel_dp_hpd_pulse. In that case i915_hotplug_work_func is not
queued, so the storm_disable call is missed.

Long answer:
To give a bit more background, let's assume that intel_hpd_irq_handler is
called with a long pulse for a DP panel on PORT_B of a HSW/BDW system
during an HPD storm. The following sequence takes place:
*) is_dig_port is set, which results in queue_dig being set as well.
*) intel_hpd_irq_storm_detect detects that this is the 6th HPD call within
   HPD_STORM_DETECT_PERIOD and sets the HPD status of PORT_B to
   HPD_MARK_DISABLED.
*) HPD for PORT_B is therefore disabled immediately (masked in the case of
   LPT).
*) i915_digport_work_func is queued at the end of this function, since
   queue_dig is set.
*) In i915_digport_work_func, the hpd_pulse function pointer is executed,
   since it is defined for DP.
*) intel_dp_hpd_pulse has long_hpd set, and since the panel is still
   plugged in, ISR will be high and the check will return true.
*) intel_dp_get_dpcd succeeds, since DP is connected.
*) Finally, IRQ_HANDLED is returned.
*) Once the call exits intel_hpd_irq_handler, HPD on port B is never
   enabled again (unmasked in the case of LPT), and no more hotplug
   notifications arrive.
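
The sequence above can be sketched as a small standalone model. This is not the driver code: the struct, the helper names, and the threshold and period values are assumptions chosen for illustration, and the real handlers take dev_priv and run under irq_lock. It only shows why a pulse that is fully handled in the digital port work leaves the pin marked disabled with no polling, and how calling the storm disable step from that work changes the outcome.

```c
#include <stdbool.h>

/* Assumed values for illustration only; not taken from the driver. */
#define STORM_PERIOD_MS 1000UL
#define STORM_THRESHOLD 5

enum hpd_state { HPD_ENABLED, HPD_MARK_DISABLED, HPD_DISABLED };

/* Hypothetical per-pin bookkeeping. */
struct hpd_pin {
	enum hpd_state state;
	unsigned long window_start_ms;
	int count;
	bool polling;
};

/* Counts interrupts inside one window; the 6th one marks the pin disabled. */
static bool storm_detect(struct hpd_pin *pin, unsigned long now_ms)
{
	if (pin->count == 0 || now_ms - pin->window_start_ms > STORM_PERIOD_MS) {
		pin->window_start_ms = now_ms;
		pin->count = 1;
		return false;
	}
	if (++pin->count > STORM_THRESHOLD) {
		pin->state = HPD_MARK_DISABLED;
		return true;
	}
	return false;
}

/* Models the storm disable step: marked pins fall back to polling. */
static void storm_disable(struct hpd_pin *pin)
{
	if (pin->state == HPD_MARK_DISABLED) {
		pin->state = HPD_DISABLED;
		pin->polling = true;
	}
}

/* Models the hotplug work, which is where storm disable runs today. */
static void hotplug_work(struct hpd_pin *pin)
{
	storm_disable(pin);
}

/*
 * Models the digital port work. Returns true when the pulse was not
 * handled and the hotplug work must be queued. With patched == true it
 * also runs the storm disable step itself, as the patch proposes.
 */
static bool digport_work(struct hpd_pin *pin, bool pulse_handled, bool patched)
{
	if (patched)
		storm_disable(pin);
	return !pulse_handled;
}

/* Drives six interrupts 10 ms apart, then reports whether polling is on. */
static bool polling_after_storm(bool pulse_handled, bool patched)
{
	struct hpd_pin pin = { .state = HPD_ENABLED };
	bool storm = false;

	for (int i = 0; i < 6; i++)
		storm = storm_detect(&pin, 10UL * i);
	if (!storm)
		return false;
	if (digport_work(&pin, pulse_handled, patched))
		hotplug_work(&pin);
	return pin.polling;
}
```

In this model, polling_after_storm(true, false) is the case described above: the pulse is handled, the hotplug work is never queued, and polling stays off.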

Sorry for the incomplete patch; I'll upload it again once I have more
details.

>>   	spin_unlock_irq(&dev_priv->irq_lock);
>>   
>>   	for (i = 0; i < I915_MAX_PORTS; i++) {
>> -- 
>> 1.7.9.5
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
Daniel Vetter June 30, 2015, 10:10 a.m. UTC | #4
On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
> 
> 
> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
> >On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
> >>From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
> >>
> >>HPD storm is detected in intel_hpd_irq_handler and disabled for respective
> >>port immediately but polling is enabled only in i915_hotplug_work_func and
> >>not in i915_digport_work_func. This will result in disabled hpd never enabled
> >>back again. This is fixed by calling the appropriate storm disable function
> >>that will handle the rest of the sequence (both polling enable and reenabling
> >>of HPD later).
> >>---
> >>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> >>index 3c53aac..8e18587 100644
> >>--- a/drivers/gpu/drm/i915/intel_hotplug.c
> >>+++ b/drivers/gpu/drm/i915/intel_hotplug.c
> >>@@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
> >>  	dev_priv->hotplug.long_port_mask = 0;
> >>  	short_port_mask = dev_priv->hotplug.short_port_mask;
> >>  	dev_priv->hotplug.short_port_mask = 0;
> >>+
> >>+	/* Disable hotplug on connectors that hit an irq storm. */
> >>+	intel_hpd_irq_storm_disable(dev_priv);
> >digport_work_func schedules the hotplug handler for everything not
> >handled, which should result in this getting called. It really shouldn't
> >matter when exactly it gets called.
> >
> >Can you please provide more data and details for your analysis? Like bug
> >reports, backtraces and dmesg traces showing that the handler is stuck and
> >similar things.
> >
> >Also your patch is missing the s-o-b line.
> >-Daniel
> >
> there is no bug filed for this, it was observed as part of code analysis
> (that is provided below)
> I'll try to get more info as soon as i get access to a system.
> 
> short answer:
> the issue will be seen during hpd storm, where the last HPD is handled
> inside intel_dp_hpd_pulse.
> so i915_hotplug_work_func will not be queued thus missing the storm_disable
> call.
> 
> long answer :
> To give a bit more background, lets assume that we get a call to
> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
> on a HSW/BDW system during HPD storm scenario.
> The following sequence will take place
> *) is_dig_port will be set and will result in queue_dig being set as well
> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
> the
> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
> HPD_MARK_DISABLED
> *) This will result in HPD for PORT_B being disabled immediately(masked in
> case of LPT)
> *) i915_digport_work_func will be queued at the end of this function, since
> queue_dig is set
> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
> executed since it is defined for DP
> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
> in still,
> ISR will be high and so will return true.
> *) intel_dp_get_dpcd, will succeed since DP is connected
> *) finally IRQ_HANDLED will be returned
> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
> enabled again
> (unmasked in case of LPT) and no more hot plug notifications.

The assumption of the storm code is that when there is a DP sink, a storm
will never happen. We need that since otherwise the mst code (which
creates ridiculous amounts of hpds on the DP port) will run into the storm
detection code all the time.

Might be better to document this design assumption somewhere, but it is
baked in. Hence my question whether you've seen this happen in the real
world - DP storms haven't been observed yet afaik, and it would be a much
more serious problem.
-Daniel

> 
> sorry for the incomplete patch , i'll reupload again once i get some more
> details.
> 
> >>  	spin_unlock_irq(&dev_priv->irq_lock);
> >>  	for (i = 0; i < I915_MAX_PORTS; i++) {
> >>-- 
> >>1.7.9.5
> >>
> >>_______________________________________________
> >>Intel-gfx mailing list
> >>Intel-gfx@lists.freedesktop.org
> >>http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >
> >
> 
> 
> -- 
> regards,
> Sivakumar
> 
>
Jani Nikula June 30, 2015, 10:19 a.m. UTC | #5
On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
>> 
>> 
>> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
>> >On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
>> >>From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
>> >>
>> >>HPD storm is detected in intel_hpd_irq_handler and disabled for respective
>> >>port immediately but polling is enabled only in i915_hotplug_work_func and
>> >>not in i915_digport_work_func. This will result in disabled hpd never enabled
>> >>back again. This is fixed by calling the appropriate storm disable function
>> >>that will handle the rest of the sequence (both polling enable and reenabling
>> >>of HPD later).
>> >>---
>> >>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
>> >>  1 file changed, 4 insertions(+)
>> >>
>> >>diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
>> >>index 3c53aac..8e18587 100644
>> >>--- a/drivers/gpu/drm/i915/intel_hotplug.c
>> >>+++ b/drivers/gpu/drm/i915/intel_hotplug.c
>> >>@@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
>> >>  	dev_priv->hotplug.long_port_mask = 0;
>> >>  	short_port_mask = dev_priv->hotplug.short_port_mask;
>> >>  	dev_priv->hotplug.short_port_mask = 0;
>> >>+
>> >>+	/* Disable hotplug on connectors that hit an irq storm. */
>> >>+	intel_hpd_irq_storm_disable(dev_priv);
>> >digport_work_func schedules the hotplug handler for everything not
>> >handled, which should result in this getting called. It really shouldn't
>> >matter when exactly it gets called.
>> >
>> >Can you please provide more data and details for your analysis? Like bug
>> >reports, backtraces and dmesg traces showing that the handler is stuck and
>> >similar things.
>> >
>> >Also your patch is missing the s-o-b line.
>> >-Daniel
>> >
>> there is no bug filed for this, it was observed as part of code analysis
>> (that is provided below)
>> I'll try to get more info as soon as i get access to a system.
>> 
>> short answer:
>> the issue will be seen during hpd storm, where the last HPD is handled
>> inside intel_dp_hpd_pulse.
>> so i915_hotplug_work_func will not be queued thus missing the storm_disable
>> call.
>> 
>> long answer :
>> To give a bit more background, lets assume that we get a call to
>> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
>> on a HSW/BDW system during HPD storm scenario.
>> The following sequence will take place
>> *) is_dig_port will be set and will result in queue_dig being set as well
>> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
>> the
>> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
>> HPD_MARK_DISABLED
>> *) This will result in HPD for PORT_B being disabled immediately(masked in
>> case of LPT)
>> *) i915_digport_work_func will be queued at the end of this function, since
>> queue_dig is set
>> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
>> executed since it is defined for DP
>> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
>> in still,
>> ISR will be high and so will return true.
>> *) intel_dp_get_dpcd, will succeed since DP is connected
>> *) finally IRQ_HANDLED will be returned
>> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
>> enabled again
>> (unmasked in case of LPT) and no more hot plug notifications.
>
> The assumption of the storm code is that when there is a DP sink, a storm
> will never happen. We need that since otherwise the mst code (which
> creates ridiculous amounts of hpds on the DP port) will run into the storm
> detection code all the time.
>
> Might be better to document this design assumption somewhere, but it is
> baked in. Hence my question whether you've seen this happen in the real
> world - DP storms haven't been observed yet afaik, and it would be a much
> more serious problem.

The dp short hotplug irqs (used by mst) are not caught by the irq storm
code, but the long hotplug irqs are.
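
As a rough illustration of that distinction (a sketch with made-up types, not the handler itself): short pulses on a digital port bypass the storm accounting, while long pulses and interrupts on non-digital pins are counted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one hotplug interrupt. */
struct hpd_event {
	bool is_dig_port;	/* pin belongs to a digital (DP-capable) port */
	bool long_pulse;	/* long pulse rather than short pulse */
};

/* Short pulses on digital ports are skipped; everything else is counted. */
static bool counts_toward_storm(const struct hpd_event *ev)
{
	return !(ev->is_dig_port && !ev->long_pulse);
}

/* Number of events in evs[0..n) that would feed the storm statistics. */
static size_t storm_relevant(const struct hpd_event *evs, size_t n)
{
	size_t counted = 0;

	for (size_t i = 0; i < n; i++)
		if (counts_toward_storm(&evs[i]))
			counted++;
	return counted;
}
```

So a burst of MST short pulses contributes nothing to the count, but a run of long pulses on the same port can still trip the detection.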

BR,
Jani.



> -Daniel
>
>> 
>> sorry for the incomplete patch , i'll reupload again once i get some more
>> details.
>> 
>> >>  	spin_unlock_irq(&dev_priv->irq_lock);
>> >>  	for (i = 0; i < I915_MAX_PORTS; i++) {
>> >>-- 
>> >>1.7.9.5
>> >>
>> >>_______________________________________________
>> >>Intel-gfx mailing list
>> >>Intel-gfx@lists.freedesktop.org
>> >>http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>> >
>> >
>> 
>> 
>> -- 
>> regards,
>> Sivakumar
>> 
>> 
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter June 30, 2015, 11:16 a.m. UTC | #6
On Tue, Jun 30, 2015 at 01:19:57PM +0300, Jani Nikula wrote:
> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
> >> 
> >> 
> >> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
> >> >On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
> >> >>From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
> >> >>
> >> >>HPD storm is detected in intel_hpd_irq_handler and disabled for respective
> >> >>port immediately but polling is enabled only in i915_hotplug_work_func and
> >> >>not in i915_digport_work_func. This will result in disabled hpd never enabled
> >> >>back again. This is fixed by calling the appropriate storm disable function
> >> >>that will handle the rest of the sequence (both polling enable and reenabling
> >> >>of HPD later).
> >> >>---
> >> >>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
> >> >>  1 file changed, 4 insertions(+)
> >> >>
> >> >>diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> >> >>index 3c53aac..8e18587 100644
> >> >>--- a/drivers/gpu/drm/i915/intel_hotplug.c
> >> >>+++ b/drivers/gpu/drm/i915/intel_hotplug.c
> >> >>@@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
> >> >>  	dev_priv->hotplug.long_port_mask = 0;
> >> >>  	short_port_mask = dev_priv->hotplug.short_port_mask;
> >> >>  	dev_priv->hotplug.short_port_mask = 0;
> >> >>+
> >> >>+	/* Disable hotplug on connectors that hit an irq storm. */
> >> >>+	intel_hpd_irq_storm_disable(dev_priv);
> >> >digport_work_func schedules the hotplug handler for everything not
> >> >handled, which should result in this getting called. It really shouldn't
> >> >matter when exactly it gets called.
> >> >
> >> >Can you please provide more data and details for your analysis? Like bug
> >> >reports, backtraces and dmesg traces showing that the handler is stuck and
> >> >similar things.
> >> >
> >> >Also your patch is missing the s-o-b line.
> >> >-Daniel
> >> >
> >> there is no bug filed for this, it was observed as part of code analysis
> >> (that is provided below)
> >> I'll try to get more info as soon as i get access to a system.
> >> 
> >> short answer:
> >> the issue will be seen during hpd storm, where the last HPD is handled
> >> inside intel_dp_hpd_pulse.
> >> so i915_hotplug_work_func will not be queued thus missing the storm_disable
> >> call.
> >> 
> >> long answer :
> >> To give a bit more background, lets assume that we get a call to
> >> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
> >> on a HSW/BDW system during HPD storm scenario.
> >> The following sequence will take place
> >> *) is_dig_port will be set and will result in queue_dig being set as well
> >> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
> >> the
> >> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
> >> HPD_MARK_DISABLED
> >> *) This will result in HPD for PORT_B being disabled immediately(masked in
> >> case of LPT)
> >> *) i915_digport_work_func will be queued at the end of this function, since
> >> queue_dig is set
> >> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
> >> executed since it is defined for DP
> >> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
> >> in still,
> >> ISR will be high and so will return true.
> >> *) intel_dp_get_dpcd, will succeed since DP is connected
> >> *) finally IRQ_HANDLED will be returned
> >> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
> >> enabled again
> >> (unmasked in case of LPT) and no more hot plug notifications.
> >
> > The assumption of the storm code is that when there is a DP sink, a storm
> > will never happen. We need that since otherwise the mst code (which
> > creates ridiculous amounts of hpds on the DP port) will run into the storm
> > detection code all the time.
> >
> > Might be better to document this design assumption somewhere, but it is
> > baked in. Hence my question whether you've seen this happen in the real
> > world - DP storms haven't been observed yet afaik, and it would be a much
> > more serious problem.
> 
> The dp short hotplug irqs (used by mst) are not caught by the irq storm
> code, but the long hotplug irqs are.

We assume there's no DP hotplug storms ever, whether long or short pulses.
Trying to fix that will require serious rework since we need to wait until
dig_port_work has run to know whether the hpd was a real one or just
fluctuation, and only update storm statistic then. And once we do DP is
essentially broken, which means we also need to enable polling for dp aux
short pulses (which will probably piss off some sink device).

In short: If you have a hpd storm, and there's something DP-like
connected, you're screwed. Until we have real-world evidence of this
happening updating comments is really the only thing we need.
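
To make the "serious rework" a bit more concrete, here is a purely hypothetical sketch of deferring the statistic update until the pulse has been classified. None of these names exist in the driver, the threshold is an assumed value, and the time window is left out for brevity.

```c
#include <stdbool.h>

#define STORM_THRESHOLD 5	/* assumed value, for illustration */

/* Verdict a reworked dig_port_work could reach after probing the sink. */
enum pulse_verdict { PULSE_REAL, PULSE_FLUCTUATION };

/* Hypothetical per-pin statistics. */
struct storm_stats {
	int fluctuations;
	bool storm;
};

/*
 * Called only after the pulse has been classified: real hotplug events
 * are ignored, and only fluctuation counts toward the storm threshold.
 */
static void account_pulse(struct storm_stats *s, enum pulse_verdict v)
{
	if (v != PULSE_FLUCTUATION)
		return;
	if (++s->fluctuations > STORM_THRESHOLD)
		s->storm = true;
}
```

Even with something like this, polling for the short pulses would still be needed once the pin is disabled, which is the part that makes it unattractive.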
-Daniel
Jani Nikula June 30, 2015, 12:30 p.m. UTC | #7
On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Jun 30, 2015 at 01:19:57PM +0300, Jani Nikula wrote:
>> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
>> > On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
>> >> 
>> >> 
>> >> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
>> >> >On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
>> >> >>From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
>> >> >>
>> >> >>HPD storm is detected in intel_hpd_irq_handler and disabled for respective
>> >> >>port immediately but polling is enabled only in i915_hotplug_work_func and
>> >> >>not in i915_digport_work_func. This will result in disabled hpd never enabled
>> >> >>back again. This is fixed by calling the appropriate storm disable function
>> >> >>that will handle the rest of the sequence (both polling enable and reenabling
>> >> >>of HPD later).
>> >> >>---
>> >> >>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
>> >> >>  1 file changed, 4 insertions(+)
>> >> >>
>> >> >>diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
>> >> >>index 3c53aac..8e18587 100644
>> >> >>--- a/drivers/gpu/drm/i915/intel_hotplug.c
>> >> >>+++ b/drivers/gpu/drm/i915/intel_hotplug.c
>> >> >>@@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
>> >> >>  	dev_priv->hotplug.long_port_mask = 0;
>> >> >>  	short_port_mask = dev_priv->hotplug.short_port_mask;
>> >> >>  	dev_priv->hotplug.short_port_mask = 0;
>> >> >>+
>> >> >>+	/* Disable hotplug on connectors that hit an irq storm. */
>> >> >>+	intel_hpd_irq_storm_disable(dev_priv);
>> >> >digport_work_func schedules the hotplug handler for everything not
>> >> >handled, which should result in this getting called. It really shouldn't
>> >> >matter when exactly it gets called.
>> >> >
>> >> >Can you please provide more data and details for your analysis? Like bug
>> >> >reports, backtraces and dmesg traces showing that the handler is stuck and
>> >> >similar things.
>> >> >
>> >> >Also your patch is missing the s-o-b line.
>> >> >-Daniel
>> >> >
>> >> there is no bug filed for this, it was observed as part of code analysis
>> >> (that is provided below)
>> >> I'll try to get more info as soon as i get access to a system.
>> >> 
>> >> short answer:
>> >> the issue will be seen during hpd storm, where the last HPD is handled
>> >> inside intel_dp_hpd_pulse.
>> >> so i915_hotplug_work_func will not be queued thus missing the storm_disable
>> >> call.
>> >> 
>> >> long answer :
>> >> To give a bit more background, lets assume that we get a call to
>> >> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
>> >> on a HSW/BDW system during HPD storm scenario.
>> >> The following sequence will take place
>> >> *) is_dig_port will be set and will result in queue_dig being set as well
>> >> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
>> >> the
>> >> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
>> >> HPD_MARK_DISABLED
>> >> *) This will result in HPD for PORT_B being disabled immediately(masked in
>> >> case of LPT)
>> >> *) i915_digport_work_func will be queued at the end of this function, since
>> >> queue_dig is set
>> >> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
>> >> executed since it is defined for DP
>> >> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
>> >> in still,
>> >> ISR will be high and so will return true.
>> >> *) intel_dp_get_dpcd, will succeed since DP is connected
>> >> *) finally IRQ_HANDLED will be returned
>> >> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
>> >> enabled again
>> >> (unmasked in case of LPT) and no more hot plug notifications.
>> >
>> > The assumption of the storm code is that when there is a DP sink, a storm
>> > will never happen. We need that since otherwise the mst code (which
>> > creates ridiculous amounts of hpds on the DP port) will run into the storm
>> > detection code all the time.
>> >
>> > Might be better to document this design assumption somewhere, but it is
>> > baked in. Hence my question whether you've seen this happen in the real
>> > world - DP storms haven't been observed yet afaik, and it would be a much
>> > more serious problem.
>> 
>> The dp short hotplug irqs (used by mst) are not caught by the irq storm
>> code, but the long hotplug irqs are.
>
> We assume there's no DP hotplug storms ever, whether long or short pulses.
> Trying to fix that will require serious rework since we need to wait until
> dig_port_work has run to know whether the hpd was a real one or just
> fluctuation, and only update storm statistic then. And once we do DP is
> essentially broken, which means we also need to enable polling for dp aux
> short pulses (which will probably piss off some sink device).
>
> In short: If you have a hpd storm, and there's something DP-like
> connected, you're screwed. Until we have real-world evidence of this
> happening updating comments is really the only thing we need.

In that case we should update the code to never do hotplug irq storm
detection on dp long hpd, which we currently do.

BR,
Jani.


> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Ville Syrjala June 30, 2015, 12:47 p.m. UTC | #8
On Tue, Jun 30, 2015 at 03:30:16PM +0300, Jani Nikula wrote:
> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Tue, Jun 30, 2015 at 01:19:57PM +0300, Jani Nikula wrote:
> >> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> >> > On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
> >> >> 
> >> >> 
> >> >> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
> >> >> >On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
> >> >> >>From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
> >> >> >>
> >> >> >>HPD storm is detected in intel_hpd_irq_handler and disabled for respective
> >> >> >>port immediately but polling is enabled only in i915_hotplug_work_func and
> >> >> >>not in i915_digport_work_func. This will result in disabled hpd never enabled
> >> >> >>back again. This is fixed by calling the appropriate storm disable function
> >> >> >>that will handle the rest of the sequence (both polling enable and reenabling
> >> >> >>of HPD later).
> >> >> >>---
> >> >> >>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
> >> >> >>  1 file changed, 4 insertions(+)
> >> >> >>
> >> >> >>diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> >> >> >>index 3c53aac..8e18587 100644
> >> >> >>--- a/drivers/gpu/drm/i915/intel_hotplug.c
> >> >> >>+++ b/drivers/gpu/drm/i915/intel_hotplug.c
> >> >> >>@@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
> >> >> >>  	dev_priv->hotplug.long_port_mask = 0;
> >> >> >>  	short_port_mask = dev_priv->hotplug.short_port_mask;
> >> >> >>  	dev_priv->hotplug.short_port_mask = 0;
> >> >> >>+
> >> >> >>+	/* Disable hotplug on connectors that hit an irq storm. */
> >> >> >>+	intel_hpd_irq_storm_disable(dev_priv);
> >> >> >digport_work_func schedules the hotplug handler for everything not
> >> >> >handled, which should result in this getting called. It really shouldn't
> >> >> >matter when exactly it gets called.
> >> >> >
> >> >> >Can you please provide more data and details for your analysis? Like bug
> >> >> >reports, backtraces and dmesg traces showing that the handler is stuck and
> >> >> >similar things.
> >> >> >
> >> >> >Also your patch is missing the s-o-b line.
> >> >> >-Daniel
> >> >> >
> >> >> there is no bug filed for this, it was observed as part of code analysis
> >> >> (that is provided below)
> >> >> I'll try to get more info as soon as i get access to a system.
> >> >> 
> >> >> short answer:
> >> >> the issue will be seen during hpd storm, where the last HPD is handled
> >> >> inside intel_dp_hpd_pulse.
> >> >> so i915_hotplug_work_func will not be queued thus missing the storm_disable
> >> >> call.
> >> >> 
> >> >> long answer :
> >> >> To give a bit more background, lets assume that we get a call to
> >> >> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
> >> >> on a HSW/BDW system during HPD storm scenario.
> >> >> The following sequence will take place
> >> >> *) is_dig_port will be set and will result in queue_dig being set as well
> >> >> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
> >> >> the
> >> >> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
> >> >> HPD_MARK_DISABLED
> >> >> *) This will result in HPD for PORT_B being disabled immediately(masked in
> >> >> case of LPT)
> >> >> *) i915_digport_work_func will be queued at the end of this function, since
> >> >> queue_dig is set
> >> >> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
> >> >> executed since it is defined for DP
> >> >> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
> >> >> in still,
> >> >> ISR will be high and so will return true.
> >> >> *) intel_dp_get_dpcd, will succeed since DP is connected
> >> >> *) finally IRQ_HANDLED will be returned
> >> >> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
> >> >> enabled again
> >> >> (unmasked in case of LPT) and no more hot plug notifications.
> >> >
> >> > The assumption of the storm code is that when there is a DP sink, a storm
> >> > will never happen. We need that since otherwise the mst code (which
> >> > creates ridiculous amounts of hpds on the DP port) will run into the storm
> >> > detection code all the time.
> >> >
> >> > Might be better to document this design assumption somewhere, but it is
> >> > baked in. Hence my question whether you've seen this happen in the real
> >> > world - DP storms haven't been observed yet afaik, and it would be a much
> >> > more serious problem.
> >> 
> >> The dp short hotplug irqs (used by mst) are not caught by the irq storm
> >> code, but the long hotplug irqs are.
> >
> > We assume there's no DP hotplug storms ever, whether long or short pulses.
> > Trying to fix that will require serious rework since we need to wait until
> > dig_port_work has run to know whether the hpd was a real one or just
> > fluctuation, and only update storm statistic then. And once we do DP is
> > essentially broken, which means we also need to enable polling for dp aux
> > short pulses (which will probably piss off some sink device).
> >
> > In short: If you have a hpd storm, and there's something DP-like
> > connected, you're screwed. Until we have real-world evidence of this
> > happening updating comments is really the only thing we need.
> 
> In that case we should update the code to never do hotplug irq storm
> detection on dp long hpd, which we currently do.

The HPD pin is shared for DP and HDMI so we can't disable HPD just for
HDMI when a storm is detected.
Daniel Vetter July 1, 2015, 12:38 p.m. UTC | #9
On Tue, Jun 30, 2015 at 03:47:41PM +0300, Ville Syrjälä wrote:
> On Tue, Jun 30, 2015 at 03:30:16PM +0300, Jani Nikula wrote:
> > On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Tue, Jun 30, 2015 at 01:19:57PM +0300, Jani Nikula wrote:
> > >> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >> > On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
> > >> >> 
> > >> >> 
> > >> >> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
> > >> >> >On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
> > >> >> >>From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
> > >> >> >>
> > >> >> >>HPD storm is detected in intel_hpd_irq_handler and disabled for respective
> > >> >> >>port immediately but polling is enabled only in i915_hotplug_work_func and
> > >> >> >>not in i915_digport_work_func. This will result in disabled hpd never enabled
> > >> >> >>back again. This is fixed by calling the appropriate storm disable function
> > >> >> >>that will handle the rest of the sequence (both polling enable and reenabling
> > >> >> >>of HPD later).
> > >> >> >>---
> > >> >> >>  drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
> > >> >> >>  1 file changed, 4 insertions(+)
> > >> >> >>
> > >> >> >>diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
> > >> >> >>index 3c53aac..8e18587 100644
> > >> >> >>--- a/drivers/gpu/drm/i915/intel_hotplug.c
> > >> >> >>+++ b/drivers/gpu/drm/i915/intel_hotplug.c
> > >> >> >>@@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
> > >> >> >>  	dev_priv->hotplug.long_port_mask = 0;
> > >> >> >>  	short_port_mask = dev_priv->hotplug.short_port_mask;
> > >> >> >>  	dev_priv->hotplug.short_port_mask = 0;
> > >> >> >>+
> > >> >> >>+	/* Disable hotplug on connectors that hit an irq storm. */
> > >> >> >>+	intel_hpd_irq_storm_disable(dev_priv);
> > >> >> >digport_work_func schedules the hotplug handler for everything not
> > >> >> >handled, which should result in this getting called. It really shouldn't
> > >> >> >matter when exactly it gets called.
> > >> >> >
> > >> >> >Can you please provide more data and details for your analysis? Like bug
> > >> >> >reports, backtraces and dmesg traces showing that the handler is stuck and
> > >> >> >similar things.
> > >> >> >
> > >> >> >Also your patch is missing the s-o-b line.
> > >> >> >-Daniel
> > >> >> >
> > >> >> there is no bug filed for this, it was observed as part of code analysis
> > >> >> (that is provided below)
> > >> >> I'll try to get more info as soon as i get access to a system.
> > >> >> 
> > >> >> short answer:
> > >> >> the issue will be seen during hpd storm, where the last HPD is handled
> > >> >> inside intel_dp_hpd_pulse.
> > >> >> so i915_hotplug_work_func will not be queued thus missing the storm_disable
> > >> >> call.
> > >> >> 
> > >> >> long answer :
> > >> >> To give a bit more background, lets assume that we get a call to
> > >> >> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
> > >> >> on a HSW/BDW system during HPD storm scenario.
> > >> >> The following sequence will take place
> > >> >> *) is_dig_port will be set and will result in queue_dig being set as well
> > >> >> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
> > >> >> the
> > >> >> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
> > >> >> HPD_MARK_DISABLED
> > >> >> *) This will result in HPD for PORT_B being disabled immediately(masked in
> > >> >> case of LPT)
> > >> >> *) i915_digport_work_func will be queued at the end of this function, since
> > >> >> queue_dig is set
> > >> >> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
> > >> >> executed since it is defined for DP
> > >> >> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
> > >> >> in still,
> > >> >> ISR will be high and so will return true.
> > >> >> *) intel_dp_get_dpcd, will succeed since DP is connected
> > >> >> *) finally IRQ_HANDLED will be returned
> > >> >> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
> > >> >> enabled again
> > >> >> (unmasked in case of LPT) and no more hot plug notifications.
> > >> >
> > >> > The assumption of the storm code is that when there is a DP sink, a storm
> > >> > will never happen. We need that since otherwise the mst code (which
> > >> > creates ridiculous amounts of hpds on the DP port) will run into the storm
> > >> > detection code all the time.
> > >> >
> > >> > Might be better to document this design assumption somewhere, but it is
> > >> > baked in. Hence my question whether you've seen this happen in the real
> > >> > world - DP storms haven't been observed yet afaik, and it would be a much
> > >> > more serious problem.
> > >> 
> > >> The dp short hotplug irqs (used by mst) are not caught by the irq storm
> > >> code, but the long hotplug irqs are.
> > >
> > > We assume there's no DP hotplug storms ever, whether long or short pulses.
> > > Trying to fix that will require serious rework since we need to wait until
> > > dig_port_work has run to know whether the hpd was a real one or just
> > > fluctuation, and only update storm statistic then. And once we do DP is
> > > essentially broken, which means we also need to enable polling for dp aux
> > > short pulses (which will probably piss off some sink device).
> > >
> > > In short: If you have a hpd storm, and there's something DP-like
> > > connected, you're screwed. Until we have real-world evidence of this
> > > happening updating comments is really the only thing we need.
> > 
> > In that case we should update the code to never do hotplug irq storm
> > detection on dp long hpd, which we currently do.
> 
> The HPD pin is shared for DP and HDMI so we can't disable HPD just for
> HDMI when a storm is detected.

Yup, this is the crux. The real fix is wiring up IRQ_NONE handling all
the way through, so that we can differentiate storms from normal/expected
irq load. But that has to wait until this actually happens somewhere
with a DP sink.
-Daniel
Sivakumar Thulasimani July 1, 2015, 12:56 p.m. UTC | #10
On 7/1/2015 6:08 PM, Daniel Vetter wrote:
> On Tue, Jun 30, 2015 at 03:47:41PM +0300, Ville Syrjälä wrote:
>> On Tue, Jun 30, 2015 at 03:30:16PM +0300, Jani Nikula wrote:
>>> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>> On Tue, Jun 30, 2015 at 01:19:57PM +0300, Jani Nikula wrote:
>>>>> On Tue, 30 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>> On Tue, Jun 30, 2015 at 08:45:48AM +0530, Sivakumar Thulasimani wrote:
>>>>>>>
>>>>>>> On 6/29/2015 10:07 PM, Daniel Vetter wrote:
>>>>>>>> On Mon, Jun 29, 2015 at 04:30:40PM +0530, Sivakumar Thulasimani wrote:
>>>>>>>>> From: "Thulasimani, Sivakumar" <sivakumar.thulasimani@intel.com>
>>>>>>>>>
>>>>>>>>> HPD storm is detected in intel_hpd_irq_handler and disabled for respective
>>>>>>>>> port immediately but polling is enabled only in i915_hotplug_work_func and
>>>>>>>>> not in i915_digport_work_func. This will result in disabled hpd never enabled
>>>>>>>>> back again. This is fixed by calling the appropriate storm disable function
>>>>>>>>> that will handle the rest of the sequence (both polling enable and reenabling
>>>>>>>>> of HPD later).
>>>>>>>>> ---
>>>>>>>>>   drivers/gpu/drm/i915/intel_hotplug.c |    4 ++++
>>>>>>>>>   1 file changed, 4 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
>>>>>>>>> index 3c53aac..8e18587 100644
>>>>>>>>> --- a/drivers/gpu/drm/i915/intel_hotplug.c
>>>>>>>>> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
>>>>>>>>> @@ -205,6 +205,10 @@ static void i915_digport_work_func(struct work_struct *work)
>>>>>>>>>   	dev_priv->hotplug.long_port_mask = 0;
>>>>>>>>>   	short_port_mask = dev_priv->hotplug.short_port_mask;
>>>>>>>>>   	dev_priv->hotplug.short_port_mask = 0;
>>>>>>>>> +
>>>>>>>>> +	/* Disable hotplug on connectors that hit an irq storm. */
>>>>>>>>> +	intel_hpd_irq_storm_disable(dev_priv);
>>>>>>>> digport_work_func schedules the hotplug handler for everything not
>>>>>>>> handled, which should result in this getting called. It really shouldn't
>>>>>>>> matter when exactly it gets called.
>>>>>>>>
>>>>>>>> Can you please provide more data and details for your analysis? Like bug
>>>>>>>> reports, backtraces and dmesg traces showing that the handler is stuck and
>>>>>>>> similar things.
>>>>>>>>
>>>>>>>> Also your patch is missing the s-o-b line.
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>> there is no bug filed for this, it was observed as part of code analysis
>>>>>>> (that is provided below)
>>>>>>> I'll try to get more info as soon as i get access to a system.
>>>>>>>
>>>>>>> short answer:
>>>>>>> the issue will be seen during hpd storm, where the last HPD is handled
>>>>>>> inside intel_dp_hpd_pulse.
>>>>>>> so i915_hotplug_work_func will not be queued thus missing the storm_disable
>>>>>>> call.
>>>>>>>
>>>>>>> long answer :
>>>>>>> To give a bit more background, lets assume that we get a call to
>>>>>>> intel_hpd_irq_handler, with params long pulse for DP panel in PORT_B
>>>>>>> on a HSW/BDW system during HPD storm scenario.
>>>>>>> The following sequence will take place
>>>>>>> *) is_dig_port will be set and will result in queue_dig being set as well
>>>>>>> *) intel_hpd_irq_storm_detect will detect that this is 6th hpd call within
>>>>>>> the
>>>>>>> HPD_STORM_DETECT_PERIOD and so will mark the HPD status of PORT_B to
>>>>>>> HPD_MARK_DISABLED
>>>>>>> *) This will result in HPD for PORT_B being disabled immediately(masked in
>>>>>>> case of LPT)
>>>>>>> *) i915_digport_work_func will be queued at the end of this function, since
>>>>>>> queue_dig is set
>>>>>>> *) once in the i915_digport_work_func, hpd_pulse func pointer will be
>>>>>>> executed since it is defined for DP
>>>>>>> *) intel_dp_hpd_pulse, will have long_hpd set and since the panel is plugged
>>>>>>> in still,
>>>>>>> ISR will be high and so will return true.
>>>>>>> *) intel_dp_get_dpcd, will succeed since DP is connected
>>>>>>> *) finally IRQ_HANDLED will be returned
>>>>>>> *) once call exits intel_hpd_irq_handler, HPD on port B will never be
>>>>>>> enabled again
>>>>>>> (unmasked in case of LPT) and no more hot plug notifications.
>>>>>> The assumption of the storm code is that when there is a DP sink, a storm
>>>>>> will never happen. We need that since otherwise the mst code (which
>>>>>> creates ridiculous amounts of hpds on the DP port) will run into the storm
>>>>>> detection code all the time.
>>>>>>
>>>>>> Might be better to document this design assumption somewhere, but it is
>>>>>> baked in. Hence my question whether you've seen this happen in the real
>>>>>> world - DP storms haven't been observed yet afaik, and it would be a much
>>>>>> more serious problem.
>>>>> The dp short hotplug irqs (used by mst) are not caught by the irq storm
>>>>> code, but the long hotplug irqs are.
>>>> We assume there's no DP hotplug storms ever, whether long or short pulses.
>>>> Trying to fix that will require serious rework since we need to wait until
>>>> dig_port_work has run to know whether the hpd was a real one or just
>>>> fluctuation, and only update storm statistic then. And once we do DP is
>>>> essentially broken, which means we also need to enable polling for dp aux
>>>> short pulses (which will probably piss off some sink device).
>>>>
>>>> In short: If you have a hpd storm, and there's something DP-like
>>>> connected, you're screwed. Until we have real-world evidence of this
>>>> happening updating comments is really the only thing we need.
>>> In that case we should update the code to never do hotplug irq storm
>>> detection on dp long hpd, which we currently do.
>> The HPD pin is shared for DP and HDMI so we can't disable HPD just for
>> HDMI when a storm is detected.
> Yup this is the crux. The real fix really is wiring up IRQ_NONE handling
> all the way to be able to differentiate storms from normal/expected irq
> load. But that really has to wait until this happens in reality somewhere
> with a DP sink.
> -Daniel
>
When checking internally I was informed that DP hotplug storms are as
normal as HDMI ones, if not more frequent. But for now I'll wait until
this shows up as a real-world issue, and drop this patch.

-- regards, Sivakumar
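
The storm detection step discussed in this thread (the "6th hpd call within
HPD_STORM_DETECT_PERIOD" case) can be modelled in user space roughly as
below. This is only a sketch, not the driver code: the 1000 ms period, the
threshold of 5, the struct layout, the HPD_ENABLED/HPD_DISABLED state names
and both helper functions are assumptions made for illustration. Only
HPD_MARK_DISABLED and the period macro name come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: the thread names the period macro but gives no numbers. */
#define HPD_STORM_DETECT_PERIOD_MS 1000
#define HPD_STORM_THRESHOLD 5

/* HPD_MARK_DISABLED is from the thread; the other two names are assumed. */
enum hpd_state { HPD_ENABLED, HPD_MARK_DISABLED, HPD_DISABLED };

/* Hypothetical per-pin bookkeeping for this model. */
struct hpd_stats {
	unsigned long window_start_ms;
	int count;
	enum hpd_state state;
};

/* Returns true when this interrupt trips the (modelled) storm detector. */
static bool hpd_storm_detect(struct hpd_stats *s, unsigned long now_ms)
{
	/* Outside the detection window: start a new window with this irq. */
	if (now_ms - s->window_start_ms > HPD_STORM_DETECT_PERIOD_MS) {
		s->window_start_ms = now_ms;
		s->count = 1;
		return false;
	}

	s->count++;
	if (s->count > HPD_STORM_THRESHOLD) {
		/* Only marked here; finishing the disable is a separate step. */
		s->state = HPD_MARK_DISABLED;
		return true;
	}
	return false;
}

/*
 * Feed n interrupts spaced gap_ms apart, starting at t = 0.
 * Returns the 1-based index of the interrupt that trips the detector,
 * or 0 if none does.
 */
static int hpd_first_storm(struct hpd_stats *s, int n, unsigned long gap_ms)
{
	unsigned long now = 0;

	assert(n >= 0);
	for (int i = 1; i <= n; i++, now += gap_ms)
		if (hpd_storm_detect(s, now))
			return i;
	return 0;
}
```

Calling hpd_first_storm() on a zero-initialised struct hpd_stats with
closely spaced interrupts trips on the sixth one and leaves the pin in
HPD_MARK_DISABLED, while widely spaced interrupts never trip it. Note that
the detector only marks the pin; something else has to complete the
disable, which is the step the patch is concerned with.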

Patch

diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 3c53aac..8e18587 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -205,6 +205,10 @@  static void i915_digport_work_func(struct work_struct *work)
 	dev_priv->hotplug.long_port_mask = 0;
 	short_port_mask = dev_priv->hotplug.short_port_mask;
 	dev_priv->hotplug.short_port_mask = 0;
+
+	/* Disable hotplug on connectors that hit an irq storm. */
+	intel_hpd_irq_storm_disable(dev_priv);
+
 	spin_unlock_irq(&dev_priv->irq_lock);
 
 	for (i = 0; i < I915_MAX_PORTS; i++) {
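
To illustrate what the hunk above is meant to change, here is a toy
user-space model of the two work functions, assuming the behaviour described
in the commit message and the "long answer" in the thread. It is a sketch
under those assumptions, not the i915 implementation: struct hotplug_model,
its fields, the with_patch flag and the direct function call standing in for
queueing work are all hypothetical, and HPD_ENABLED/HPD_DISABLED are assumed
state names.

```c
#include <stdbool.h>

/* HPD_MARK_DISABLED is from the thread; the other two names are assumed. */
enum hpd_state { HPD_ENABLED, HPD_MARK_DISABLED, HPD_DISABLED };

/* Hypothetical single-pin model of the hotplug bookkeeping. */
struct hotplug_model {
	enum hpd_state state;  /* storm state of the pin */
	bool polling;          /* connector has fallen back to polling */
	bool reenable_armed;   /* delayed HPD re-enable has been scheduled */
};

/*
 * Stand-in for intel_hpd_irq_storm_disable(): completes the sequence
 * that the irq handler started by marking the pin.
 */
static void storm_disable(struct hotplug_model *m)
{
	if (m->state != HPD_MARK_DISABLED)
		return;
	m->state = HPD_DISABLED;
	m->polling = true;
	m->reenable_armed = true;
}

/* Stand-in for i915_hotplug_work_func(), which already does the above. */
static void hotplug_work(struct hotplug_model *m)
{
	storm_disable(m);
}

/*
 * Stand-in for i915_digport_work_func().
 * pulse_handled: the hpd_pulse callback consumed the event.
 * with_patch:    apply the call proposed in the patch.
 */
static void digport_work(struct hotplug_model *m, bool pulse_handled,
			 bool with_patch)
{
	if (with_patch)
		storm_disable(m);

	/* The real code queues the hotplug work; the model calls it directly. */
	if (!pulse_handled)
		hotplug_work(m);
}
```

In this model, a pin left in HPD_MARK_DISABLED stays there when the pulse
is handled and the patch is absent, so neither polling nor the re-enable is
ever set up. With with_patch set, or when the pulse is not handled and the
hotplug work runs, the pin reaches HPD_DISABLED with both flags set.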