diff mbox

[4/4] drm/i915: Avoid race of intel_crt_detect_hotplug() with HPD interrupt

Message ID 1441138895-23732-5-git-send-email-eich@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Egbert Eich Sept. 1, 2015, 8:21 p.m. UTC
A HPD interrupt may fire during intel_crt_detect_hotplug() - especially
when HPD interrupt storms occur.
Since the interrupt handler changes the enabled interrupt lines when it
detects a storm this races with intel_crt_detect_hotplug().
To avoid this, shiled the rmw cycles with IRQ save spinlocks.

Signed-off-by: Egbert Eich <eich@suse.de>
---
 drivers/gpu/drm/i915/intel_crt.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Daniel Vetter Sept. 2, 2015, 12:06 p.m. UTC | #1
On Tue, Sep 01, 2015 at 10:21:35PM +0200, Egbert Eich wrote:
> A HPD interrupt may fire during intel_crt_detect_hotplug() - especially
> when HPD interrupt storms occur.
> Since the interrupt handler changes the enabled interrupt lines when it
> detects a storm this races with intel_crt_detect_hotplug().
> To avoid this, shiled the rmw cycles with IRQ save spinlocks.
> 
> Signed-off-by: Egbert Eich <eich@suse.de>

I think this only reduces one source of such races, but fundamentally we
can't avoid them. E.g. if you're _very_ unlucky you might cause a real hpd
right when the strom code frobs around. Plugging the race with this known
interrupt source here doesn't fix the fundamental issue.

Also I think the actual interrupt deliver is fairly asynchronous, both in
the hardware and in the sw handling. E.g. spin_lock_irq does not
synchronize with the interrupt handler on SMP systems, it only guarantees
that it's not running concurrently on the same cpu (which would deadlock).

I think fundamentally this race is unfixable.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_crt.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_crt.c b/drivers/gpu/drm/i915/intel_crt.c
> index af5e43b..685f3de 100644
> --- a/drivers/gpu/drm/i915/intel_crt.c
> +++ b/drivers/gpu/drm/i915/intel_crt.c
> @@ -376,9 +376,10 @@ static bool intel_crt_detect_hotplug(struct drm_connector *connector)
>  {
>  	struct drm_device *dev = connector->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	u32 hotplug_en, orig, stat;
> +	u32 hotplug_en, stat;
>  	bool ret = false;
>  	int i, tries = 0;
> +	unsigned long irqflags;
>  
>  	if (HAS_PCH_SPLIT(dev))
>  		return intel_ironlake_crt_detect_hotplug(connector);
> @@ -395,12 +396,14 @@ static bool intel_crt_detect_hotplug(struct drm_connector *connector)
>  		tries = 2;
>  	else
>  		tries = 1;
> -	hotplug_en = orig = I915_READ(PORT_HOTPLUG_EN);
> -	hotplug_en |= CRT_HOTPLUG_FORCE_DETECT;
>  
>  	for (i = 0; i < tries ; i++) {
>  		/* turn on the FORCE_DETECT */
> -		I915_WRITE(PORT_HOTPLUG_EN, hotplug_en);
> +		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
> +		hotplug_en = I915_READ(PORT_HOTPLUG_EN);
> +		I915_WRITE(PORT_HOTPLUG_EN, hotplug_en |
> +			   CRT_HOTPLUG_FORCE_DETECT);
> +		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
>  		/* wait for FORCE_DETECT to go off */
>  		if (wait_for((I915_READ(PORT_HOTPLUG_EN) &
>  			      CRT_HOTPLUG_FORCE_DETECT) == 0,
> @@ -416,7 +419,11 @@ static bool intel_crt_detect_hotplug(struct drm_connector *connector)
>  	I915_WRITE(PORT_HOTPLUG_STAT, CRT_HOTPLUG_INT_STATUS);
>  
>  	/* and put the bits back */
> -	I915_WRITE(PORT_HOTPLUG_EN, orig);
> +	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
> +	hotplug_en = I915_READ(PORT_HOTPLUG_EN);
> +	hotplug_en &= ~CRT_HOTPLUG_FORCE_DETECT;
> +	I915_WRITE(PORT_HOTPLUG_EN, hotplug_en);
> +	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
>  
>  	return ret;
>  }
> -- 
> 1.8.4.5
>
Egbert Eich Sept. 2, 2015, 2:19 p.m. UTC | #2
Daniel Vetter writes:
 > On Tue, Sep 01, 2015 at 10:21:35PM +0200, Egbert Eich wrote:
 > > A HPD interrupt may fire during intel_crt_detect_hotplug() - especially
 > > when HPD interrupt storms occur.
 > > Since the interrupt handler changes the enabled interrupt lines when it
 > > detects a storm this races with intel_crt_detect_hotplug().
 > > To avoid this, shiled the rmw cycles with IRQ save spinlocks.
 > > 
 > > Signed-off-by: Egbert Eich <eich@suse.de>
 > 
 > I think this only reduces one source of such races, but fundamentally we
 > can't avoid them. E.g. if you're _very_ unlucky you might cause a real hpd
 > right when the strom code frobs around. Plugging the race with this known

This is exactly the scenatio I'm getting here. I get HPD interrupts at an 
order of 10^4 / sec.

 > interrupt source here doesn't fix the fundamental issue.
 > 
 > Also I think the actual interrupt deliver is fairly asynchronous, both in
 > the hardware and in the sw handling. E.g. spin_lock_irq does not
 > synchronize with the interrupt handler on SMP systems, it only guarantees
 > that it's not running concurrently on the same cpu (which would deadlock).
 > 
 > I think fundamentally this race is unfixable.


There is one important race we avoid with this patch: It is that
we change the PORT_HOTPLUG_EN concurrently in the interrupt handler
(thru xxx_hpd_irq_setup() and in intel_crt_detect_hotplug()).

With the test system that I have here the old version of the code
easily runs into this as the time spent inside intel_crt_detect_hotplug() 
is quite long - especially when no CRT is connected.

What happens intel_crt_detect_hotplug() reads the value of PORT_HOTPLUG_EN
at entry, then frobs around for ages, during this time several HPD interrupts
occur, the storm is detected, the bit related to the stormy line is unset
then after ages intel_crt_detect_hotplug() decides to give up and restores
the value it read on entry.

To remedy this this patch reads the value of PORT_HOTPLUG_EN whenever 
it is going to change it and adds the spin locks to make sure the 
read-modify-write cycles don't happen concurrently.

PORT_HOTPLUG_EN is only touched in two places: in xxx_hpd_irq_setup()
and in intel_crt_detect_hotplug(), the former can be called from an
interrupt handler.

Not sure why you see a problem here.

Cheers,
	Egbert.
Jani Nikula Sept. 2, 2015, 2:32 p.m. UTC | #3
On Wed, 02 Sep 2015, Egbert Eich <eich@suse.com> wrote:
> This is exactly the scenatio I'm getting here. I get HPD interrupts at an 
> order of 10^4 / sec.

Makes you wonder if either you have faulty hardware or we are
configuring the hardware wrong (we overlook some configuration about
some voltage/duration threshold maybe, or get irqs from a line that's
floating, or something).

BR,
Jani.
Daniel Vetter Sept. 2, 2015, 2:46 p.m. UTC | #4
On Wed, Sep 02, 2015 at 04:19:00PM +0200, Egbert Eich wrote:
> Daniel Vetter writes:
>  > On Tue, Sep 01, 2015 at 10:21:35PM +0200, Egbert Eich wrote:
>  > > A HPD interrupt may fire during intel_crt_detect_hotplug() - especially
>  > > when HPD interrupt storms occur.
>  > > Since the interrupt handler changes the enabled interrupt lines when it
>  > > detects a storm this races with intel_crt_detect_hotplug().
>  > > To avoid this, shiled the rmw cycles with IRQ save spinlocks.
>  > > 
>  > > Signed-off-by: Egbert Eich <eich@suse.de>
>  > 
>  > I think this only reduces one source of such races, but fundamentally we
>  > can't avoid them. E.g. if you're _very_ unlucky you might cause a real hpd
>  > right when the strom code frobs around. Plugging the race with this known
> 
> This is exactly the scenatio I'm getting here. I get HPD interrupts at an 
> order of 10^4 / sec.
> 
>  > interrupt source here doesn't fix the fundamental issue.
>  > 
>  > Also I think the actual interrupt deliver is fairly asynchronous, both in
>  > the hardware and in the sw handling. E.g. spin_lock_irq does not
>  > synchronize with the interrupt handler on SMP systems, it only guarantees
>  > that it's not running concurrently on the same cpu (which would deadlock).
>  > 
>  > I think fundamentally this race is unfixable.
> 
> 
> There is one important race we avoid with this patch: It is that
> we change the PORT_HOTPLUG_EN concurrently in the interrupt handler
> (thru xxx_hpd_irq_setup() and in intel_crt_detect_hotplug()).
> 
> With the test system that I have here the old version of the code
> easily runs into this as the time spent inside intel_crt_detect_hotplug() 
> is quite long - especially when no CRT is connected.
> 
> What happens intel_crt_detect_hotplug() reads the value of PORT_HOTPLUG_EN
> at entry, then frobs around for ages, during this time several HPD interrupts
> occur, the storm is detected, the bit related to the stormy line is unset
> then after ages intel_crt_detect_hotplug() decides to give up and restores
> the value it read on entry.
> 
> To remedy this this patch reads the value of PORT_HOTPLUG_EN whenever 
> it is going to change it and adds the spin locks to make sure the 
> read-modify-write cycles don't happen concurrently.
> 
> PORT_HOTPLUG_EN is only touched in two places: in xxx_hpd_irq_setup()
> and in intel_crt_detect_hotplug(), the former can be called from an
> interrupt handler.
> 
> Not sure why you see a problem here.

Hm I missed that this same register is also accessed by the irq handler
code, and it's not just that touching these bits can cause interrupts. So
yeah we need your patch, but it needs to be clearer in the commit message
that there's also trouble with concurrent register access to
PORT_HOTPLUG_EN.

Also I think a commen in the code why we grab that spinlock would be good.
For that extracting a small helper to manipulate the register (like we do
with other irq mask registers with functions like ilk_update_gt_irq) would
be good - then we have just one place to put that commment.
-Daniel
Egbert Eich Sept. 2, 2015, 2:58 p.m. UTC | #5
Jani Nikula writes:
 > On Wed, 02 Sep 2015, Egbert Eich <eich@suse.com> wrote:
 > > This is exactly the scenatio I'm getting here. I get HPD interrupts at an 
 > > order of 10^4 / sec.
 > 
 > Makes you wonder if either you have faulty hardware or we are
 > configuring the hardware wrong (we overlook some configuration about
 > some voltage/duration threshold maybe, or get irqs from a line that's
 > floating, or something).

It is faulty hardware. But it is not a single machine that broke.
It is an entire series. IMHO due to bad signal routing and poor shilding
there is crosstalk on the SDVO lines signaling the plug status.
Since SDVO uses PCIe lines it is AC coupled, if I recall correctly
from reading the specs long time ago, one status is signalled by a
10MHz signal, the other by 20MHz.

At the time when I implemented this I've seen other reports from systems 
which showed similar problems under certain conditions(*) - although not 
quite as bad, therefore I thought of a general solution to get rid of 
this once and for all. If this had only been one system with this problem, 
I would just have blacklisted it.

(*) It seems that this somewhat depends on the video mode set (supports
the crosstalk theory) but I also had a report where this occurred at
certain charging levels and whether a power supply was connected or
not.

Cheers,
	Egbert.
Egbert Eich Sept. 2, 2015, 3:17 p.m. UTC | #6
Daniel Vetter writes:
 > On Wed, Sep 02, 2015 at 04:19:00PM +0200, Egbert Eich wrote:
 > 
 > Hm I missed that this same register is also accessed by the irq handler
 > code, and it's not just that touching these bits can cause interrupts. So
 > yeah we need your patch, but it needs to be clearer in the commit message
 > that there's also trouble with concurrent register access to
 > PORT_HOTPLUG_EN.
 > 
 > Also I think a commen in the code why we grab that spinlock would be good.
 > For that extracting a small helper to manipulate the register (like we do
 > with other irq mask registers with functions like ilk_update_gt_irq) would
 > be good - then we have just one place to put that commment.

OK, I will come up with a suggestion.

Cheers,
	Egbert.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_crt.c b/drivers/gpu/drm/i915/intel_crt.c
index af5e43b..685f3de 100644
--- a/drivers/gpu/drm/i915/intel_crt.c
+++ b/drivers/gpu/drm/i915/intel_crt.c
@@ -376,9 +376,10 @@  static bool intel_crt_detect_hotplug(struct drm_connector *connector)
 {
 	struct drm_device *dev = connector->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 hotplug_en, orig, stat;
+	u32 hotplug_en, stat;
 	bool ret = false;
 	int i, tries = 0;
+	unsigned long irqflags;
 
 	if (HAS_PCH_SPLIT(dev))
 		return intel_ironlake_crt_detect_hotplug(connector);
@@ -395,12 +396,14 @@  static bool intel_crt_detect_hotplug(struct drm_connector *connector)
 		tries = 2;
 	else
 		tries = 1;
-	hotplug_en = orig = I915_READ(PORT_HOTPLUG_EN);
-	hotplug_en |= CRT_HOTPLUG_FORCE_DETECT;
 
 	for (i = 0; i < tries ; i++) {
 		/* turn on the FORCE_DETECT */
-		I915_WRITE(PORT_HOTPLUG_EN, hotplug_en);
+		spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+		hotplug_en = I915_READ(PORT_HOTPLUG_EN);
+		I915_WRITE(PORT_HOTPLUG_EN, hotplug_en |
+			   CRT_HOTPLUG_FORCE_DETECT);
+		spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
 		/* wait for FORCE_DETECT to go off */
 		if (wait_for((I915_READ(PORT_HOTPLUG_EN) &
 			      CRT_HOTPLUG_FORCE_DETECT) == 0,
@@ -416,7 +419,11 @@  static bool intel_crt_detect_hotplug(struct drm_connector *connector)
 	I915_WRITE(PORT_HOTPLUG_STAT, CRT_HOTPLUG_INT_STATUS);
 
 	/* and put the bits back */
-	I915_WRITE(PORT_HOTPLUG_EN, orig);
+	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	hotplug_en = I915_READ(PORT_HOTPLUG_EN);
+	hotplug_en &= ~CRT_HOTPLUG_FORCE_DETECT;
+	I915_WRITE(PORT_HOTPLUG_EN, hotplug_en);
+	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
 
 	return ret;
 }