diff mbox

[3/3] drm/i915: close rps work vs. rps disable races

Message ID 1315150502-12537-4-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Vetter Sept. 4, 2011, 3:35 p.m. UTC
The rps disabling code wasn't properly cancelling outstanding work
items. Also add a comment that explains why we're not racing with
the work item, that could again unmask interrupts.

This also fixes a bug on module unload because nothing was properly
syncing up with that work item, possibly leading to it being run after
freeing dev_priv or removing the module code.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_display.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

Comments

Ben Widawsky Sept. 4, 2011, 5:23 p.m. UTC | #1
On Sun, Sep 04, 2011 at 05:35:02PM +0200, Daniel Vetter wrote:
> The rps disabling code wasn't properly cancelling outstanding work
> items. Also add a comment that explains why we're not racing with
> the work item, that could again unmask interrupts.
> 
> This also fixes a bug on module unload because nothing was properly
> syncing up with that work item, possibly leading to it being run after
> freeing dev_priv or removing the module code.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/intel_display.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 56a8554..ccd4600 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -7483,10 +7483,16 @@ void gen6_disable_rps(struct drm_device *dev)
>  
>  	I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
>  	I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
> -	I915_WRITE(GEN6_PMIER, 0);
> +	/* Complete PM interrupt masking here doesn't race with the rps work
> +	 * item again unmasking PM interrupts because that is using PMIMR
> +	 * (logically sitting atop of PMINTRMSK) to mask interrupts. */
> +	cancel_work_sync(&dev_priv->rps_work);
>  
> +	/* Clear PMIMR and dev_priv->pm_iir in case outstanding work gets
> +	 * cancelled before having run. */

This comment is wrong, you are clearing PMIER, not IMR. IMR gets cleared
in enable() iirc

>  	spin_lock_irq(&dev_priv->rps_lock);
>  	dev_priv->pm_iir = 0;
> +	I915_WRITE(GEN6_PMIER, 0);
>  	spin_unlock_irq(&dev_priv->rps_lock);
>  
>  	I915_WRITE(GEN6_PMIIR, I915_READ(GEN6_PMIIR));
> -- 
> 1.7.6
> 

I'm not sure this actually fixes a problem. The existing code:
1. disables all interrupts (no more can occur).
2. sets pm_iir to 0 safe in rps lock
<workqueues can run at this point, but IMR has no effect with IER = 0>

I think you should do the cancel work sync somewhere in the code before
module unload (to be correct). I just don't think this fixes a race.

Ben
Daniel Vetter Sept. 4, 2011, 7:17 p.m. UTC | #2
On Sun, Sep 04, 2011 at 05:23:30PM +0000, Ben Widawsky wrote:
> On Sun, Sep 04, 2011 at 05:35:02PM +0200, Daniel Vetter wrote:
> > The rps disabling code wasn't properly cancelling outstanding work
> > items. Also add a comment that explains why we're not racing with
> > the work item, that could again unmask interrupts.
> > 
> > This also fixes a bug on module unload because nothing was properly
> > syncing up with that work item, possibly leading to it being run after
> > freeing dev_priv or removing the module code.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 56a8554..ccd4600 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -7483,10 +7483,16 @@ void gen6_disable_rps(struct drm_device *dev)
> >  
> >  	I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
> >  	I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
> > -	I915_WRITE(GEN6_PMIER, 0);
> > +	/* Complete PM interrupt masking here doesn't race with the rps work
> > +	 * item again unmasking PM interrupts because that is using PMIMR
> > +	 * (logically sitting atop of PMINTRMSK) to mask interrupts. */
> > +	cancel_work_sync(&dev_priv->rps_work);
> >  
> > +	/* Clear PMIMR and dev_priv->pm_iir in case outstanding work gets
> > +	 * cancelled before having run. */
> 
> This comment is wrong, you are clearing PMIER, not IMR. IMR gets cleared
> in enable() iirc

Bleh, now it's my turn to completely screw it up ;-) Still, I think some
comment that we muck around with different registers than the work item
would be good. And clearing PMIMR under the lock just for paranoia so
that setting dev_priv->pm_iir doesn't leave a strange interrupt mask
lingering around

> >  	spin_lock_irq(&dev_priv->rps_lock);
> >  	dev_priv->pm_iir = 0;
> > +	I915_WRITE(GEN6_PMIER, 0);
> >  	spin_unlock_irq(&dev_priv->rps_lock);
> >  
> >  	I915_WRITE(GEN6_PMIIR, I915_READ(GEN6_PMIIR));
> I'm not sure this actually fixes a problem. The existing code:
> 1. disables all interrupts (no more can occur).
> 2. sets pm_iir to 0 safe in rps lock
> <workqueues can run at this point, but IMR has no effect with IER = 0>
> 
> I think you should do the cancel work sync somewhere in the code before
> module unload (to be correct). I just don't think this fixes a race.

Well, we definitely need the cancel_work_sync in the unload path
somewhere. I prefer it in the rps disable function. In the context of the
other patches my patch description is a bit lousy because I've really only
wanted to fix the unload race (and the PMIMR inconsistency) with this
patch.

-Daniel
Ben Widawsky Sept. 4, 2011, 7:50 p.m. UTC | #3
On Sun, Sep 04, 2011 at 09:17:24PM +0200, Daniel Vetter wrote:
> > >  	spin_lock_irq(&dev_priv->rps_lock);
> > >  	dev_priv->pm_iir = 0;
> > > +	I915_WRITE(GEN6_PMIER, 0);
> > >  	spin_unlock_irq(&dev_priv->rps_lock);
> > >  
> > >  	I915_WRITE(GEN6_PMIIR, I915_READ(GEN6_PMIIR));
> > I'm not sure this actually fixes a problem. The existing code:
> > 1. disables all interrupts (no more can occur).
> > 2. sets pm_iir to 0 safe in rps lock
> > <workqueues can run at this point, but IMR has no effect with IER = 0>
> > 
> > I think you should do the cancel work sync somewhere in the code before
> > module unload (to be correct). I just don't think this fixes a race.
> 
> Well, we definitely need the cancel_work_sync in the unload path
> somewhere. I prefer it in the rps disable function. In the context of the
> other patches my patch description is a bit lousy because I've really only
> wanted to fix the unload race (and the PMIMR inconsistency) with this
> patch.
> 
> -Daniel

I still fail to see the inconsitency. The value of PMR no longer matters once
IER is cleared. What the wq or irq handler does after this point should
be fine so long as everything is setup up properly at enable time. This
is a minor detail.

So I would rework some of your comments a bit, and I also think it's
about time we create a central place to cancel workqueue items. Mostly
because I think this patch is subject to a deadlock with struct_mutex
(you're holding it when you call gen6_disable_rps(), but you're doing
cancel_work_sync on a workqueue which attempts to acquire struct_mutex.

Ben
Daniel Vetter Sept. 4, 2011, 7:57 p.m. UTC | #4
On Sun, Sep 04, 2011 at 07:50:08PM +0000, Ben Widawsky wrote:
> So I would rework some of your comments a bit, and I also think it's
> about time we create a central place to cancel workqueue items. Mostly
> because I think this patch is subject to a deadlock with struct_mutex
> (you're holding it when you call gen6_disable_rps(), but you're doing
> cancel_work_sync on a workqueue which attempts to acquire struct_mutex.

Oh dang, you're correct. This will deadlock. And the PMIMR is really just
cosmetic. So putting the cancel_work_sync into intel_modeset_cleanup
sounds like the right approach. And hey, I've already put other work
cancelling stuff there in my last fury at fixing module unload, so it
can't be that bad ;-)

I'll try to write a better patch tomorrow.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 56a8554..ccd4600 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7483,10 +7483,16 @@  void gen6_disable_rps(struct drm_device *dev)
 
 	I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
 	I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
-	I915_WRITE(GEN6_PMIER, 0);
+	/* Complete PM interrupt masking here doesn't race with the rps work
+	 * item again unmasking PM interrupts because that is using PMIMR
+	 * (logically sitting atop of PMINTRMSK) to mask interrupts. */
+	cancel_work_sync(&dev_priv->rps_work);
 
+	/* Clear PMIMR and dev_priv->pm_iir in case outstanding work gets
+	 * cancelled before having run. */
 	spin_lock_irq(&dev_priv->rps_lock);
 	dev_priv->pm_iir = 0;
+	I915_WRITE(GEN6_PMIER, 0);
 	spin_unlock_irq(&dev_priv->rps_lock);
 
 	I915_WRITE(GEN6_PMIIR, I915_READ(GEN6_PMIIR));