
[RFC,3/3] drm/i915: Make RPS enable immediate

Message ID 1464012583-22480-3-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Ville Syrjälä May 23, 2016, 2:09 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

On SNB (at least) it's dangerous to hang the GPU with an infinite
batch buffer loop while RPS is disabled. The internal RPS timers are
all that keeps the system going in such circumstances, so we should
never feed the GPU with RPS disabled unless we want to risk a total
system hang.

Previously we didn't fully disable RPS, but that changed in
commit 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register"),
which is probably why we didn't see the problem as often before. But
even before that we were at the mercy of the BIOS for the initial
RPS state, so if the BIOS didn't enable RPS, a GPU hang immediately
upon boot could have been fatal.

To avoid the problem let's just make the RPS enable immediate.
This actually renders the delayed_resume_work useless. We could perhaps
just move the ring freq table initialization to the work and do the
other stuff synchronously?

Fixes: 2030d684f7b3 ("drm/i915/bxt: Explicitly clear the Turbo control register")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
This is more of an RFC at this point. Perhaps we might want to keep the delayed
work but just for the ring freq table update (which is the main reason this whole
thing exists in the first place). Another factor is that wait_for() is not
exactly optimal currently, so it makes the operation slower than it needs to
be. We'd need some hard numbers to see whether it's worth keeping the delayed
work, I suppose.

 drivers/gpu/drm/i915/intel_pm.c | 1 +
 1 file changed, 1 insertion(+)
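
[Editor's sketch of the alternative floated in the note above: enable RPS
synchronously so the GPU is never fed work with RPS disabled, and keep only
the slow ring freq table update in the delayed work. gen6_enable_rps(),
rps.hw_lock and delayed_resume_work are the real i915 names of this era,
but the split itself is hypothetical and is not what the patch below does.]

/* Hypothetical sketch (intel_pm.c context assumed), not the actual
 * patch: do the RPS enable synchronously and defer only the slow
 * ring freq table programming. */
void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
{
	mutex_lock(&dev_priv->rps.hw_lock);
	gen6_enable_rps(dev_priv); /* GPU never runs with RPS disabled */
	mutex_unlock(&dev_priv->rps.hw_lock);

	/* the worker would now only program the ring freq table */
	if (schedule_delayed_work(&dev_priv->rps.delayed_resume_work,
				  round_jiffies_up_relative(HZ)))
		intel_runtime_pm_get_noresume(dev_priv);
}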

Comments

Chris Wilson May 23, 2016, 2:33 p.m. UTC | #1
On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> [commit message snipped; same as above]
> This is more of an RFC at this point. Perhaps we might want to keep the delayed
> work but just for the ring freq table update (which is the main reason this whole
> thing exists in the first place). Another factor is that wait_for() is not
> exactly optimal currently, so it makes the operation slower than it needs to
> be. We'd need some hard numbers to see whether it's worth keeping the delayed
> work, I suppose.

I am in favour - the entire chain should be async, not just this little
step to set up the ring freq.
-Chris
Daniel Vetter May 24, 2016, 8:27 a.m. UTC | #2
On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> [commit message snipped; same as above]
> This is more of an RFC at this point. Perhaps we might want to keep the delayed
> work but just for the ring freq table update (which is the main reason this whole
> thing exists in the first place). Another factor is that wait_for() is not
> exactly optimal currently, so it makes the operation slower than it needs to
> be. We'd need some hard numbers to see whether it's worth keeping the delayed
> work, I suppose.

Loading the ring freq tables takes forever, so we definitely want to keep
that. I'd just split the rps setup in two parts and push the schedule_work
down a bit. Long term we can go overboard with async (and maybe only
stall for all the GT setup in the very first batchbuffer).
-Daniel
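
[Editor's note: on the worker side, Daniel's split might look roughly like
this, under the same assumptions as the sketch above. gen6_update_ring_freq()
is the real function of this era; the slimmed-down worker is illustrative
only.]

/* Hypothetical sketch: the delayed work keeps only the slow ring
 * freq table load; everything else is done at enable time. */
static void intel_gen6_powersave_work(struct work_struct *work)
{
	struct drm_i915_private *dev_priv =
		container_of(work, struct drm_i915_private,
			     rps.delayed_resume_work.work);

	mutex_lock(&dev_priv->rps.hw_lock);
	gen6_update_ring_freq(dev_priv); /* ~35 ms with the current wait_for() */
	mutex_unlock(&dev_priv->rps.hw_lock);

	intel_runtime_pm_put(dev_priv);
}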

> 
>  drivers/gpu/drm/i915/intel_pm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 817c84c22782..e65c5c2b9b4e 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6505,6 +6505,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
>  		if (schedule_delayed_work(&dev_priv->rps.delayed_resume_work,
>  					   round_jiffies_up_relative(HZ)))
>  			intel_runtime_pm_get_noresume(dev_priv);
> +		flush_delayed_work(&dev_priv->rps.delayed_resume_work);
>  	}
>  }
>  
> -- 
> 2.7.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ville Syrjälä May 24, 2016, 10:14 a.m. UTC | #3
On Tue, May 24, 2016 at 10:27:19AM +0200, Daniel Vetter wrote:
> On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> > [commit message and RFC note snipped; same as above]
> 
> Loading the ring freq tables takes forever, so we definitely want to keep
> that.

It only takes forever because wait_for() sucks.

with current sleep duration of 1000-2000 us:
[  308.231364] rps init took 5533 us
[  308.266322] ring freq init took 34952 us

sleep duration reduced to 100-200 us:
[  155.367588] rps init took 679 us
[  155.371100] ring freq init took 3509 us

So it looks like someone just failed to root cause the slowness, and then
went on to optimize the wrong thing.
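
[Editor's note: for reference, the polling loop behind those numbers looked
roughly like this at the time. This is a simplified sketch, not the verbatim
i915 macro: wait_for() passed a fixed W of 1000, so every missed poll costs
a 1000-2000 us sleep regardless of how quickly the condition actually flips.]

/* simplified sketch of i915's polling helper circa this thread;
 * needs <linux/jiffies.h> and <linux/delay.h> */
#define _wait_for(COND, US, W) ({ \
	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
	int ret__ = 0; \
	while (!(COND)) { \
		if (time_after(jiffies, timeout__)) { \
			ret__ = -ETIMEDOUT; \
			break; \
		} \
		usleep_range((W), (W) * 2); /* each miss sleeps W-2W us */ \
	} \
	ret__; \
})

/* W = 1000 gives the 1000-2000 us sleeps measured above;
 * W = 100 gives the 100-200 us variant */
#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 1000)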

Chris Wilson May 24, 2016, 10:43 a.m. UTC | #4
On Tue, May 24, 2016 at 01:14:53PM +0300, Ville Syrjälä wrote:
> On Tue, May 24, 2016 at 10:27:19AM +0200, Daniel Vetter wrote:
> > On Mon, May 23, 2016 at 05:09:43PM +0300, ville.syrjala@linux.intel.com wrote:
> > > [commit message and RFC note snipped; same as above]
> > Loading the ring freq tables takes forever, so we definitely want to keep
> > that.
> 
> It only takes forever because wait_for() sucks.
> 
> with current sleep duration of 1000-2000 us:
> [  308.231364] rps init took 5533 us
> [  308.266322] ring freq init took 34952 us
> 
> sleep duration reduced to 100-200 us:
> [  155.367588] rps init took 679 us
> [  155.371100] ring freq init took 3509 us
> 
> So it looks like someone just failed to root cause the slowness, and then
> went on to optimize the wrong thing.

It's still 4 ms that can be done in parallel with userspace starting :)
(And it looks like it can be reduced further with a smarter wait_for.)

So we encourage Mika to send his updated patches...
-Chris
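
[Editor's note: one hypothetical shape a smarter wait_for() could take - a
sketch only, not Mika's actual patches: busy-spin for conditions that flip
within a few microseconds, then fall back to sleeping polls with exponential
backoff so neither fast nor slow conditions pay the worst-case sleep.]

/* hypothetical sketch; needs <linux/ktime.h>, <linux/delay.h>
 * and <linux/kernel.h> for min() */
#define smart_wait_for(COND, TIMEOUT_US) ({ \
	ktime_t end__ = ktime_add_us(ktime_get_raw(), (TIMEOUT_US)); \
	unsigned int wait__ = 1; /* us */ \
	int ret__ = -ETIMEDOUT; \
	for (;;) { \
		if (COND) { \
			ret__ = 0; \
			break; \
		} \
		if (ktime_after(ktime_get_raw(), end__)) \
			break; \
		if (wait__ <= 10) \
			udelay(wait__); /* spin while waits are short */ \
		else \
			usleep_range(wait__, 2 * wait__); \
		wait__ = min(2 * wait__, 1000U); /* exponential backoff */ \
	} \
	ret__; \
})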

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 817c84c22782..e65c5c2b9b4e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6505,6 +6505,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
 		if (schedule_delayed_work(&dev_priv->rps.delayed_resume_work,
 					   round_jiffies_up_relative(HZ)))
 			intel_runtime_pm_get_noresume(dev_priv);
+		flush_delayed_work(&dev_priv->rps.delayed_resume_work);
 	}
 }