
[v2] drm/i915/bdw: BDW Software Turbo

Message ID 1404954048-15265-1-git-send-email-daisy.sun@intel.com (mailing list archive)
State New, archived

Commit Message

Daisy Sun July 10, 2014, 1 a.m. UTC
BDW supports reporting GT C0 residency in constant time units. The driver
calculates GT utilization based on C0 residency and adjusts the RP frequency
up or down accordingly.

Signed-off-by: Daisy Sun <daisy.sun@intel.com>
[torourke: rebased on latest and resolved conflict]
Signed-off-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      |  17 ++
 drivers/gpu/drm/i915/i915_irq.c      |  10 ++
 drivers/gpu/drm/i915/i915_reg.h      |   4 +
 drivers/gpu/drm/i915/intel_display.c |   2 +
 drivers/gpu/drm/i915/intel_drv.h     |   1 +
 drivers/gpu/drm/i915/intel_pm.c      | 148 ++++++++++++++---
 6 files changed, 162 insertions(+), 20 deletions(-)
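The C0-based busyness estimate at the core of the patch (bdw_sw_calculate_freq() in the intel_pm.c hunk) can be modelled in user space. This is a sketch, not the driver code. It takes the units stated in the patch comments at face value (C0 ticks of 32 × 1.28 us = 40.96 us, timestamps in us), so the busy percentage works out to delta_c0 × 4096 / elapsed_us. It reuses the patch's thresholds and step sizes (up by 3 at 90% or more over 84.48 ms, down by 1 at 70% or less over 448 ms). The names struct evaluator, busyness_pct() and evaluate() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One evaluation window, modelled on the patch's struct intel_rps_bdw_cal. */
struct evaluator {
	uint32_t threshold_pct;	/* busyness threshold, in percent */
	uint32_t eval_interval;	/* evaluation interval, in us */
	uint32_t last_ts;	/* timestamp of the last sample, in us */
	uint32_t last_c0;	/* C0 counter at the last sample, 32*1.28 us units */
	bool is_up;		/* true: step up when busy; false: step down when idle */
};

/* One C0 tick is 32 * 1.28 us = 40.96 us, so percent = delta * 4096 / elapsed. */
static uint32_t busyness_pct(uint32_t c0_delta, uint32_t elapsed_us)
{
	assert(elapsed_us != 0);
	return (uint32_t)(((uint64_t)c0_delta << 12) / elapsed_us);
}

/* Returns the frequency to request; cur_freq if no change is due. */
static int evaluate(struct evaluator *e, uint32_t now_us, uint32_t c0,
		    int cur_freq, int min_freq, int max_freq)
{
	int new_freq = cur_freq;

	if (e->last_c0 != 0) {
		uint32_t elapsed = now_us - e->last_ts; /* unsigned math wraps */
		uint32_t pct;

		/* Window not complete yet: keep the old baseline. */
		if (elapsed < e->eval_interval)
			return cur_freq;

		pct = busyness_pct(c0 - e->last_c0, elapsed);
		if (e->is_up && pct >= e->threshold_pct)
			new_freq = cur_freq + 3;
		else if (!e->is_up && pct <= e->threshold_pct)
			new_freq = cur_freq - 1;

		/* Clamp only when a change was requested, as the patch does. */
		if (new_freq != cur_freq) {
			if (new_freq > max_freq)
				new_freq = max_freq;
			else if (new_freq < min_freq)
				new_freq = min_freq;
		}
	}

	/* The first call (last_c0 == 0) only records a baseline. */
	e->last_c0 = c0;
	e->last_ts = now_us;
	return new_freq;
}
```

For example, 2000 C0 ticks over an 84.48 ms window give 2000 × 4096 / 84480 = 96 (integer division), so the up evaluator raises the frequency by three steps, clamped to the soft limit.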

Comments

Chris Wilson July 10, 2014, 8:32 a.m. UTC | #1
On Wed, Jul 09, 2014 at 06:00:48PM -0700, Daisy Sun wrote:
> BDW supports reporting GT C0 residency in constant time units. The driver
> calculates GT utilization based on C0 residency and adjusts the RP frequency
> up or down accordingly.

This explanation is a bit thin on the ground for why you want to run
permanently at a single GPU frequency. And the algorithm looks primitive
at best.
-Chris
Daisy Sun July 10, 2014, 6:42 p.m. UTC | #2
Actually, the GT is not going to run at a single frequency all the time. It
starts from a single frequency and then adjusts dynamically, up or down,
according to the GT utilization.
From this perspective, SW turbo functions the same as HW turbo.

As for the algorithm, we did go over it in the design forum before implementation.
What kind of improvement do you expect? Please let me know if any
important case has not been taken into account. Thanks.

- Daisy
Chris Wilson July 10, 2014, 7:07 p.m. UTC | #3
On Thu, Jul 10, 2014 at 11:42:59AM -0700, Sun, Daisy wrote:
> 
> Actually, the GT is not going to run at a single frequency all the time.
> It starts from a single frequency and then adjusts dynamically, up or down,
> according to the GT utilization.
> From this perspective, SW turbo functions the same as HW turbo.

Urm, read your code again.

> As for the algorithm, we did go over it in the design forum before implementation.
> What kind of improvement do you expect? Please let me know if any
> important case has not been taken into account. Thanks.

You have no faststart or boost strategy, so typical desktop usage will
feel very laggy. For a large number of use cases you never change
frequency.
-Chris
Daisy Sun July 11, 2014, 2:39 a.m. UTC | #4
This software turbo will mainly take the place of the hardware-driven
interrupt part, without touching the boost/idle strategy.
So gen6_rps_boost and gen6_rps_idle will still function for BDW.

I can revise the commit message to clarify.

Chris Wilson July 11, 2014, 6:15 a.m. UTC | #5
On Thu, Jul 10, 2014 at 07:39:32PM -0700, Sun, Daisy wrote:
> 
> This software turbo will mainly take the place of the hardware-driven
> interrupt part, without touching the boost/idle strategy.
> So gen6_rps_boost and gen6_rps_idle will still function for BDW.

You still are not addressing that your function is either called at a
random time, and more often than not, never. You also disabled the
set_rps paths which would have disabled boost and idle.
-Chris
Daisy Sun July 14, 2014, 4:22 a.m. UTC | #6
1) The design is by no means meant to disable the boost and idle strategy. They are very effective at dealing with burst requests and improving responsiveness.

2) The patch did disable part of gen6_set_rps_thresholds(), but gen6_set_rps() is supposed to function the same as before.
Could you point out the code that you suspect disables the set_rps path?

3) The function is called whenever a flip happens, which should cover most cases. One exception is a background media process without any display output, which is relatively rare.
Please let me know if you have concerns about other cases; I will make sure to cover them.


-----Original Message-----
From: Chris Wilson [mailto:chris@chris-wilson.co.uk] 
Sent: Thursday, July 10, 2014 11:16 PM
To: Sun, Daisy
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915/bdw: BDW Software Turbo

On Thu, Jul 10, 2014 at 07:39:32PM -0700, Sun, Daisy wrote:
> 
> This software turbo will mainly take the place of the hardware-driven
> interrupt part, without touching the boost/idle strategy.
> So gen6_rps_boost and gen6_rps_idle will still function for BDW.

You still are not addressing that your function is either called at a random time, and more often than not, never. You also disabled the set_rps paths which would have disabled boost and idle.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Daniel Vetter July 14, 2014, 6:59 a.m. UTC | #7
On Mon, Jul 14, 2014 at 04:22:44AM +0000, Sun, Daisy wrote:
> 3) The function is called whenever a flip happens, which should cover
> most cases. One exception is a background media process without any
> display output, which is relatively rare. Please let me know if you
> have concerns about other cases; I will make sure to cover them.

Traditional X never flips. And we kinda have to keep this working. Instead
of checking when flipping we need to check at regular time intervals I
guess, for as long as the gt is busy.
-Daniel
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter July 14, 2014, 7:03 a.m. UTC | #8
On Mon, Jul 14, 2014 at 8:59 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jul 14, 2014 at 04:22:44AM +0000, Sun, Daisy wrote:
>> 3) The function is called whenever a flip happens, which should cover
>> most cases. One exception is a background media process without any
>> display output, which is relatively rare. Please let me know if you
>> have concerns about other cases; I will make sure to cover them.
>
> Traditional X never flips. And we kinda have to keep this working. Instead
> of checking when flipping we need to check at regular time intervals I
> guess, for as long as the gt is busy.

Oh and transcode servers are a real thing apparently. They also never
flip, and we actually care from a business pov ...
-Daniel
Daisy Sun July 15, 2014, 6:35 a.m. UTC | #9
Hi Daniel, Chris

The concerns about traditional X and media servers do make sense. I'll update the patch to use RP_UP_EI_INTERRUPT as the trigger instead of the page flip.
Thanks for the valuable input.

- Daisy

-----Original Message-----
From: daniel.vetter@ffwll.ch [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Monday, July 14, 2014 12:04 AM
To: Sun, Daisy
Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915/bdw: BDW Software Turbo

On Mon, Jul 14, 2014 at 8:59 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jul 14, 2014 at 04:22:44AM +0000, Sun, Daisy wrote:
>> 3) The function is called whenever a flip happens, which should cover
>> most cases. One exception is a background media process without any
>> display output, which is relatively rare. Please let me know if you
>> have concerns about other cases; I will make sure to cover them.
>
> Traditional X never flips. And we kinda have to keep this working. 
> Instead of checking when flipping we need to check at regular time 
> intervals I guess, for as long as the gt is busy.

Oh and transcode servers are a real thing apparently. They also never flip, and we actually care from a business pov ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Jesse Barnes July 24, 2014, 8:28 p.m. UTC | #10
If that won't work, you could just use a timer, or tie into some other
event that happens when the GPU is busy (e.g. execbuf or retire) instead
of trying to tie into the display side of things.

Jesse

Daniel Vetter July 25, 2014, 7:22 a.m. UTC | #11
On Thu, Jul 24, 2014 at 01:28:21PM -0700, Jesse Barnes wrote:
> If that won't work, you could just use a timer, or tie into some other
> event that happens when the GPU is busy (e.g. execbuf or retire) instead
> of trying to tie into the display side of things.

Yes, tying into a normal timer is probably best. At least I get the
impression that we only need something regular. Of course once the gpu is
idle we need to stop rearming that timer and restart it upon first batch
when transitioning out of idle.
-Daniel

> -- 
> Jesse Barnes, Intel Open Source Technology Center
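The timer approach suggested above (evaluate at a regular interval, stop rearming once the GPU is idle, restart on the first batch after idle) can be sketched as a small state machine. This is a hypothetical model, not driver code: struct turbo_timer, on_batch_submitted() and on_timer_expired() are illustrative names, and timer expiry is simulated by calling the handler directly. In the kernel this would plausibly map onto a struct delayed_work queued with queue_delayed_work() from the submission path, similar in spirit to the retire work handler; that mapping is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an evaluation timer that only runs while the GT is busy. */
struct turbo_timer {
	bool armed;		/* is an evaluation pending? */
	unsigned int evals;	/* evaluations performed so far */
	unsigned int restarts;	/* times the timer was started from idle */
};

/* Batch submission (e.g. execbuf): start the timer only if it is not running. */
static void on_batch_submitted(struct turbo_timer *t)
{
	if (!t->armed) {
		t->armed = true;
		t->restarts++;
	}
}

/* Timer expiry: evaluate, then rearm only while work is still outstanding. */
static void on_timer_expired(struct turbo_timer *t, bool gpu_busy)
{
	assert(t->armed);
	t->armed = false;
	t->evals++;	/* read C0 residency and adjust the RP frequency here */
	if (gpu_busy)
		t->armed = true;	/* keep sampling at a regular interval */
	/* Otherwise stay disarmed: no further wakeups while the GPU is idle. */
}
```

Because the timer is never rearmed while idle, it does not cause the periodic CPU and package wakeups that a free-running timer would.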
Daisy Sun July 25, 2014, 3:51 p.m. UTC | #12
Yes, a timer can be helpful. A revised proposal is to use the flip trigger plus a
timer to cover both cases together. I'll come up with more details soon.

- Daisy

Daisy Sun July 26, 2014, 12:47 a.m. UTC | #13
We have reconsidered the good suggestions and evaluated performance and complexity again.

A constant timer callback would continuously wake up the CPU and the entire
package, resulting in lower CPU and package C-state residency and shorter
battery life, especially standby time.

execbuf is a good candidate, and we had taken it into account too. However,
execbuf can happen much more frequently than flips. The synchronization and
calculation overhead were the main reasons we avoided it: we tried not to
spend too much IA (CPU) resource to benefit the GT.

Here is a revised version of the software turbo design for BDW; please take a
look and see if there are any concerns.

For software turbo, it can be tough to find a perfect solution; some
trade-offs may be needed.

Revised design:
GT busyness will still be calculated when a page flip comes in, and the GT
frequency will then be adjusted accordingly. This part stays the same as in
the previous design. For the cases where no flip will happen (a server, or a
background task with no display activity), which was the earlier concern,
set the GT frequency to RP0 (no turbo algorithm interferes in this case).

Implementation details:
1) The driver starts with RP0 as the GT frequency.
2) When a flip comes in, do the regular software turbo busyness calculation.
Also arm a 250 ms timer.
3) If flips keep arriving in time, keep running the turbo algorithm and reset the timer.
4) When the timer fires, set the RP frequency to RP0 so that background tasks are still
taken care of (RPS boost and idle need to be disabled in this situation).
5) If a flip comes again, go back to 2).
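The five steps above can be modelled as a two-state machine. This is a hedged sketch of the proposal as written, not an implementation: the names are hypothetical, the busyness calculation is abstracted into the evaluated_freq argument, the timer is simulated by polling on_tick() with the current time, and disabling RPS boost/idle in step 4 is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define FALLBACK_TIMEOUT_MS 250	/* timer length from the proposal */

enum turbo_mode {
	MODE_FIXED_RP0,	/* steps 1 and 4: no recent flips, run at RP0 */
	MODE_SW_TURBO,	/* steps 2 and 3: flips arriving, run the algorithm */
};

struct flip_turbo {
	enum turbo_mode mode;
	uint64_t deadline_ms;	/* when the fallback timer fires */
	int freq;		/* current GT frequency, in RP units */
	int rp0;
};

/* Step 1: start at RP0 with no timer pending. */
static void turbo_init(struct flip_turbo *s, int rp0)
{
	s->mode = MODE_FIXED_RP0;
	s->deadline_ms = 0;
	s->freq = rp0;
	s->rp0 = rp0;
}

/* Steps 2, 3 and 5: a flip runs the busyness calculation and (re)arms the timer. */
static void on_flip(struct flip_turbo *s, uint64_t now_ms, int evaluated_freq)
{
	s->mode = MODE_SW_TURBO;
	s->freq = evaluated_freq;	/* result of the busyness calculation */
	s->deadline_ms = now_ms + FALLBACK_TIMEOUT_MS;
}

/* Step 4: if no flip arrived within the timeout, fall back to RP0. */
static void on_tick(struct flip_turbo *s, uint64_t now_ms)
{
	if (s->mode == MODE_SW_TURBO && now_ms >= s->deadline_ms) {
		s->mode = MODE_FIXED_RP0;
		s->freq = s->rp0;	/* RPS boost/idle would also be disabled here */
	}
}
```

Note that each flip pushes the deadline out by another 250 ms, so a steady stream of flips keeps the machine in MODE_SW_TURBO indefinitely.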

To recap:
For the most common cases, the GT will run at the desired frequency as a
result of the software turbo algorithm.
For background workloads or no-flip environments, the GT will run at RP0 with
shorter execution time, extending RC6 and package C-state residency as far as
power is concerned.

I'll start with the implementation if all concerns are ironed out.

- Daisy


Daniel Vetter July 28, 2014, 9:17 a.m. UTC | #14
On Fri, Jul 25, 2014 at 05:47:11PM -0700, Sun, Daisy wrote:
> We have reconsidered the good suggestions and evaluated performance and complexity again.
> 
> A constant timer callback would continuously wake up the CPU and the entire
> package, resulting in lower CPU and package C-state residency and shorter
> battery life, especially standby time.

If you shut down the timer when idle this shouldn't be a concern at all.
See the idle infrastructure we have and e.g. the retire work handler. We
could even put the software turbo into the retire work handler ...

> execbuf is a good candidate, and we had taken it into account too. However,
> execbuf can happen much more frequently than flips. The synchronization and
> calculation overhead were the main reasons we avoided it: we tried not to
> spend too much IA (CPU) resource to benefit the GT.

Yeah, running it at each execbuf is going to be too expensive. But with
the regular timer there should be no need to also do this in flips - it
might badly interfere with the missed-flip boosting patches from Chris
even.

> Here is a revised version of the software turbo design for BDW; please take a
> look and see if there are any concerns.

Doesn't seem to be attached.

> For software turbo, it can be tough to find a perfect solution; some
> trade-offs may be needed.

Well, but we've already made quite some nice improvements with the boosting
logic. So I think it can be done, but I agree it's not easy.

> Revised design:
> GT busyness will still be calculated when page_flip comes in, then GT frequency
> will be adjusted accordingly. This point stays the same as previous design.
> For the cases no flip will happen(server or background task with no display activity)
> which is a previous concern,  set GT frequency to RP0(no turbo algorithm interfered in this
> case).
> 
> Implementation details:
> 1) The driver starts with RP0 as the GT frequency.
> 2) When a flip comes in, do the regular software turbo busyness calculation.
> Also arm a 250 ms timer.
> 3) If flips keep arriving in time, keep running the turbo algorithm and reset the timer.
> 4) When the timer fires, set the RP frequency to RP0 so that background tasks are still
> taken care of (RPS boost and idle need to be disabled in this situation).
> 5) If a flip comes again, go back to 2).
> 
> To recap,
> For most common cases, GT will run at a desired frequency as a
> result of software turbo algorithm;
> For background workloads or no flip environment, GT will be running at RP0 with
> shorter execution time to extend RC6 and pkg C state residency as long as power
> is concerned.
> 
> I'll start with the implementation if all concerns are ironed out.

Ah, I expected a patch ;-) Usually code diffs are much more efficient
communication as a replacement for when you'd use a whiteboard session
with a colocated team. It doesn't need to run nor even compile, just
pseudo-code illustrating the main integration points.

Anyway, see above for my comments: No need to integrate with flips if we
do the timer thing correctly.
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 272aa7a..eef8366 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -820,6 +820,19 @@  struct i915_suspend_saved_registers {
 	u32 savePCH_PORT_HOTPLUG;
 };
 
+struct intel_rps_bdw_cal {
+	u32 it_threshold_pct; /* interrupt, in percentage */
+	u32 eval_interval; /* evaluation interval, in us */
+	u32 last_ts;
+	u32 last_c0;
+	bool is_up;
+};
+
+struct intel_rps_bdw_turbo {
+	struct intel_rps_bdw_cal up;
+	struct intel_rps_bdw_cal down;
+};
+
 struct intel_gen6_power_mgmt {
 	/* work and pm_iir are protected by dev_priv->irq_lock */
 	struct work_struct work;
@@ -850,6 +863,9 @@  struct intel_gen6_power_mgmt {
 	bool enabled;
 	struct delayed_work delayed_resume_work;
 
+	bool is_bdw_sw_turbo;	/* Switch of BDW software turbo */
+	struct intel_rps_bdw_turbo sw_turbo; /* Calculate RP interrupt timing */
+
 	/*
 	 * Protects RPS/RC6 register access and PCU communication.
 	 * Must be taken after struct_mutex if nested.
@@ -2509,6 +2525,7 @@  extern void intel_disable_fbc(struct drm_device *dev);
 extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
 extern void intel_init_pch_refclk(struct drm_device *dev);
 extern void gen6_set_rps(struct drm_device *dev, u8 val);
+extern void bdw_software_turbo(struct drm_device *dev);
 extern void valleyview_set_rps(struct drm_device *dev, u8 val);
 extern int valleyview_rps_max_freq(struct drm_i915_private *dev_priv);
 extern int valleyview_rps_min_freq(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2b3d852..e077269 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1558,6 +1558,16 @@  static void i9xx_pipe_crc_irq_handler(struct drm_device *dev, enum pipe pipe)
 				     res1, res2);
 }
 
+void gen8_flip_interrupt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (!dev_priv->rps.is_bdw_sw_turbo)
+		return;
+
+	bdw_software_turbo(dev);
+}
+
 /* The RPS events need forcewake, so we add them to a work queue and mask their
  * IMR bits until the work is done. Other interrupts can be processed without
  * the work queue. */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0b88508..ec08cd9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5041,6 +5041,10 @@  enum punit_power_well {
 #define GEN8_UCGCTL6				0x9430
 #define   GEN8_SDEUNIT_CLOCK_GATE_DISABLE	(1<<14)
 
+#define TIMESTAMP_CTR		0x44070
+#define FREQ_1_28_US(us)	(((us) * 100) >> 7)
+#define MCHBAR_PCU_C0		(MCHBAR_MIRROR_BASE_SNB + 0x5960)
+
 #define GEN6_RPNSWREQ				0xA008
 #define   GEN6_TURBO_DISABLE			(1<<31)
 #define   GEN6_FREQUENCY(x)			((x)<<25)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b57210c..8e39ea7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8893,6 +8893,8 @@  static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	unsigned long flags;
 	int ret;
 
+	gen8_flip_interrupt(dev);
+
 	/* Can't change pixel format via MI display flips. */
 	if (fb->pixel_format != crtc->primary->fb->pixel_format)
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index b885df1..15010d3 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -929,6 +929,7 @@  void ironlake_teardown_rc6(struct drm_device *dev);
 void gen6_update_ring_freq(struct drm_device *dev);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
 void gen6_rps_boost(struct drm_i915_private *dev_priv);
+void gen8_flip_interrupt(struct drm_device *dev);
 void intel_aux_display_runtime_get(struct drm_i915_private *dev_priv);
 void intel_aux_display_runtime_put(struct drm_i915_private *dev_priv);
 void intel_runtime_pm_get(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 75c1c76..9327cd7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2963,6 +2963,9 @@  static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 {
 	int new_power;
 
+	if (dev_priv->rps.is_bdw_sw_turbo)
+		return;
+
 	new_power = dev_priv->rps.power;
 	switch (dev_priv->rps.power) {
 	case LOW_POWER:
@@ -3308,12 +3311,87 @@  static void parse_rp_state_cap(struct drm_i915_private *dev_priv, u32 rp_state_c
 		dev_priv->rps.min_freq_softlimit = dev_priv->rps.min_freq;
 }
 
+static void bdw_sw_calculate_freq(struct drm_device *dev,
+		struct intel_rps_bdw_cal *c, u32 *cur_time, u32 *c0)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	u64 busy = 0;
+	u32 busyness_pct = 0;
+	u32 elapsed_time = 0;
+	u16 new_freq = 0;
+
+	if (!c || !cur_time || !c0)
+		return;
+
+	if (0 == c->last_c0)
+		goto out;
+
+	/* Check Evaluation interval */
+	elapsed_time = *cur_time - c->last_ts;
+	if (elapsed_time < c->eval_interval)
+		return;
+
+	mutex_lock(&dev_priv->rps.hw_lock);
+
+	/*
+	 * c0 unit in 32*1.28 usec, elapsed_time unit in 1 usec.
+	 * Whole busyness_pct calculation should be
+	 *     busy = ((u64)(*c0 - c->last_c0) << 5 << 7) / 100;
+	 *     busyness_pct = (u32)(busy * 100 / elapsed_time);
+	 * The final formula is to simplify CPU calculation
+	 */
+	busy = (u64)(*c0 - c->last_c0) << 12;
+	do_div(busy, elapsed_time);
+	busyness_pct = (u32)busy;
+
+	if (c->is_up && busyness_pct >= c->it_threshold_pct)
+		new_freq = (u16)dev_priv->rps.cur_freq + 3;
+	if (!c->is_up && busyness_pct <= c->it_threshold_pct)
+		new_freq = (u16)dev_priv->rps.cur_freq - 1;
+
+	/* Adjust to new frequency busyness and compare with threshold */
+	if (0 != new_freq) {
+		if (new_freq > dev_priv->rps.max_freq_softlimit)
+			new_freq = dev_priv->rps.max_freq_softlimit;
+		else if (new_freq < dev_priv->rps.min_freq_softlimit)
+			new_freq = dev_priv->rps.min_freq_softlimit;
+
+		gen6_set_rps(dev, new_freq);
+	}
+
+	mutex_unlock(&dev_priv->rps.hw_lock);
+
+out:
+	c->last_c0 = *c0;
+	c->last_ts = *cur_time;
+}
+
+void bdw_software_turbo(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	u32 current_time = I915_READ(TIMESTAMP_CTR); /* unit in usec */
+	u32 current_c0 = I915_READ(MCHBAR_PCU_C0); /* unit in 32*1.28 usec */
+
+	bdw_sw_calculate_freq(dev, &dev_priv->rps.sw_turbo.up,
+			&current_time, &current_c0);
+	bdw_sw_calculate_freq(dev, &dev_priv->rps.sw_turbo.down,
+			&current_time, &current_c0);
+}
+
+
 static void gen8_enable_rps(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
 	uint32_t rc6_mask = 0, rp_state_cap;
+	uint32_t threshold_up_pct, threshold_down_pct;
+	uint32_t ei_up, ei_down; /* up and down evaluation interval */
+	u32 rp_ctl_flag;
 	int unused;
+
+	/* Use software Turbo for BDW */
+	dev_priv->rps.is_bdw_sw_turbo = IS_BROADWELL(dev);
 
 	/* 1a: Software RC state - RC0 */
 	I915_WRITE(GEN6_RC_STATE, 0);
@@ -3350,35 +3428,62 @@  static void gen8_enable_rps(struct drm_device *dev)
 		   HSW_FREQUENCY(dev_priv->rps.rp1_freq));
 	I915_WRITE(GEN6_RC_VIDEO_FREQ,
 		   HSW_FREQUENCY(dev_priv->rps.rp1_freq));
-	/* NB: Docs say 1s, and 1000000 - which aren't equivalent */
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 100000000 / 128); /* 1 second timeout */
+	ei_up = 84480; /* 84.48ms */
+	ei_down = 448000;
+	threshold_up_pct = 90; /* x percent busy */
+	threshold_down_pct = 70;
+
+	if (dev_priv->rps.is_bdw_sw_turbo) {
+		dev_priv->rps.sw_turbo.up.it_threshold_pct = threshold_up_pct;
+		dev_priv->rps.sw_turbo.up.eval_interval = ei_up;
+		dev_priv->rps.sw_turbo.up.is_up = true;
+		dev_priv->rps.sw_turbo.up.last_ts = 0;
+		dev_priv->rps.sw_turbo.up.last_c0 = 0;
+
+		dev_priv->rps.sw_turbo.down.it_threshold_pct = threshold_down_pct;
+		dev_priv->rps.sw_turbo.down.eval_interval = ei_down;
+		dev_priv->rps.sw_turbo.down.is_up = false;
+		dev_priv->rps.sw_turbo.down.last_ts = 0;
+		dev_priv->rps.sw_turbo.down.last_c0 = 0;
+	} else {
+		/* NB: Docs say 1s, and 1000000 - which aren't equivalent
+		 * 1 second timeout*/
+		I915_WRITE(GEN6_RP_DOWN_TIMEOUT, FREQ_1_28_US(1000000));
 
-	/* Docs recommend 900MHz, and 300 MHz respectively */
-	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
-		   dev_priv->rps.max_freq_softlimit << 24 |
-		   dev_priv->rps.min_freq_softlimit << 16);
+		/* Docs recommend 900MHz, and 300 MHz respectively */
+		I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
+			   dev_priv->rps.max_freq_softlimit << 24 |
+			   dev_priv->rps.min_freq_softlimit << 16);
 
-	I915_WRITE(GEN6_RP_UP_THRESHOLD, 7600000 / 128); /* 76ms busyness per EI, 90% */
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 31300000 / 128); /* 313ms busyness per EI, 70%*/
-	I915_WRITE(GEN6_RP_UP_EI, 66000); /* 84.48ms, XXX: random? */
-	I915_WRITE(GEN6_RP_DOWN_EI, 350000); /* 448ms, XXX: random? */
+		I915_WRITE(GEN6_RP_UP_THRESHOLD,
+			FREQ_1_28_US(ei_up * threshold_up_pct / 100));
+		I915_WRITE(GEN6_RP_DOWN_THRESHOLD,
+			FREQ_1_28_US(ei_down * threshold_down_pct / 100));
+		I915_WRITE(GEN6_RP_UP_EI,
+			FREQ_1_28_US(ei_up));
+		I915_WRITE(GEN6_RP_DOWN_EI,
+			FREQ_1_28_US(ei_down));
 
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+		I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+	}
 
 	/* 5: Enable RPS */
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_TURBO |
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   GEN6_RP_DOWN_IDLE_AVG);
+	rp_ctl_flag = GEN6_RP_MEDIA_TURBO |
+		      GEN6_RP_MEDIA_HW_NORMAL_MODE |
+		      GEN6_RP_MEDIA_IS_GFX |
+		      GEN6_RP_UP_BUSY_AVG |
+		      GEN6_RP_DOWN_IDLE_AVG;
+	if (!dev_priv->rps.is_bdw_sw_turbo)
+		rp_ctl_flag |= GEN6_RP_ENABLE;
+
+	I915_WRITE(GEN6_RP_CONTROL, rp_ctl_flag);
 
 	/* 6: Ring frequency + overclocking (our driver does this later */
 
 	gen6_set_rps(dev, (I915_READ(GEN6_GT_PERF_STATUS) & 0xff00) >> 8);
-
-	gen6_enable_rps_interrupts(dev);
+
+	if (!dev_priv->rps.is_bdw_sw_turbo)
+		gen6_enable_rps_interrupts(dev);
 
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -4529,8 +4634,11 @@  static void intel_gen6_powersave_work(struct work_struct *work)
 		container_of(work, struct drm_i915_private,
 			     rps.delayed_resume_work.work);
 	struct drm_device *dev = dev_priv->dev;
+
+	dev_priv->rps.is_bdw_sw_turbo = false;
 
 	mutex_lock(&dev_priv->rps.hw_lock);
+
 
 	if (IS_VALLEYVIEW(dev)) {
 		valleyview_enable_rps(dev);
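The intel_pm.c hunk replaces hard-coded register values with FREQ_1_28_US() expressions. That arithmetic can be checked outside the kernel by copying the macro verbatim; the constant and helper names below are hypothetical stand-ins for the patch's local variables ei_up, ei_down, threshold_up_pct and threshold_down_pct. Under this check the EI and down-timeout values match the old constants exactly (66000, 350000 and 781250), while the thresholds move slightly: 59400 instead of 59375 for up and 245000 instead of 244531 for down, because they are now derived as exactly 90% and 70% of the EI.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch: convert microseconds to 1.28 us hardware units. */
#define FREQ_1_28_US(us)	(((us) * 100) >> 7)

/* Evaluation intervals and thresholds used by the patch. */
#define EI_UP_US	84480u	/* 84.48 ms */
#define EI_DOWN_US	448000u	/* 448 ms */
#define UP_PCT		90u	/* percent busy */
#define DOWN_PCT	70u	/* percent busy */

/* Register values the patch computes on the hardware-turbo path. */
static uint32_t rp_up_ei(void)
{
	return FREQ_1_28_US(EI_UP_US);
}

static uint32_t rp_down_ei(void)
{
	return FREQ_1_28_US(EI_DOWN_US);
}

static uint32_t rp_up_threshold(void)
{
	return FREQ_1_28_US(EI_UP_US * UP_PCT / 100);
}

static uint32_t rp_down_threshold(void)
{
	return FREQ_1_28_US(EI_DOWN_US * DOWN_PCT / 100);
}
```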