diff mbox

drm/i915/vlv: Add VLV specific force wake routines.

Message ID 1384435935-13571-1-git-send-email-deepak.s@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

deepak.s@intel.com Nov. 14, 2013, 1:32 p.m. UTC
From: Deepak S <deepak.s@intel.com>

Added media/render/common well VLV force wake routines to help
bring up the WELLS before access the register
- Refactor current vlv_forcewake get/put and added MEDIA or
  RENDER specific Forcewake.
- Added VLV Check to bring up MEDIA and RENDER WELL base
  on the register accessed in vlv_read##x (in intel_uncore.c)

Signed-off-by: Deepak S <deepak.s@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   8 +-
 drivers/gpu/drm/i915/i915_drv.h         |  31 ++++-
 drivers/gpu/drm/i915/i915_gpu_error.c   |   2 +-
 drivers/gpu/drm/i915/i915_reg.h         |   4 +-
 drivers/gpu/drm/i915/intel_display.c    |   4 +-
 drivers/gpu/drm/i915/intel_pm.c         |  18 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  14 +-
 drivers/gpu/drm/i915/intel_uncore.c     | 223 +++++++++++++++++++++++++-------
 8 files changed, 237 insertions(+), 67 deletions(-)

Comments

Jesse Barnes Nov. 19, 2013, 6:05 p.m. UTC | #1
On Thu, 14 Nov 2013 19:02:15 +0530
deepak.s@intel.com wrote:

> From: Deepak S <deepak.s@intel.com>
> 
> Added media/render/common well VLV force wake routines to help
> bring up the WELLS before access the register
> - Refactor current vlv_forcewake get/put and added MEDIA or
>   RENDER specific Forcewake.
> - Added VLV Check to bring up MEDIA and RENDER WELL base
>   on the register accessed in vlv_read##x (in intel_uncore.c)

This patch is pretty big and so a bit hard to review.  A couple of
questions:
  - why not use the callback to __vlv_force_wake_* from
    gen6_gt_force_wake_*?  i.e. why is VLV special here?
  - having a new gen7_media_force_wake function may be better than
    passing an engine around, and would touch fewer pieces of code
  - have you done measurements on this?  given how infrequently we
    ought to be waking the wells when they're idle, and how long we
    generally keep them awake, is this a real power win?

Thanks,
Jesse
deepak.s@intel.com Nov. 20, 2013, 6 a.m. UTC | #2
Hi Jesse,

Thanks for the review. Below is my response. 

>  - why not use the callback to __vlv_force_wake_* from
    gen6_gt_force_wake_*?  i.e. why is VLV special here?
[Deepak]   Gen6 has a single power well whereas the  VLV is has spate wells. This was the reason  for the separate function

 > - having a new gen7_media_force_wake function may be better than
    passing an engine around, and would touch fewer pieces of code
[Deepak]  Even Gen7  is also as single Power Well. Having common function between gen7 and vlv might be difficult to individually handle the wells.

  >- have you done measurements on this?  given how infrequently we
    ought to be waking the wells when they're idle, and how long we
    generally keep them awake, is this a real power win?
[Deepak] By Individually controlling the wells we observed around 100mW - 200mW  saving in different scenarios (GL Beanchmark & Media playback).

Thanks,
Deepak

-----Original Message-----
From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org] 
Sent: Tuesday, November 19, 2013 11:36 PM
To: S, Deepak
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/vlv: Add VLV specific force wake routines.

On Thu, 14 Nov 2013 19:02:15 +0530
deepak.s@intel.com wrote:

> From: Deepak S <deepak.s@intel.com>
> 
> Added media/render/common well VLV force wake routines to help bring 
> up the WELLS before access the register
> - Refactor current vlv_forcewake get/put and added MEDIA or
>   RENDER specific Forcewake.
> - Added VLV Check to bring up MEDIA and RENDER WELL base
>   on the register accessed in vlv_read##x (in intel_uncore.c)

This patch is pretty big and so a bit hard to review.  A couple of
questions:
  - why not use the callback to __vlv_force_wake_* from
    gen6_gt_force_wake_*?  i.e. why is VLV special here?
  - having a new gen7_media_force_wake function may be better than
    passing an engine around, and would touch fewer pieces of code
  - have you done measurements on this?  given how infrequently we
    ought to be waking the wells when they're idle, and how long we
    generally keep them awake, is this a real power win?

Thanks,
Jesse
Jesse Barnes Nov. 20, 2013, 4:05 p.m. UTC | #3
On Wed, 20 Nov 2013 06:00:24 +0000
"S, Deepak" <deepak.s@intel.com> wrote:

> Hi Jesse,
> 
> Thanks for the review. Below is my response. 
> 
> >  - why not use the callback to __vlv_force_wake_* from
>     gen6_gt_force_wake_*?  i.e. why is VLV special here?
> [Deepak]   Gen6 has a single power well whereas the  VLV is has spate wells. This was the reason  for the separate function
> 
>  > - having a new gen7_media_force_wake function may be better than
>     passing an engine around, and would touch fewer pieces of code
> [Deepak]  Even Gen7  is also as single Power Well. Having common function between gen7 and vlv might be difficult to individually handle the wells.
> 
>   >- have you done measurements on this?  given how infrequently we
>     ought to be waking the wells when they're idle, and how long we
>     generally keep them awake, is this a real power win?
> [Deepak] By Individually controlling the wells we observed around 100mW - 200mW  saving in different scenarios (GL Beanchmark & Media playback).

wow, that's a significant savings.

Can you split the patch into one that adds the power well arg, and
another that adds the VLV support for the split?  That would make it
easier to review.

Thanks,
Daniel Vetter Nov. 20, 2013, 4:16 p.m. UTC | #4
On Wed, Nov 20, 2013 at 7:00 AM, S, Deepak <deepak.s@intel.com> wrote:
>>- have you done measurements on this?  given how infrequently we
>     ought to be waking the wells when they're idle, and how long we
>     generally keep them awake, is this a real power win?
> [Deepak] By Individually controlling the wells we observed around 100mW - 200mW  saving in different scenarios (GL Beanchmark & Media playback).

This kind of information is really important and should be part of the
commit message. This rule holds generally for performance/power tuning
work - the commit message should at least mention the order of
magnitude of the improvements seen and which workloads have been
tested.

If you're afraid to disclose confidential information (e.g. for power
savings) just make the language fuzzy enough, e.g. here "We've seen
power savings in the lower sub-1W range on workloads that only neeed
on of the power wells, e.g. glbenchmark, media playback."

Thanks, Daniel
deepak.s@intel.com Nov. 20, 2013, 4:33 p.m. UTC | #5
Thanks Jesse, I wil split the patches and send it for review.

Thanks
Deepak

-----Original Message-----
From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org] 
Sent: Wednesday, November 20, 2013 9:35 PM
To: S, Deepak
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/vlv: Add VLV specific force wake routines.

On Wed, 20 Nov 2013 06:00:24 +0000
"S, Deepak" <deepak.s@intel.com> wrote:

> Hi Jesse,
> 
> Thanks for the review. Below is my response. 
> 
> >  - why not use the callback to __vlv_force_wake_* from
>     gen6_gt_force_wake_*?  i.e. why is VLV special here?
> [Deepak]   Gen6 has a single power well whereas the  VLV is has spate wells. This was the reason  for the separate function
> 
>  > - having a new gen7_media_force_wake function may be better than
>     passing an engine around, and would touch fewer pieces of code 
> [Deepak]  Even Gen7  is also as single Power Well. Having common function between gen7 and vlv might be difficult to individually handle the wells.
> 
>   >- have you done measurements on this?  given how infrequently we
>     ought to be waking the wells when they're idle, and how long we
>     generally keep them awake, is this a real power win?
> [Deepak] By Individually controlling the wells we observed around 100mW - 200mW  saving in different scenarios (GL Beanchmark & Media playback).

wow, that's a significant savings.

Can you split the patch into one that adds the power well arg, and another that adds the VLV support for the split?  That would make it easier to review.

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center
Jesse Barnes Nov. 22, 2013, 9:12 p.m. UTC | #6
Also, you might re-run your power numbers with Chris's patch to drop
the force wake ref around the irq get/put.  That's the only one we
normally take at runtime, so getting rid of it should give you even
greater power savings in conjunction with the RC6 timeout update patch I
already pushed.

Thanks,
Jesse

On Wed, 20 Nov 2013 16:33:22 +0000
"S, Deepak" <deepak.s@intel.com> wrote:

> Thanks Jesse, I wil split the patches and send it for review.
> 
> Thanks
> Deepak
> 
> -----Original Message-----
> From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org] 
> Sent: Wednesday, November 20, 2013 9:35 PM
> To: S, Deepak
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/vlv: Add VLV specific force wake routines.
> 
> On Wed, 20 Nov 2013 06:00:24 +0000
> "S, Deepak" <deepak.s@intel.com> wrote:
> 
> > Hi Jesse,
> > 
> > Thanks for the review. Below is my response. 
> > 
> > >  - why not use the callback to __vlv_force_wake_* from
> >     gen6_gt_force_wake_*?  i.e. why is VLV special here?
> > [Deepak]   Gen6 has a single power well whereas the  VLV is has spate wells. This was the reason  for the separate function
> > 
> >  > - having a new gen7_media_force_wake function may be better than
> >     passing an engine around, and would touch fewer pieces of code 
> > [Deepak]  Even Gen7  is also as single Power Well. Having common function between gen7 and vlv might be difficult to individually handle the wells.
> > 
> >   >- have you done measurements on this?  given how infrequently we
> >     ought to be waking the wells when they're idle, and how long we
> >     generally keep them awake, is this a real power win?
> > [Deepak] By Individually controlling the wells we observed around 100mW - 200mW  saving in different scenarios (GL Beanchmark & Media playback).
> 
> wow, that's a significant savings.
> 
> Can you split the patch into one that adds the power well arg, and another that adds the VLV support for the split?  That would make it easier to review.
> 
> Thanks,
> --
> Jesse Barnes, Intel Open Source Technology Center
>
deepak.s@intel.com Nov. 23, 2013, 4:56 a.m. UTC | #7
Thanks Jesse.  We will re-run and compare the power numbers. 

-----Original Message-----
From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org] 
Sent: Saturday, November 23, 2013 2:43 AM
To: S, Deepak
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/vlv: Add VLV specific force wake routines.

Also, you might re-run your power numbers with Chris's patch to drop the force wake ref around the irq get/put.  That's the only one we normally take at runtime, so getting rid of it should give you even greater power savings in conjunction with the RC6 timeout update patch I already pushed.

Thanks,
Jesse

On Wed, 20 Nov 2013 16:33:22 +0000
"S, Deepak" <deepak.s@intel.com> wrote:

> Thanks Jesse, I wil split the patches and send it for review.
> 
> Thanks
> Deepak
> 
> -----Original Message-----
> From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org]
> Sent: Wednesday, November 20, 2013 9:35 PM
> To: S, Deepak
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/vlv: Add VLV specific force wake routines.
> 
> On Wed, 20 Nov 2013 06:00:24 +0000
> "S, Deepak" <deepak.s@intel.com> wrote:
> 
> > Hi Jesse,
> > 
> > Thanks for the review. Below is my response. 
> > 
> > >  - why not use the callback to __vlv_force_wake_* from
> >     gen6_gt_force_wake_*?  i.e. why is VLV special here?
> > [Deepak]   Gen6 has a single power well whereas the  VLV is has spate wells. This was the reason  for the separate function
> > 
> >  > - having a new gen7_media_force_wake function may be better than
> >     passing an engine around, and would touch fewer pieces of code 
> > [Deepak]  Even Gen7  is also as single Power Well. Having common function between gen7 and vlv might be difficult to individually handle the wells.
> > 
> >   >- have you done measurements on this?  given how infrequently we
> >     ought to be waking the wells when they're idle, and how long we
> >     generally keep them awake, is this a real power win?
> > [Deepak] By Individually controlling the wells we observed around 100mW - 200mW  saving in different scenarios (GL Beanchmark & Media playback).
> 
> wow, that's a significant savings.
> 
> Can you split the patch into one that adds the power well arg, and another that adds the VLV support for the split?  That would make it easier to review.
> 
> Thanks,
> --
> Jesse Barnes, Intel Open Source Technology Center
> 


--
Jesse Barnes, Intel Open Source Technology Center
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 506f8ef..0c1d6ce 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -901,7 +901,7 @@  static int i915_cur_delayinfo(struct seq_file *m, void *unused)
 		if (ret)
 			return ret;
 
-		gen6_gt_force_wake_get(dev_priv);
+		gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
 		reqf = I915_READ(GEN6_RPNSWREQ);
 		reqf &= ~GEN6_TURBO_DISABLE;
@@ -924,7 +924,7 @@  static int i915_cur_delayinfo(struct seq_file *m, void *unused)
 			cagf = (rpstat & GEN6_CAGF_MASK) >> GEN6_CAGF_SHIFT;
 		cagf *= GT_FREQUENCY_MULTIPLIER;
 
-		gen6_gt_force_wake_put(dev_priv);
+		gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 		mutex_unlock(&dev->struct_mutex);
 
 		seq_printf(m, "GT_PERF_STATUS: 0x%08x\n", gt_perf_status);
@@ -2898,7 +2898,7 @@  static int i915_forcewake_open(struct inode *inode, struct file *file)
 	if (INTEL_INFO(dev)->gen < 6)
 		return 0;
 
-	gen6_gt_force_wake_get(dev_priv);
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
 	return 0;
 }
@@ -2911,7 +2911,7 @@  static int i915_forcewake_release(struct inode *inode, struct file *file)
 	if (INTEL_INFO(dev)->gen < 6)
 		return 0;
 
-	gen6_gt_force_wake_put(dev_priv);
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6d2a9a1..e06e376 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -422,8 +422,10 @@  struct drm_i915_display_funcs {
 };
 
 struct intel_uncore_funcs {
-	void (*force_wake_get)(struct drm_i915_private *dev_priv);
-	void (*force_wake_put)(struct drm_i915_private *dev_priv);
+	void (*force_wake_get)(struct drm_i915_private *dev_priv,
+							int fw_engine);
+	void (*force_wake_put)(struct drm_i915_private *dev_priv,
+							int fw_engine);
 
 	uint8_t  (*mmio_readb)(struct drm_i915_private *dev_priv, off_t offset, bool trace);
 	uint16_t (*mmio_readw)(struct drm_i915_private *dev_priv, off_t offset, bool trace);
@@ -448,6 +450,8 @@  struct intel_uncore {
 	unsigned fifo_count;
 	unsigned forcewake_count;
 
+	unsigned fw_rendercount;
+	unsigned fw_mediacount;
 	struct delayed_work force_wake_work;
 };
 
@@ -2399,8 +2403,8 @@  extern void intel_display_print_error_state(struct drm_i915_error_state_buf *e,
  * must be set to prevent GT core from power down and stale values being
  * returned.
  */
-void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv);
-void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv);
+void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine);
+void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine);
 
 int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u8 mbox, u32 *val);
 int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u8 mbox, u32 val);
@@ -2429,6 +2433,25 @@  void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
 int vlv_gpu_freq(struct drm_i915_private *dev_priv, int val);
 int vlv_freq_opcode(struct drm_i915_private *dev_priv, int val);
 
+void vlv_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine);
+void vlv_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine);
+
+#define FORCEWAKE_VLV_RENDER_RANGE_OFFSET(reg) \
+	(((reg) >= 0x2000 && (reg) < 0x4000) ||\
+	((reg) >= 0x5000 && (reg) < 0x8000) ||\
+	((reg) >= 0xB000 && (reg) < 0x12000) ||\
+	((reg) >= 0x2E000 && (reg) < 0x30000))
+
+#define FORCEWAKE_VLV_MEDIA_RANGE_OFFSET(reg)\
+	(((reg) >= 0x12000 && (reg) < 0x14000) ||\
+	((reg) >= 0x22000 && (reg) < 0x24000) ||\
+	((reg) >= 0x30000 && (reg) < 0x40000))
+
+#define FORCEWAKE_RENDER		(1 << 0)
+#define FORCEWAKE_MEDIA		(1 << 1)
+#define FORCEWAKE_ALL		(FORCEWAKE_RENDER | FORCEWAKE_MEDIA)
+
+
 #define I915_READ8(reg)		dev_priv->uncore.funcs.mmio_readb(dev_priv, (reg), true)
 #define I915_WRITE8(reg, val)	dev_priv->uncore.funcs.mmio_writeb(dev_priv, (reg), (val), true)
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a8bb213..83f17d8 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -938,7 +938,7 @@  void i915_capture_error_state(struct drm_device *dev)
 		error->derrmr = I915_READ(DERRMR);
 
 	if (IS_VALLEYVIEW(dev))
-		error->forcewake = I915_READ(FORCEWAKE_VLV);
+		error->forcewake = I915_READ(FORCEWAKE_RENDER_VLV);
 	else if (INTEL_INFO(dev)->gen >= 7)
 		error->forcewake = I915_READ(FORCEWAKE_MT);
 	else if (INTEL_INFO(dev)->gen == 6)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a8a5bcb..1f0b94a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4722,8 +4722,8 @@ 
 #define  EDP_LINK_TRAIN_VOL_EMP_MASK_IVB	(0x3f<<22)
 
 #define  FORCEWAKE				0xA18C
-#define  FORCEWAKE_VLV				0x1300b0
-#define  FORCEWAKE_ACK_VLV			0x1300b4
+#define  FORCEWAKE_RENDER_VLV				0x1300b0
+#define  FORCEWAKE_RENDER_ACK_VLV		0x1300b4
 #define  FORCEWAKE_MEDIA_VLV			0x1300b8
 #define  FORCEWAKE_ACK_MEDIA_VLV		0x1300bc
 #define  FORCEWAKE_ACK_HSW			0x130044
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 752d830..493cd2f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -6533,7 +6533,7 @@  static void hsw_restore_lcpll(struct drm_i915_private *dev_priv)
 
 	/* Make sure we're not on PC8 state before disabling PC8, otherwise
 	 * we'll hang the machine! */
-	dev_priv->uncore.funcs.force_wake_get(dev_priv);
+	dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
 
 	if (val & LCPLL_POWER_DOWN_ALLOW) {
 		val &= ~LCPLL_POWER_DOWN_ALLOW;
@@ -6567,7 +6567,7 @@  static void hsw_restore_lcpll(struct drm_i915_private *dev_priv)
 			DRM_ERROR("Switching back to LCPLL failed\n");
 	}
 
-	dev_priv->uncore.funcs.force_wake_put(dev_priv);
+	dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
 void hsw_enable_pc8_work(struct work_struct *__work)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 542b444..67eb4ec 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -191,7 +191,11 @@  static void sandybridge_blit_fbc_update(struct drm_device *dev)
 	u32 blt_ecoskpd;
 
 	/* Make sure blitter notifies FBC of writes */
-	gen6_gt_force_wake_get(dev_priv);
+
+	/* Blitter is part of Media powerwell on VLV. No impact of
+	 * his param in other platforms for now */
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_MEDIA);
+
 	blt_ecoskpd = I915_READ(GEN6_BLITTER_ECOSKPD);
 	blt_ecoskpd |= GEN6_BLITTER_FBC_NOTIFY <<
 		GEN6_BLITTER_LOCK_SHIFT;
@@ -202,7 +206,8 @@  static void sandybridge_blit_fbc_update(struct drm_device *dev)
 			 GEN6_BLITTER_LOCK_SHIFT);
 	I915_WRITE(GEN6_BLITTER_ECOSKPD, blt_ecoskpd);
 	POSTING_READ(GEN6_BLITTER_ECOSKPD);
-	gen6_gt_force_wake_put(dev_priv);
+
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_MEDIA);
 }
 
 static void ironlake_enable_fbc(struct drm_crtc *crtc, unsigned long interval)
@@ -3737,7 +3742,7 @@  static void gen6_enable_rps(struct drm_device *dev)
 		I915_WRITE(GTFIFODBG, gtfifodbg);
 	}
 
-	gen6_gt_force_wake_get(dev_priv);
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
 	rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
 	gt_perf_status = I915_READ(GEN6_GT_PERF_STATUS);
@@ -3829,7 +3834,7 @@  static void gen6_enable_rps(struct drm_device *dev)
 			DRM_ERROR("Couldn't fix incorrect rc6 voltage\n");
 	}
 
-	gen6_gt_force_wake_put(dev_priv);
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
 void gen6_update_ring_freq(struct drm_device *dev)
@@ -3988,7 +3993,8 @@  static void valleyview_enable_rps(struct drm_device *dev)
 
 	valleyview_setup_pctx(dev);
 
-	gen6_gt_force_wake_get(dev_priv);
+	/* If VLV, Forcewake all wells, else re-direct to regular path */
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
 	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
 	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
@@ -4060,7 +4066,7 @@  static void valleyview_enable_rps(struct drm_device *dev)
 
 	gen6_enable_rps_interrupts(dev);
 
-	gen6_gt_force_wake_put(dev_priv);
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
 void ironlake_teardown_rc6(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2dec134..2374c32 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -395,7 +395,7 @@  static int init_ring_common(struct intel_ring_buffer *ring)
 	int ret = 0;
 	u32 head;
 
-	gen6_gt_force_wake_get(dev_priv);
+	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
 	if (I915_NEED_GFX_HWS(dev))
 		intel_ring_setup_status_page(ring);
@@ -468,7 +468,7 @@  static int init_ring_common(struct intel_ring_buffer *ring)
 	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
 
 out:
-	gen6_gt_force_wake_put(dev_priv);
+	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 
 	return ret;
 }
@@ -991,7 +991,10 @@  gen6_ring_get_irq(struct intel_ring_buffer *ring)
 	/* It looks like we need to prevent the gt from suspending while waiting
 	 * for an notifiy irq, otherwise irqs seem to get lost on at least the
 	 * blt/bsd rings on ivb. */
-	gen6_gt_force_wake_get(dev_priv);
+	if (ring->id == RCS)
+		gen6_gt_force_wake_get(dev_priv, FORCEWAKE_RENDER);
+	else
+		gen6_gt_force_wake_get(dev_priv, FORCEWAKE_MEDIA);
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
 	if (ring->irq_refcount++ == 0) {
@@ -1025,7 +1028,10 @@  gen6_ring_put_irq(struct intel_ring_buffer *ring)
 	}
 	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
 
-	gen6_gt_force_wake_put(dev_priv);
+	if (ring->id == RCS)
+		gen6_gt_force_wake_put(dev_priv, FORCEWAKE_RENDER);
+	else
+		gen6_gt_force_wake_put(dev_priv, FORCEWAKE_MEDIA);
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index a881906..899b963 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -64,7 +64,8 @@  static void __gen6_gt_force_wake_reset(struct drm_i915_private *dev_priv)
 	__raw_posting_read(dev_priv, ECOBUS);
 }
 
-static void __gen6_gt_force_wake_get(struct drm_i915_private *dev_priv)
+static void __gen6_gt_force_wake_get(struct drm_i915_private *dev_priv,
+							int fw_engine)
 {
 	if (wait_for_atomic((__raw_i915_read32(dev_priv, FORCEWAKE_ACK) & 1) == 0,
 			    FORCEWAKE_ACK_TIMEOUT_MS))
@@ -89,7 +90,8 @@  static void __gen6_gt_force_wake_mt_reset(struct drm_i915_private *dev_priv)
 	__raw_posting_read(dev_priv, ECOBUS);
 }
 
-static void __gen6_gt_force_wake_mt_get(struct drm_i915_private *dev_priv)
+static void __gen6_gt_force_wake_mt_get(struct drm_i915_private *dev_priv,
+							int fw_engine)
 {
 	u32 forcewake_ack;
 
@@ -125,7 +127,8 @@  static void gen6_gt_check_fifodbg(struct drm_i915_private *dev_priv)
 		__raw_i915_write32(dev_priv, GTFIFODBG, GT_FIFO_CPU_ERROR_MASK);
 }
 
-static void __gen6_gt_force_wake_put(struct drm_i915_private *dev_priv)
+static void __gen6_gt_force_wake_put(struct drm_i915_private *dev_priv,
+							int fw_engine)
 {
 	__raw_i915_write32(dev_priv, FORCEWAKE, 0);
 	/* something from same cacheline, but !FORCEWAKE */
@@ -133,7 +136,8 @@  static void __gen6_gt_force_wake_put(struct drm_i915_private *dev_priv)
 	gen6_gt_check_fifodbg(dev_priv);
 }
 
-static void __gen6_gt_force_wake_mt_put(struct drm_i915_private *dev_priv)
+static void __gen6_gt_force_wake_mt_put(struct drm_i915_private *dev_priv,
+							int fw_engine)
 {
 	__raw_i915_write32(dev_priv, FORCEWAKE_MT,
 			   _MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
@@ -164,44 +168,123 @@  static int __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 
 static void vlv_force_wake_reset(struct drm_i915_private *dev_priv)
 {
-	__raw_i915_write32(dev_priv, FORCEWAKE_VLV,
+	__raw_i915_write32(dev_priv, FORCEWAKE_RENDER_VLV,
 			   _MASKED_BIT_DISABLE(0xffff));
-	/* something from same cacheline, but !FORCEWAKE_VLV */
-	__raw_posting_read(dev_priv, FORCEWAKE_ACK_VLV);
+	/* something from same cacheline, but !FORCEWAKE_RENDER_VLV */
+	__raw_posting_read(dev_priv, FORCEWAKE_RENDER_ACK_VLV);
+
+	__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
+			   _MASKED_BIT_DISABLE(0xffff));
+	/* something from same cacheline, but !FORCEWAKE_MEDIA_VLV */
+	__raw_posting_read(dev_priv, FORCEWAKE_ACK_MEDIA_VLV);
+
 }
 
-static void vlv_force_wake_get(struct drm_i915_private *dev_priv)
+static void __vlv_force_wake_get(struct drm_i915_private *dev_priv,
+						int fw_engine)
 {
-	if (wait_for_atomic((__raw_i915_read32(dev_priv, FORCEWAKE_ACK_VLV) & FORCEWAKE_KERNEL) == 0,
-			    FORCEWAKE_ACK_TIMEOUT_MS))
-		DRM_ERROR("Timed out waiting for forcewake old ack to clear.\n");
+	/* Check for Render Engine */
+	if (FORCEWAKE_RENDER & fw_engine) {
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_RENDER_ACK_VLV) &
+						FORCEWAKE_KERNEL) == 0,
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: Render forcewake old ack to clear.\n");
 
-	__raw_i915_write32(dev_priv, FORCEWAKE_VLV,
-			   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
-	__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
-			   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
-
-	if (wait_for_atomic((__raw_i915_read32(dev_priv, FORCEWAKE_ACK_VLV) & FORCEWAKE_KERNEL),
-			    FORCEWAKE_ACK_TIMEOUT_MS))
-		DRM_ERROR("Timed out waiting for GT to ack forcewake request.\n");
+		__raw_i915_write32(dev_priv, FORCEWAKE_RENDER_VLV,
+				   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
 
-	if (wait_for_atomic((__raw_i915_read32(dev_priv, FORCEWAKE_ACK_MEDIA_VLV) &
-			     FORCEWAKE_KERNEL),
-			    FORCEWAKE_ACK_TIMEOUT_MS))
-		DRM_ERROR("Timed out waiting for media to ack forcewake request.\n");
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_RENDER_ACK_VLV) &
+						FORCEWAKE_KERNEL),
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: waiting for Render to ack.\n");
+	}
 
+	/* Check for Media Engine */
+	if (FORCEWAKE_MEDIA & fw_engine) {
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_MEDIA_VLV) &
+						FORCEWAKE_KERNEL) == 0,
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: Media forcewake old ack to clear.\n");
+
+		__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
+				   _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL));
+
+		if (wait_for_atomic((__raw_i915_read32(dev_priv,
+						FORCEWAKE_ACK_MEDIA_VLV) &
+						FORCEWAKE_KERNEL),
+					FORCEWAKE_ACK_TIMEOUT_MS))
+			DRM_ERROR("Timed out: waiting for media to ack.\n");
+	}
 	/* WaRsForcewakeWaitTC0:vlv */
 	__gen6_gt_wait_for_thread_c0(dev_priv);
+
 }
 
-static void vlv_force_wake_put(struct drm_i915_private *dev_priv)
+static void __vlv_force_wake_put(struct drm_i915_private *dev_priv,
+					int fw_engine)
 {
-	__raw_i915_write32(dev_priv, FORCEWAKE_VLV,
-			   _MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
-	__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
-			   _MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
+
+	/* Check for Render Engine */
+	if (FORCEWAKE_RENDER & fw_engine)
+		__raw_i915_write32(dev_priv, FORCEWAKE_RENDER_VLV,
+					_MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
+
+
+	/* Check for Media Engine */
+	if (FORCEWAKE_MEDIA & fw_engine)
+		__raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
+				_MASKED_BIT_DISABLE(FORCEWAKE_KERNEL));
+
 	/* The below doubles as a POSTING_READ */
 	gen6_gt_check_fifodbg(dev_priv);
+
+}
+
+void vlv_force_wake_get(struct drm_i915_private *dev_priv,
+						int fw_engine)
+{
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	if (FORCEWAKE_RENDER & fw_engine) {
+		if (dev_priv->uncore.fw_rendercount++ == 0)
+			dev_priv->uncore.funcs.force_wake_get(dev_priv,
+							FORCEWAKE_RENDER);
+	}
+	if (FORCEWAKE_MEDIA & fw_engine) {
+		if (dev_priv->uncore.fw_mediacount++ == 0)
+			dev_priv->uncore.funcs.force_wake_get(dev_priv,
+							FORCEWAKE_MEDIA);
+	}
+
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
+
+void vlv_force_wake_put(struct drm_i915_private *dev_priv,
+						int fw_engine)
+{
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+
+	if (FORCEWAKE_RENDER & fw_engine) {
+		WARN_ON(dev_priv->uncore.fw_rendercount == 0);
+		if (--dev_priv->uncore.fw_rendercount == 0)
+			dev_priv->uncore.funcs.force_wake_put(dev_priv,
+							FORCEWAKE_RENDER);
+	}
+
+	if (FORCEWAKE_MEDIA & fw_engine) {
+		WARN_ON(dev_priv->uncore.fw_mediacount == 0);
+		if (--dev_priv->uncore.fw_mediacount == 0)
+			dev_priv->uncore.funcs.force_wake_put(dev_priv,
+							FORCEWAKE_MEDIA);
+	}
+
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
 static void gen6_force_wake_work(struct work_struct *work)
@@ -212,7 +295,7 @@  static void gen6_force_wake_work(struct work_struct *work)
 
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 	if (--dev_priv->uncore.forcewake_count == 0)
-		dev_priv->uncore.funcs.force_wake_put(dev_priv);
+		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
@@ -278,26 +361,34 @@  void intel_uncore_sanitize(struct drm_device *dev)
  * be called at the beginning of the sequence followed by a call to
  * gen6_gt_force_wake_put() at the end of the sequence.
  */
-void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv)
+void gen6_gt_force_wake_get(struct drm_i915_private *dev_priv, int fw_engine)
 {
 	unsigned long irqflags;
 
+	/* Redirect to VLV specific routine */
+	if (IS_VALLEYVIEW(dev_priv->dev))
+		return vlv_force_wake_get(dev_priv, fw_engine);
+
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 	if (dev_priv->uncore.forcewake_count++ == 0)
-		dev_priv->uncore.funcs.force_wake_get(dev_priv);
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_engine);
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
 /*
  * see gen6_gt_force_wake_get()
  */
-void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv)
+void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv, int fw_engine)
 {
 	unsigned long irqflags;
 
+	/* Redirect to VLV specific routine */
+	if (IS_VALLEYVIEW(dev_priv->dev))
+		return vlv_force_wake_put(dev_priv, fw_engine);
+
 	if (!dev_priv->uncore.funcs.force_wake_put)
 		return;
 
@@ -376,16 +467,51 @@  gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
 	REG_READ_HEADER(x); \
 	if (NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
 		if (dev_priv->uncore.forcewake_count == 0) \
-			dev_priv->uncore.funcs.force_wake_get(dev_priv); \
+			dev_priv->uncore.funcs.force_wake_get(dev_priv, \
+							FORCEWAKE_ALL); \
 		val = __raw_i915_read##x(dev_priv, reg); \
 		if (dev_priv->uncore.forcewake_count == 0) \
-			dev_priv->uncore.funcs.force_wake_put(dev_priv); \
+			dev_priv->uncore.funcs.force_wake_put(dev_priv, \
+							FORCEWAKE_ALL); \
 	} else { \
 		val = __raw_i915_read##x(dev_priv, reg); \
 	} \
 	REG_READ_FOOTER; \
 }
 
+#define __vlv_read(x) \
+static u##x \
+vlv_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
+	unsigned fwengine = 0; \
+	unsigned *fwcount = 0; \
+	REG_READ_HEADER(x); \
+	if (FORCEWAKE_VLV_RENDER_RANGE_OFFSET(reg)) {   \
+		fwengine = FORCEWAKE_RENDER;            \
+		fwcount = &dev_priv->uncore.fw_rendercount;    \
+	}                                               \
+	else if (FORCEWAKE_VLV_MEDIA_RANGE_OFFSET(reg)) {       \
+		fwengine = FORCEWAKE_MEDIA;             \
+		fwcount = &dev_priv->uncore.fw_mediacount;     \
+	}  \
+	if (fwengine != 0) {		\
+		if ((*fwcount)++ == 0) \
+			(dev_priv)->uncore.funcs.force_wake_get(dev_priv, \
+								fwengine); \
+		val = __raw_i915_read##x(dev_priv, reg); \
+		if (--(*fwcount) == 0) \
+			(dev_priv)->uncore.funcs.force_wake_put(dev_priv, \
+								fwengine); \
+	} else { \
+		val = __raw_i915_read##x(dev_priv, reg); \
+	} \
+	REG_READ_FOOTER; \
+}
+
+
+__vlv_read(8)
+__vlv_read(16)
+__vlv_read(32)
+__vlv_read(64)
 __gen6_read(8)
 __gen6_read(16)
 __gen6_read(32)
@@ -399,6 +525,7 @@  __gen4_read(16)
 __gen4_read(32)
 __gen4_read(64)
 
+#undef __vlv_read
 #undef __gen6_read
 #undef __gen5_read
 #undef __gen4_read
@@ -490,8 +617,8 @@  void intel_uncore_init(struct drm_device *dev)
 			  gen6_force_wake_work);
 
 	if (IS_VALLEYVIEW(dev)) {
-		dev_priv->uncore.funcs.force_wake_get = vlv_force_wake_get;
-		dev_priv->uncore.funcs.force_wake_put = vlv_force_wake_put;
+		dev_priv->uncore.funcs.force_wake_get = __vlv_force_wake_get;
+		dev_priv->uncore.funcs.force_wake_put = __vlv_force_wake_put;
 	} else if (IS_HASWELL(dev)) {
 		dev_priv->uncore.funcs.force_wake_get = __gen6_gt_force_wake_mt_get;
 		dev_priv->uncore.funcs.force_wake_put = __gen6_gt_force_wake_mt_put;
@@ -508,9 +635,9 @@  void intel_uncore_init(struct drm_device *dev)
 		 * forcewake being disabled.
 		 */
 		mutex_lock(&dev->struct_mutex);
-		__gen6_gt_force_wake_mt_get(dev_priv);
+		__gen6_gt_force_wake_mt_get(dev_priv, FORCEWAKE_ALL);
 		ecobus = __raw_i915_read32(dev_priv, ECOBUS);
-		__gen6_gt_force_wake_mt_put(dev_priv);
+		__gen6_gt_force_wake_mt_put(dev_priv, FORCEWAKE_ALL);
 		mutex_unlock(&dev->struct_mutex);
 
 		if (ecobus & FORCEWAKE_MT_ENABLE) {
@@ -547,10 +674,18 @@  void intel_uncore_init(struct drm_device *dev)
 			dev_priv->uncore.funcs.mmio_writel  = gen6_write32;
 			dev_priv->uncore.funcs.mmio_writeq  = gen6_write64;
 		}
-		dev_priv->uncore.funcs.mmio_readb  = gen6_read8;
-		dev_priv->uncore.funcs.mmio_readw  = gen6_read16;
-		dev_priv->uncore.funcs.mmio_readl  = gen6_read32;
-		dev_priv->uncore.funcs.mmio_readq  = gen6_read64;
+
+		if (IS_VALLEYVIEW(dev)) {
+			dev_priv->uncore.funcs.mmio_readb  = vlv_read8;
+			dev_priv->uncore.funcs.mmio_readw  = vlv_read16;
+			dev_priv->uncore.funcs.mmio_readl  = vlv_read32;
+			dev_priv->uncore.funcs.mmio_readq  = vlv_read64;
+		} else {
+			dev_priv->uncore.funcs.mmio_readb  = gen6_read8;
+			dev_priv->uncore.funcs.mmio_readw  = gen6_read16;
+			dev_priv->uncore.funcs.mmio_readl  = gen6_read32;
+			dev_priv->uncore.funcs.mmio_readq  = gen6_read64;
+		}
 		break;
 	case 5:
 		dev_priv->uncore.funcs.mmio_writeb  = gen5_write8;
@@ -753,9 +888,9 @@  static int gen6_do_reset(struct drm_device *dev)
 
 	/* If reset with a user forcewake, try to restore, otherwise turn it off */
 	if (dev_priv->uncore.forcewake_count)
-		dev_priv->uncore.funcs.force_wake_get(dev_priv);
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
 	else
-		dev_priv->uncore.funcs.force_wake_put(dev_priv);
+		dev_priv->uncore.funcs.force_wake_put(dev_priv, FORCEWAKE_ALL);
 
 	/* Restore fifo count */
 	dev_priv->uncore.fifo_count = __raw_i915_read32(dev_priv, GT_FIFO_FREE_ENTRIES);