diff mbox

[v2,3/3] drm/i915: optimise intel_runtime_pm_{get, put}

Message ID 20161118133647.4868-4-david.weinehall@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

David Weinehall Nov. 18, 2016, 1:36 p.m. UTC
Benchmarking shows that on resume we spend quite a bit of time
just taking and dropping these references, leaving us two options;
either rewriting the code not to take these references more than
once, which would be a rather invasive change since the involved
functions are used from other places, or to optimise
intel_runtime_pm_{get,put}().  This patch does the latter.
Initial benchmarking indicate improvements of a couple
of milliseconds on resume.

Original patch by Chris, with slight fixes by me.

v2: Fix missing return value (Patchwork)
    Remove extra atomic_dec() (Chris)

Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Chris Wilson Nov. 18, 2016, 2 p.m. UTC | #1
On Fri, Nov 18, 2016 at 03:36:47PM +0200, David Weinehall wrote:
> Benchmarking shows that on resume we spend quite a bit of time
> just taking and dropping these references, leaving us two options;
> either rewriting the code not to take these references more than
> once, which would be a rather invasive change since the involved
> functions are used from other places, or to optimise
> intel_runtime_pm_{get,put}().  This patch does the latter.
> Initial benchmarking indicate improvements of a couple
> of milliseconds on resume.
> 
> Original patch by Chris, with slight fixes by me.
> 
> v2: Fix missing return value (Patchwork)
>     Remove extra atomic_dec() (Chris)
> 
> Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
> CC: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@linux.intel.com>

I'm happy with this. Not amused that it apparently saves quite a bit of
overhead with frequent pm_runtime calls.

Imre?
-Chris
Ville Syrjälä Nov. 18, 2016, 2:15 p.m. UTC | #2
On Fri, Nov 18, 2016 at 02:00:40PM +0000, Chris Wilson wrote:
> On Fri, Nov 18, 2016 at 03:36:47PM +0200, David Weinehall wrote:
> > Benchmarking shows that on resume we spend quite a bit of time
> > just taking and dropping these references, leaving us two options;
> > either rewriting the code not to take these references more than
> > once, which would be a rather invasive change since the involved
> > functions are used from other places, or to optimise
> > intel_runtime_pm_{get,put}().  This patch does the latter.
> > Initial benchmarking indicate improvements of a couple
> > of milliseconds on resume.
> > 
> > Original patch by Chris, with slight fixes by me.
> > 
> > v2: Fix missing return value (Patchwork)
> >     Remove extra atomic_dec() (Chris)
> > 
> > Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
> > CC: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@linux.intel.com>
> 
> I'm happy with this. Not amused that it apparently saves quite a bit of
> overhead with frequent pm_runtime calls.

We could eliminate some of those calls entirely by moving them from
intel_display_power_{get,put}() into the always on well enable/disable
hooks. But I'm not sure how much this overhead originates from the power
well code as opposed to some gem/etc. stuff.

> 
> Imre?
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Imre Deak Nov. 18, 2016, 3:41 p.m. UTC | #3
On pe, 2016-11-18 at 14:00 +0000, Chris Wilson wrote:
> On Fri, Nov 18, 2016 at 03:36:47PM +0200, David Weinehall wrote:
> > Benchmarking shows that on resume we spend quite a bit of time
> > just taking and dropping these references, leaving us two options;
> > either rewriting the code not to take these references more than
> > once, which would be a rather invasive change since the involved
> > functions are used from other places, or to optimise
> > intel_runtime_pm_{get,put}().  This patch does the latter.
> > Initial benchmarking indicate improvements of a couple
> > of milliseconds on resume.
> > 
> > Original patch by Chris, with slight fixes by me.
> > 
> > v2: Fix missing return value (Patchwork)
> >     Remove extra atomic_dec() (Chris)
> > 
> > Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
> > CC: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@linux.intel.com>
> 
> I'm happy with this. Not amused that it apparently saves quite a bit
> of
> overhead with frequent pm_runtime calls.
> 
> Imre?

I think the overhead is because the RPM core takes a lock and checks if
the device needs to be woken up even if the runtime_usage count is 0.
But for us the device is awake whenever wakeref_count > 0, so yes we
can optimize things.

> -Chris
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 356c662ad453..831fde343d10 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -2632,6 +2632,9 @@  void intel_runtime_pm_get(struct drm_i915_private *dev_priv)
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	struct device *kdev = &pdev->dev;
 
+	if (atomic_inc_not_zero(&dev_priv->pm.wakeref_count))
+		return;
+
 	pm_runtime_get_sync(kdev);
 
 	atomic_inc(&dev_priv->pm.wakeref_count);
@@ -2653,6 +2656,9 @@  bool intel_runtime_pm_get_if_in_use(struct drm_i915_private *dev_priv)
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	struct device *kdev = &pdev->dev;
 
+	if (atomic_inc_not_zero(&dev_priv->pm.wakeref_count))
+		return true;
+
 	if (IS_ENABLED(CONFIG_PM)) {
 		int ret = pm_runtime_get_if_in_use(kdev);
 
@@ -2695,6 +2701,9 @@  void intel_runtime_pm_get_noresume(struct drm_i915_private *dev_priv)
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	struct device *kdev = &pdev->dev;
 
+	if (atomic_inc_not_zero(&dev_priv->pm.wakeref_count))
+		return;
+
 	assert_rpm_wakelock_held(dev_priv);
 	pm_runtime_get_noresume(kdev);
 
@@ -2714,8 +2723,10 @@  void intel_runtime_pm_put(struct drm_i915_private *dev_priv)
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	struct device *kdev = &pdev->dev;
 
+	if (!atomic_dec_and_test(&dev_priv->pm.wakeref_count))
+		return;
+
 	assert_rpm_wakelock_held(dev_priv);
-	atomic_dec(&dev_priv->pm.wakeref_count);
 
 	pm_runtime_mark_last_busy(kdev);
 	pm_runtime_put_autosuspend(kdev);