From patchwork Thu May 21 20:01:47 2015
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 6459121
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Thu, 21 May 2015 21:01:47 +0100
Message-Id: <1432238508-28741-4-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1432238508-28741-1-git-send-email-chris@chris-wilson.co.uk>
References: <1432238508-28741-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 3/4] drm/i915: Use spinlocks for checking when to waitboost

In commit 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
Author: Chris Wilson
Date:   Tue Apr 7 16:20:32 2015 +0100

    drm/i915: Deminish contribution of wait-boosting from clients

we removed an atomic timer based check for allowing waitboosting and
moved it below the mutex taken during RPS. However, that mutex can be
held for long periods of time on Valleyview/Cherryview as communication
with the PCU is slow. As clients may frequently wait for results (e.g.
transform feedback), we introduced contention between the client and the
RPS worker. We can take advantage of the RPS worker by switching the
waitboost decision to use spinlocks and deferring the actual reclocking
to the worker. Fixes a regression of up to 45% on Baytrail and Braswell!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_debugfs.c | 17 +++-------------
 drivers/gpu/drm/i915/i915_drv.h     |  9 +++++++--
 drivers/gpu/drm/i915/i915_gem.c     |  4 ++--
 drivers/gpu/drm/i915/i915_irq.c     | 20 +++++++++++++------
 drivers/gpu/drm/i915/intel_pm.c     | 40 +++++++++++++++++++++++++++----------
 5 files changed, 55 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 32a925d0f2f3..c975b391a822 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2305,15 +2305,6 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_file *file;
-	int ret;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	ret = mutex_lock_interruptible(&dev_priv->rps.hw_lock);
-	if (ret)
-		goto unlock;
 
 	seq_printf(m, "RPS enabled? %d\n", dev_priv->rps.enabled);
 	seq_printf(m, "GPU busy? %d\n", dev_priv->mm.busy);
@@ -2324,6 +2315,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		   intel_gpu_freq(dev_priv, dev_priv->rps.min_freq_softlimit),
 		   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq_softlimit),
 		   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
+	spin_lock(&dev_priv->rps.client_lock);
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct drm_i915_file_private *file_priv = file->driver_priv;
 		struct task_struct *task;
@@ -2344,12 +2336,9 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		   dev_priv->rps.mmioflips.boosts,
 		   list_empty(&dev_priv->rps.mmioflips.link) ? "" : ", active");
 	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
+	spin_unlock(&dev_priv->rps.client_lock);
 
-	mutex_unlock(&dev_priv->rps.hw_lock);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
-
-	return ret;
+	return 0;
 }
 
 static int i915_llc(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d61447df1ae7..abd0914b4849 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1084,9 +1084,12 @@ struct intel_gen6_power_mgmt {
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
 
+	spinlock_t client_lock;
+	struct list_head clients;
+	bool client_boost;
+
 	bool enabled;
 	struct delayed_work delayed_resume_work;
-	struct list_head clients;
 	unsigned boosts;
 
 	struct intel_rps_client semaphores, mmioflips;
@@ -1096,7 +1099,9 @@ struct intel_gen6_power_mgmt {
 
 	/*
 	 * Protects RPS/RC6 register access and PCU communication.
-	 * Must be taken after struct_mutex if nested.
+	 * Must be taken after struct_mutex if nested. Note that
+	 * this lock may be held for long periods of time when
+	 * talking to hw - so only take it when talking to hw!
 	 */
 	struct mutex hw_lock;
 };
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 870ba67a595f..ca41fc56f246 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5195,9 +5195,9 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	spin_unlock(&file_priv->mm.lock);
 
 	if (!list_empty(&file_priv->rps.link)) {
-		mutex_lock(&to_i915(dev)->rps.hw_lock);
+		spin_lock(&to_i915(dev)->rps.client_lock);
 		list_del(&file_priv->rps.link);
-		mutex_unlock(&to_i915(dev)->rps.hw_lock);
+		spin_unlock(&to_i915(dev)->rps.client_lock);
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 707e2ca8fbd8..6255c0456594 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1086,8 +1086,9 @@ static void gen6_pm_rps_work(struct work_struct *work)
 {
 	struct drm_i915_private *dev_priv =
 		container_of(work, struct drm_i915_private, rps.work);
+	bool client_boost;
+	int new_delay, adj, min, max;
 	u32 pm_iir;
-	int new_delay, adj;
 
 	spin_lock_irq(&dev_priv->irq_lock);
 	/* Speed up work cancelation during disabling rps interrupts. */
@@ -1099,6 +1100,8 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	dev_priv->rps.pm_iir = 0;
 	/* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
 	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
+	client_boost = dev_priv->rps.client_boost;
+	dev_priv->rps.client_boost = false;
 	spin_unlock_irq(&dev_priv->irq_lock);
 
 	/* Make sure we didn't queue anything we're not going to process. */
@@ -1113,7 +1116,14 @@ static void gen6_pm_rps_work(struct work_struct *work)
 
 	adj = dev_priv->rps.last_adj;
 	new_delay = dev_priv->rps.cur_freq;
-	if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
+	min = dev_priv->rps.min_freq_softlimit;
+	max = dev_priv->rps.max_freq_softlimit;
+	if (client_boost || any_waiters(dev_priv))
+		max = dev_priv->rps.max_freq;
+	if (client_boost && new_delay <= dev_priv->rps.boost_freq) {
+		new_delay = dev_priv->rps.boost_freq;
+		adj = 0;
+	} else if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
 		if (adj > 0)
 			adj *= 2;
 		else /* CHV needs even encode values */
@@ -1126,7 +1136,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 			new_delay = dev_priv->rps.efficient_freq;
 			adj = 0;
 		}
-	} else if (any_waiters(dev_priv)) {
+	} else if (client_boost || any_waiters(dev_priv)) {
 		adj = 0;
 	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
 		if (dev_priv->rps.cur_freq > dev_priv->rps.efficient_freq)
@@ -1149,9 +1159,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	 * interrupt
 	 */
 	new_delay += adj;
-	new_delay = clamp_t(int, new_delay,
-			    dev_priv->rps.min_freq_softlimit,
-			    dev_priv->rps.max_freq_softlimit);
+	new_delay = clamp_t(int, new_delay, min, max);
 
 	intel_set_rps(dev_priv->dev, new_delay);
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 24a35bda1462..833f9dd28fea 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4141,17 +4141,25 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 		dev_priv->rps.last_adj = 0;
 		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
 	}
+	mutex_unlock(&dev_priv->rps.hw_lock);
 
+	spin_lock(&dev_priv->rps.client_lock);
 	while (!list_empty(&dev_priv->rps.clients))
 		list_del_init(dev_priv->rps.clients.next);
-	mutex_unlock(&dev_priv->rps.hw_lock);
+	spin_unlock(&dev_priv->rps.client_lock);
 }
 
 void gen6_rps_boost(struct drm_i915_private *dev_priv,
 		    struct intel_rps_client *rps,
 		    unsigned long submitted)
 {
-	u32 val;
+	/* This is intentionally racy! We peek at the state here, then
+	 * validate inside the RPS worker.
+	 */
+	if (!(dev_priv->mm.busy &&
+	      dev_priv->rps.enabled &&
+	      dev_priv->rps.cur_freq < dev_priv->rps.max_freq))
+		return;
 
 	/* Force a RPS boost (and don't count it against the client) if
 	 * the GPU is severely congested.
@@ -4159,14 +4167,23 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
 	if (rps && time_after(jiffies, submitted + msecs_to_jiffies(20)))
 		rps = NULL;
 
-	mutex_lock(&dev_priv->rps.hw_lock);
-	val = dev_priv->rps.boost_freq;
-	if (dev_priv->rps.enabled &&
-	    dev_priv->mm.busy &&
-	    dev_priv->rps.cur_freq < val &&
-	    (rps == NULL || list_empty(&rps->link))) {
-		intel_set_rps(dev_priv->dev, val);
-		dev_priv->rps.last_adj = 0;
+	spin_lock(&dev_priv->rps.client_lock);
+	if (rps == NULL || list_empty(&rps->link)) {
+		spin_lock_irq(&dev_priv->irq_lock);
+		if (dev_priv->rps.interrupts_enabled) {
+			unsigned mask;
+
+			/* Pick a legal PM_IIR value to trigger a wakeup */
+			mask = (GEN6_PM_RP_UP_EI_EXPIRED |
+				GEN6_PM_RP_UP_THRESHOLD);
+			mask &= dev_priv->pm_rps_events;
+			dev_priv->rps.pm_iir |= mask;
+
+			/* Request RPS worker to give us full power */
+			dev_priv->rps.client_boost = true;
+			queue_work(dev_priv->wq, &dev_priv->rps.work);
+		}
+		spin_unlock_irq(&dev_priv->irq_lock);
 
 		if (rps != NULL) {
 			list_add(&rps->link, &dev_priv->rps.clients);
@@ -4174,7 +4191,7 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
 		} else
 			dev_priv->rps.boosts++;
 	}
-	mutex_unlock(&dev_priv->rps.hw_lock);
+	spin_unlock(&dev_priv->rps.client_lock);
 }
 
 void intel_set_rps(struct drm_device *dev, u8 val)
@@ -6892,6 +6909,7 @@ void intel_pm_setup(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	mutex_init(&dev_priv->rps.hw_lock);
+	spin_lock_init(&dev_priv->rps.client_lock);
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
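
To illustrate the locking pattern the commit message describes, here is a
minimal user-space sketch. This is not i915 code: fake_rps, rps_worker and
client_wait_boost are made-up names, with a pthread mutex standing in for
rps.hw_lock (slow PCU communication) and a pthread spinlock for
rps.client_lock. Clients only peek at the state and flag a boost under the
cheap spinlock; a single worker owns the slow mutex and does the reclocking.

/* gcc -pthread sketch of "racy peek + flag boost + defer reclock to worker" */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct fake_rps {
	pthread_mutex_t hw_lock;        /* held while "talking to the PCU" (slow) */
	pthread_spinlock_t client_lock; /* guards the boost request state (fast) */
	bool client_boost;
	int cur_freq, max_freq;
};

static struct fake_rps rps;

/* Worker: the only path that takes hw_lock and writes the frequency. */
static void *rps_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 5; i++) {
		bool boost;

		pthread_spin_lock(&rps.client_lock);
		boost = rps.client_boost;
		rps.client_boost = false;
		pthread_spin_unlock(&rps.client_lock);

		pthread_mutex_lock(&rps.hw_lock);
		if (boost && rps.cur_freq < rps.max_freq)
			rps.cur_freq = rps.max_freq; /* stand-in for intel_set_rps() */
		usleep(10000); /* pretend PCU communication is slow */
		pthread_mutex_unlock(&rps.hw_lock);
	}
	return NULL;
}

/* Client path: never touches hw_lock, so it cannot stall behind the PCU. */
static void client_wait_boost(void)
{
	/* Intentionally racy peek, as in the patch; the worker re-validates. */
	if (rps.cur_freq >= rps.max_freq)
		return;

	pthread_spin_lock(&rps.client_lock);
	rps.client_boost = true; /* defer the actual reclock to the worker */
	pthread_spin_unlock(&rps.client_lock);
}

int main(void)
{
	pthread_t worker;

	pthread_mutex_init(&rps.hw_lock, NULL);
	pthread_spin_init(&rps.client_lock, PTHREAD_PROCESS_PRIVATE);
	rps.cur_freq = 300;
	rps.max_freq = 1100;

	pthread_create(&worker, NULL, rps_worker, NULL);
	client_wait_boost();
	pthread_join(worker, NULL);

	printf("final freq: %d\n", rps.cur_freq);
	return 0;
}

The point is that the slow path (PCU communication under hw_lock) never runs
in the client's context; clients only record a request under the cheap lock
and let the worker perform the reclocking, which is what removes the
contention reported in the bug above.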