From patchwork Mon May 5 11:43:18 2014
X-Patchwork-Submitter: akash.goel@intel.com
X-Patchwork-Id: 4114141
From: akash.goel@intel.com
To: intel-gfx@lists.freedesktop.org
Cc: sourab.gupta@intel.com, Akash Goel
Date: Mon, 5 May 2014 17:13:18 +0530
Message-Id: <1399290198-4283-1-git-send-email-akash.goel@intel.com>
Subject: [Intel-gfx] [RFC] drm/i915: Scratch page optimization for blanking buffer

From: Akash Goel

There is a use case in which user space (a display compositor) directly
flips a framebuffer onto the primary plane without any prior rendering.
In that case the backing pages of the object are allocated only at
page-flip time, which costs time. Since such a buffer is meant to serve
as a blanking buffer (black), we can instead set up all of its GTT
entries to point at the scratch page, which is already allocated and
zeroed out. This saves the time otherwise spent allocating real backing
physical space for the blanking buffer and flushing the CPU cache.

Signed-off-by: Akash Goel 
---
 drivers/gpu/drm/i915/i915_gem.c      | 18 ++++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.c  |  8 ++++
 drivers/gpu/drm/i915/intel_display.c | 72 ++++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+), 1 deletion(-)
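
Illustration only, not part of the patch: the patch marks a scratch-backed
object by storing the object pointer itself in obj->pages instead of adding
a new flag field. The standalone C sketch below shows just that self-pointer
sentinel idiom; fake_sg_table and fake_gem_object are made-up stand-ins for
the real i915 types.

/* Build with: cc -Wall -o sentinel sentinel.c */
#include <stdio.h>

struct fake_sg_table { int unused; };      /* stand-in for struct sg_table */

struct fake_gem_object {                   /* stand-in for drm_i915_gem_object */
        struct fake_sg_table *pages;
};

/* Mark the object as "backed by the scratch page only" by pointing
 * obj->pages at the object itself, so no extra flag field is needed. */
static void mark_scratch_backed(struct fake_gem_object *obj)
{
        obj->pages = (struct fake_sg_table *)obj;
}

static int is_scratch_backed(const struct fake_gem_object *obj)
{
        return (unsigned long)obj->pages == (unsigned long)obj;
}

int main(void)
{
        struct fake_gem_object obj = { .pages = NULL };

        mark_scratch_backed(&obj);
        printf("scratch backed: %d\n", is_scratch_backed(&obj));  /* 1 */

        obj.pages = NULL;                  /* drop the association again */
        printf("scratch backed: %d\n", is_scratch_backed(&obj));  /* 0 */
        return 0;
}
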
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b19ccb8..7c3963c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1781,6 +1781,17 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 	 * lists early.
 	 */
 	list_del(&obj->global_list);
 
+	/*
+	 * If the object has so far been backed by the scratch page, then
+	 * remove that association & make it reusable as a normal GEM object.
+	 */
+	if ((unsigned long)obj->pages == (unsigned long)(obj)) {
+		obj->pages = NULL;
+		obj->base.read_domains = obj->base.write_domain =
+						I915_GEM_DOMAIN_CPU;
+		return 0;
+	}
+
 	ops->put_pages(obj);
 	obj->pages = NULL;
 
@@ -3772,7 +3783,12 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	if (ret)
 		goto err_unpin_display;
 
-	i915_gem_object_flush_cpu_write_domain(obj, true);
+	/*
+	 * If the object is backed by the scratch page, a CPU cache flush is
+	 * not required, so skip it.
+	 */
+	if ((unsigned long)(obj->pages) != (unsigned long)obj)
+		i915_gem_object_flush_cpu_write_domain(obj, true);
 
 	old_write_domain = obj->base.write_domain;
 	old_read_domains = obj->base.read_domains;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f6354e0..fb3193a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1566,6 +1566,14 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
 		if (!obj->has_global_gtt_mapping ||
 		    (cache_level != obj->cache_level)) {
+			if ((unsigned long)(obj->pages) == (unsigned long)obj) {
+				/* Leave the scratch page mapped into the GTT
+				 * entries of the object, as it is actually
+				 * supposed to be backed by the scratch page
+				 * only */
+				obj->has_global_gtt_mapping = 1;
+				return;
+			}
 			vma->vm->insert_entries(vma->vm, obj->pages,
 						vma->node.start,
 						cache_level);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 59303213..dff85e4 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void intel_increase_pllclock(struct drm_crtc *crtc);
 static void intel_crtc_update_cursor(struct drm_crtc *crtc, bool on);
@@ -8469,6 +8470,75 @@ static void intel_crtc_destroy(struct drm_crtc *crtc)
 	kfree(intel_crtc);
 }
 
+static inline void
+intel_use_scratch_page_for_fb(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
+	int ret;
+
+	/* A fb being flipped without any allocated backing physical space
+	 * (i.e. without any prior rendering) is most probably going to be
+	 * used as a blanking buffer (black colored). So instead of allocating
+	 * the real backing physical space for this buffer, we can back this
+	 * object with the scratch page, which is already allocated. So if no
+	 * shmem data pages have been allocated for the fb, we back it with
+	 * the scratch page and thus save time by avoiding allocation of
+	 * backing physical space & the subsequent CPU cache flush.
+	 */
+	if (obj->base.filp) {
+		struct inode *inode = file_inode(obj->base.filp);
+		struct shmem_inode_info *info = SHMEM_I(inode);
+		spin_lock(&info->lock);
+		ret = info->alloced;
+		spin_unlock(&info->lock);
+		if ((ret == 0) && (obj->pages == NULL)) {
+			/*
+			 * Set the 'pages' field to the object pointer
+			 * itself; this avoids the need for a new field in
+			 * the obj structure to identify an object backed by
+			 * the scratch page, and also avoids the call to
+			 * 'get_pages', thus also saving the time required
+			 * for allocation of the 'scatterlist' structure.
+			 */
+			obj->pages = (struct sg_table *)(obj);
+
+			/*
+			 * To avoid calls to gtt prepare & finish, as those
+			 * would dereference the 'pages' field.
+			 */
+			obj->has_dma_mapping = 1;
+			list_add_tail(&obj->global_list,
+				      &dev_priv->mm.unbound_list);
+
+			trace_printk("Using scratch page for obj %p\n", obj);
+		}
+	}
+}
+
+static inline void
+intel_drop_scratch_page_for_fb(struct drm_i915_gem_object *obj)
+{
+	int ret;
+
+	/*
+	 * Unmap the object backed by the scratch page, as it is no
+	 * longer being scanned out and can thus now be allowed to be
+	 * used as a normal object.
+	 * Assumption: user space will ensure that the object is reused
+	 * for rendering only once it is no longer being scanned out.
+	 * This is a valid assumption, as there is no such handling in
+	 * the driver for other regular fb objects either.
+	 */
+	if ((unsigned long)obj->pages == (unsigned long)obj) {
+		ret = i915_gem_object_ggtt_unbind(obj);
+		/* EBUSY is ok: this means that the pin count is still not zero */
+		if (ret && ret != -EBUSY)
+			DRM_ERROR("unbind error %d\n", ret);
+		i915_gem_object_put_pages(obj);
+		obj->has_dma_mapping = 0;
+	}
+}
+
 static void intel_unpin_work_fn(struct work_struct *__work)
 {
 	struct intel_unpin_work *work =
@@ -8477,6 +8547,7 @@ static void intel_unpin_work_fn(struct work_struct *__work)
 
 	mutex_lock(&dev->struct_mutex);
 	intel_unpin_fb_obj(work->old_fb_obj);
+	intel_drop_scratch_page_for_fb(work->old_fb_obj);
 	drm_gem_object_unreference(&work->pending_flip_obj->base);
 	drm_gem_object_unreference(&work->old_fb_obj->base);
 
@@ -8770,6 +8841,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	if (IS_VALLEYVIEW(dev) || ring == NULL || ring->id != RCS)
 		ring = &dev_priv->ring[BCS];
 
+	intel_use_scratch_page_for_fb(obj);
 	ret = intel_pin_and_fence_fb_obj(dev, obj, ring);
 	if (ret)
 		goto err;
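
For context (not part of the patch): the use case described in the commit
message is a compositor creating a framebuffer and flipping it straight onto
the primary plane without ever rendering into it. A rough user-space sketch
of that sequence using libdrm follows; the device path, the crtc_id value and
the 1920x1080 geometry are assumptions for illustration only, and a real
compositor would take them from its modeset state and handle errors/cleanup.

/* Build with: cc -Wall $(pkg-config --cflags --libs libdrm) -o blank blank.c */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
        int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
        uint32_t crtc_id = 0;   /* placeholder: take this from modeset state */
        struct drm_mode_create_dumb create;
        uint32_t fb_id;

        if (fd < 0)
                return 1;

        /* Create a dumb buffer but never map or render into it. */
        memset(&create, 0, sizeof(create));
        create.width = 1920;
        create.height = 1080;
        create.bpp = 32;
        if (drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create))
                return 1;

        /* Wrap the untouched buffer in a framebuffer object. */
        if (drmModeAddFB(fd, create.width, create.height, 24, 32,
                         create.pitch, create.handle, &fb_id))
                return 1;

        /* Flip the never-rendered (blanking) buffer onto the primary plane. */
        if (drmModePageFlip(fd, crtc_id, fb_id, 0, NULL))
                return 1;

        return 0;
}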