From patchwork Mon Nov 30 07:11:05 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: akash.goel@intel.com
X-Patchwork-Id: 7721521
From: akash.goel@intel.com
To: intel-gfx@lists.freedesktop.org
Date: Mon, 30 Nov 2015 12:41:05 +0530
Message-Id: <1448867465-5520-1-git-send-email-akash.goel@intel.com>
X-Mailer: git-send-email 1.9.2
In-Reply-To: <20151126105713.GF23362@nuc-i3427.alporthouse.com>
References: <20151126105713.GF23362@nuc-i3427.alporthouse.com>
Cc: Akash Goel
Subject: [Intel-gfx] [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
List-Id: Intel graphics driver community testing & development

From: Akash Goel

When an object is moved out of the CPU read domain, its cachelines are
not invalidated immediately. The invalidation is deferred until the next
time the object is brought back into the CPU read domain.
However, the invalidation is done unconditionally, i.e. even when the
cachelines were already flushed as the object moved out of the CPU write
domain. This is avoidable, and skipping the redundant invalidation is a
small optimization. This is not a hypothetical case, although it is
unlikely to occur often.
The aim is to detect changes to the backing storage whilst the data is
potentially in the CPU cache, and only clflush in those cases.

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)
v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of HW (Ville/Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson
Signed-off-by: Akash Goel
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..f97795e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely clflushed.
+	 * !cache_clean does not imply cache_dirty (there is some data in the
+	 * CPU cachelines, but has not been dirtied), but cache_clean
+	 * does imply !cache_dirty (no data in cachelines, so not dirty also).
+	 * Actually cache_dirty tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_clean:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..7376be8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_clean = true;
 
 	return true;
 }
@@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* If an object is moved out of the CPU domain following a
+		 * CPU write and before a GPU or GTT write, we will clflush
+		 * it out of the CPU cache, and mark the cache as clean.
+		 * After clflushing we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned (& speculation cannot
+		 * cross page boundaries). Whilst this flag is set, we know
+		 * that any future access to the object's pages will miss the
+		 * stale cache and have to be serviced from main memory, i.e.
+		 * we do not need another clflush to invalidate the CPU cache
+		 * in preparing to read from the object.
+		 */
+		if (!obj->cache_clean)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_clean = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
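
Note for reviewers: the flag transitions introduced by this patch can be
traced with a small userspace model. This is only a sketch and not driver
code. `struct model_obj`, `model_clflush()`, `model_leave_cpu_domain()`,
`model_set_to_cpu_domain()` and the `clflush_count` counter are hypothetical
names invented for illustration; the real logic lives in
i915_gem_clflush_object() and i915_gem_object_set_to_cpu_domain().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_i915_gem_object; only the state
 * touched by the patch is modelled. */
struct model_obj {
	bool in_cpu_read_domain;    /* models read_domains & I915_GEM_DOMAIN_CPU */
	bool cache_dirty;
	bool cache_clean;
	unsigned int clflush_count; /* illustration only: flushes issued so far */
};

/* Models the tail of i915_gem_clflush_object(): after a full clflush no
 * data is left in the CPU cache, so the object is clean and not dirty. */
static void model_clflush(struct model_obj *obj)
{
	obj->clflush_count++;
	obj->cache_dirty = false;
	obj->cache_clean = true;
}

/* Models the object leaving the CPU read domain. 'flush' stands for the
 * case where a preceding CPU write forced a clflush on the way out. */
static void model_leave_cpu_domain(struct model_obj *obj, bool flush)
{
	if (flush)
		model_clflush(obj);
	obj->in_cpu_read_domain = false;
}

/* Models the patched block in i915_gem_object_set_to_cpu_domain(). */
static void model_set_to_cpu_domain(struct model_obj *obj)
{
	/* Invariant from the i915_drv.h comment: cache_clean => !cache_dirty. */
	assert(!(obj->cache_clean && obj->cache_dirty));

	if (!obj->in_cpu_read_domain) {
		/* Skip the invalidation if nothing can be in the cache. */
		if (!obj->cache_clean)
			model_clflush(obj);
		/* The CPU may now pull lines in, so the cache is no longer
		 * known to be empty. */
		obj->cache_clean = false;
		obj->in_cpu_read_domain = true;
	}
}
```

In this model, a model_leave_cpu_domain(obj, true) followed by
model_set_to_cpu_domain(obj) issues one flush in total, whereas the
pre-patch behaviour would correspond to two. Leaving without a flush still
triggers the invalidation on re-entry.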