From patchwork Sun Dec 14 03:08:22 2014
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 5487131
From: Ben Widawsky
To: DRI Development
Date: Sat, 13 Dec 2014 19:08:22 -0800
Message-Id: <1418526504-26316-3-git-send-email-benjamin.widawsky@intel.com>
In-Reply-To: <1418526504-26316-1-git-send-email-benjamin.widawsky@intel.com>
References: <1418526504-26316-1-git-send-email-benjamin.widawsky@intel.com>
Cc: Intel GFX, Ben Widawsky
Subject: [Intel-gfx] [PATCH 2/4] drm/cache: Try to be smarter about clflushing on x86

Any GEM driver which has very large objects and a slow CPU is subject to
very long waits simply for clflushing incoherent objects. Generally, each
individual object is not a problem, but if you have very large objects,
or very many objects, the flushing begins to show up in profiles. Because
on x86 we know the cache size, we can easily determine when an object
will use all of the cache, and forgo iterating over each cacheline.

We need to be careful when using wbinvd. wbinvd() is itself potentially
slow because it requires synchronizing the flush across all CPUs so they
have a coherent view of memory. This can result either in stalling work
being done on other CPUs, or in the call itself stalling while waiting
for a CPU to accept the interrupt. wbinvd() also has the downside of
invalidating all cachelines, so we don't want to use it unless we're sure
we already own most of the cachelines.
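To make that cost concrete: the system-wide flush boils down to an IPI
broadcast. Roughly what wbinvd_on_all_cpus() does on x86 -- sketched from
memory rather than copied from the tree, so treat it as illustrative only:

#include <linux/smp.h>		/* on_each_cpu() */
#include <asm/special_insns.h>	/* wbinvd(); often pulled in indirectly */

/* Illustrative stand-in for the real helper in arch/x86/lib/cache-smp.c. */
static void wbinvd_ipi(void *unused)
{
	wbinvd();	/* write back and invalidate this CPU's caches */
}

static int flush_all_cpu_caches(void)
{
	/*
	 * Interrupt every online CPU and wait (last argument = 1) for each
	 * handler to finish: the other CPUs stall in wbinvd, and the caller
	 * stalls until the slowest CPU has responded.
	 */
	on_each_cpu(wbinvd_ipi, NULL, 1);
	return 0;
}

Every CPU is held in the handler and the caller waits for the slowest one,
which is why we only want to take this path when clflushing would touch
more than the whole cache anyway.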
The current algorithm is very naive. I think it can be tweaked more, and
it would be good if someone else gave it some thought. I am pretty
confident that in i915 we can even skip the IPI in the execbuf path with
a minimal code change (or perhaps just some verification of the existing
code). It would be nice to hear what other developers who depend on this
code think.

Cc: Intel GFX
Signed-off-by: Ben Widawsky
---
 drivers/gpu/drm/drm_cache.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index d7797e8..6009c2d 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
 		drm_clflush_page(*pages++);
 	mb();
 }
+
+static bool
+drm_cache_should_clflush(unsigned long num_pages)
+{
+	const int cache_size = boot_cpu_data.x86_cache_size;
+
+	/* For now the algorithm simply checks if the number of pages to be
+	 * flushed is greater than the entire system cache. One could make the
+	 * function more aware of the actual system (ie. if SMP, how large is
+	 * the cache, CPU freq. etc. All those help to determine when to
+	 * wbinvd() */
+	WARN_ON_ONCE(!cache_size);
+	return !cache_size || num_pages < (cache_size >> 2);
+}
 #endif
 
 void
@@ -71,7 +85,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 {
 
 #if defined(CONFIG_X86)
-	if (cpu_has_clflush) {
+	if (cpu_has_clflush && drm_cache_should_clflush(num_pages)) {
 		drm_cache_flush_clflush(pages, num_pages);
 		return;
 	}
@@ -104,7 +118,7 @@ void
 drm_clflush_sg(struct sg_table *st)
 {
 #if defined(CONFIG_X86)
-	if (cpu_has_clflush) {
+	if (cpu_has_clflush && drm_cache_should_clflush(st->nents)) {
 		struct sg_page_iter sg_iter;
 
 		mb();
@@ -128,7 +142,7 @@ void
 drm_clflush_virt_range(void *addr, unsigned long length)
 {
 #if defined(CONFIG_X86)
-	if (cpu_has_clflush) {
+	if (cpu_has_clflush && drm_cache_should_clflush(length / PAGE_SIZE)) {
 		void *end = addr + length;
 		mb();
 		for (; addr < end; addr += boot_cpu_data.x86_clflush_size
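As a quick sanity check of the threshold: x86_cache_size is reported in KB,
so (cache_size >> 2) is the number of 4K pages that fit in the LLC, i.e. we
only take the clflush loop when the range to flush is smaller than the whole
cache. Below is a small userspace approximation of the same comparison; the
sysfs path and the assumption that index3 is the LLC are mine, not part of
the patch:

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* LLC size in bytes, or 0 if unknown (no index3, parse failure, ...). */
static unsigned long llc_size_bytes(void)
{
	unsigned long kb = 0;
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index3/size", "r");

	if (f && fscanf(f, "%luK", &kb) != 1)
		kb = 0;
	if (f)
		fclose(f);
	return kb * 1024;
}

/* Mirrors: return !cache_size || num_pages < (cache_size >> 2); */
static bool should_clflush(unsigned long num_pages)
{
	unsigned long cache = llc_size_bytes();

	return !cache || num_pages * PAGE_SIZE < cache;
}

int main(void)
{
	printf("   16 pages: %s\n", should_clflush(16) ? "clflush loop" : "wbinvd");
	printf("65536 pages: %s\n", should_clflush(65536) ? "clflush loop" : "wbinvd");
	return 0;
}

On an 8192K LLC this switches over at 2048 pages (8MB worth of flushing),
which matches the cache_size >> 2 cutoff in drm_cache_should_clflush().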