From patchwork Mon Jan 11 17:30:25 2016
X-Patchwork-Submitter: Doug Anderson
X-Patchwork-Id: 8007181
From: Douglas Anderson
To: linux@arm.linux.org.uk, mchehab@osg.samsung.com, robin.murphy@arm.com,
 tfiga@chromium.org, m.szyprowski@samsung.com
Cc: laurent.pinchart+renesas@ideasonboard.com, pawel@osciak.com,
 mike.looijmans@topic.nl, Dmitry Torokhov, will.deacon@arm.com,
 Douglas Anderson, linux-kernel@vger.kernel.org, hch@infradead.org,
 carlo@caione.org, akpm@linux-foundation.org, dan.j.williams@intel.com,
 linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 3/5] ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES
 hint to optimize alloc
Date: Mon, 11 Jan 2016 09:30:25 -0800
Message-Id: <1452533428-12762-4-git-send-email-dianders@chromium.org>
In-Reply-To: <1452533428-12762-1-git-send-email-dianders@chromium.org>
References: <1452533428-12762-1-git-send-email-dianders@chromium.org>

If we know that TLB efficiency will not be an issue when memory is
accessed, then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it
would make TLB usage efficient.

As Marek Szyprowski indicated:

  Please note that mapping memory with larger pages significantly
  improves performance, especially when IOMMU has a little TLB
  cache.  This can be easily observed when multimedia devices do
  processing of RGB data with 90/270 degree rotation

Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy, so the TLB misses aren't
a huge deal.  Presumably most HW video acceleration users of
dma-mapping will not care about huge pages and will set
DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if
we're doing it repeatedly and memory is full.  In one (out of tree)
usage model it is common that arm_iommu_alloc_attrs() is called 16
times in a row, each time trying to allocate 4 MB of memory.  This is
called whenever the system encounters a new video, which could easily
happen while the memory system is stressed out.
In fact, on certain social media websites that auto-play video and
have infinite scrolling, it's quite common to see not just one of
these 16x4MB allocations but 2 or 3 right after another.  Asking the
system to do even a small amount of extra work to give us big chunks
in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all the big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the
big chunks we're causing extra work for the rest of the system.  We
may also start making other memory allocations fail.  While the system
may be robust to such failures (as is the case with dwc2 USB trying to
allocate buffers for Ethernet data and with WiFi trying to allocate
buffers for WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson
Acked-by: Marek Szyprowski
---
Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE

Changes in v4:
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE
- added Marek's ack

Changes in v3:
- Use DMA_ATTR_SEQUENTIAL hint patch new for v3.

Changes in v2: None

 arch/arm/mm/dma-mapping.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bc9cebfa0891..9f996a3d79f7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1158,6 +1158,10 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 		return pages;
 	}
 
+	/* Go straight to 4K chunks if caller says it's OK. */
+	if (dma_get_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, attrs))
+		order_idx = ARRAY_SIZE(iommu_order_array) - 1;
+
 	/*
 	 * IOMMU can map any pages, so himem can also be used here
 	 */