From patchwork Fri May 30 06:47:00 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Courbot <acourbot@nvidia.com>
X-Patchwork-Id: 4268841
From: Alexandre Courbot <acourbot@nvidia.com>
To: Ben Skeggs <bskeggs@redhat.com>, Thierry Reding <treding@nvidia.com>,
	Terje Bergstrom <tbergstrom@nvidia.com>, Ken Adams <KAdams@nvidia.com>
Subject: [PATCH] drm/gk20a/fb: use dma_alloc_coherent() for VRAM
Date: Fri, 30 May 2014 15:47:00 +0900
Message-ID: <1401432420-29477-1-git-send-email-acourbot@nvidia.com>
X-Mailer: git-send-email 1.9.3
X-NVConfidentiality: public
Cc: gnurou@gmail.com, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-tegra@vger.kernel.org

GK20A's RAM driver was using CMA functions to allocate VRAM. This is
wrong because these functions are not exported, which causes the build
to fail when CMA is enabled and Nouveau is built as a module. On top of
that, the driver was leaking (or rather bleeding) memory.

dma_alloc_coherent() will also use CMA when needed but has the
advantage of being properly exported. It creates a permanent kernel
mapping, but experiments revealed that the lowmem mapping is actually
reused, and this mapping can also be used to implement a faster
instmem. We lose the ability to allocate memory at a finer granularity,
but that's what CMA is here for, and it also simplifies the driver.

This driver is to be replaced by an IOMMU-based one in the future;
until then, its current form will allow it to do its job.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c | 97 ++++++++++-------------
 1 file changed, 42 insertions(+), 55 deletions(-)
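
Note on the alignment handling: the patch rounds the requested
alignment (in pages) up to a power of two, then grows the allocation to
at least that many pages. As far as I understand the DMA API, this works
because dma_alloc_coherent() returns a buffer aligned to the smallest
page order that covers the requested size. Below is a minimal userspace
sketch of that arithmetic. fls_u32(), round_align_pages() and
alloc_npages() are hypothetical names used for illustration only, and
treating an alignment of 0 as one page is an assumption of the sketch,
not something the driver does in the lines shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, or 0 when x is 0.
 */
static unsigned int fls_u32(uint32_t x)
{
	unsigned int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Round an alignment expressed in pages up to a power of two. */
static uint32_t round_align_pages(uint32_t align)
{
	uint32_t order;

	/* Assumption of this sketch: treat 0 as "one page". */
	if (align == 0)
		align = 1;

	order = fls_u32(align);
	/* Already a power of two: do not round up to the next one. */
	if ((align & (align - 1)) == 0)
		order--;

	return UINT32_C(1) << order;
}

/*
 * Number of pages to request so that the size-based alignment of the
 * allocator also satisfies the requested alignment.
 */
static uint32_t alloc_npages(uint32_t npages, uint32_t align)
{
	uint32_t rounded = round_align_pages(align);

	assert(npages > 0);
	return rounded > npages ? rounded : npages;
}
```

With these helpers, a 3-page request with a 16-page alignment becomes a
16-page allocation, which is the "finer granularity" that the commit
message says is lost.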
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c b/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c
index 7effd1a63458..10cdcf8b8a7f 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c
@@ -24,32 +24,32 @@
 
 #include <subdev/fb.h>
 
-#include <linux/mm.h>
 #include <linux/types.h>
-#include <linux/dma-contiguous.h>
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+
+struct gk20a_mem {
+	struct nouveau_mem base;
+	void *cpuaddr;
+	dma_addr_t handle;
+};
+#define to_gk20a_mem(m) container_of(m, struct gk20a_mem, base)
 
 static void
 gk20a_ram_put(struct nouveau_fb *pfb, struct nouveau_mem **pmem)
 {
 	struct device *dev = nv_device_base(nv_device(pfb));
-	struct nouveau_mem *mem = *pmem;
-	int i;
+	struct gk20a_mem *mem = to_gk20a_mem(*pmem);
 
 	*pmem = NULL;
 	if (unlikely(mem == NULL))
 		return;
 
-	for (i = 0; i < mem->size; i++) {
-		struct page *page;
-
-		if (mem->pages[i] == 0)
-			break;
+	if (likely(mem->cpuaddr))
+		dma_free_coherent(dev, mem->base.size << PAGE_SHIFT,
+				  mem->cpuaddr, mem->handle);
 
-		page = pfn_to_page(mem->pages[i] >> PAGE_SHIFT);
-		dma_release_from_contiguous(dev, page, 1);
-	}
-
-	kfree(mem->pages);
+	kfree(mem->base.pages);
 	kfree(mem);
 }
 
@@ -58,11 +58,9 @@ gk20a_ram_get(struct nouveau_fb *pfb, u64 size, u32 align, u32 ncmin,
 	     u32 memtype, struct nouveau_mem **pmem)
 {
 	struct device *dev = nv_device_base(nv_device(pfb));
-	struct nouveau_mem *mem;
-	int type = memtype & 0xff;
-	dma_addr_t dma_addr;
-	int npages;
-	int order;
+	struct gk20a_mem *mem;
+	u32 type = memtype & 0xff;
+	u32 npages, order;
 	int i;
 
 	nv_debug(pfb, "%s: size: %llx align: %x, ncmin: %x\n", __func__, size,
@@ -80,59 +78,48 @@ gk20a_ram_get(struct nouveau_fb *pfb, u64 size, u32 align, u32 ncmin,
 	order = fls(align);
 	if ((align & (align - 1)) == 0)
 		order--;
+	align = BIT(order);
 
-	ncmin >>= PAGE_SHIFT;
-	/*
-	 * allocate pages by chunks of "align" size, otherwise we may leave
-	 * holes in the contiguous memory area.
-	 */
-	if (ncmin == 0)
-		ncmin = npages;
-	else if (align > ncmin)
-		ncmin = align;
+	/* ensure returned address is correctly aligned */
+	npages = max(align, npages);
 
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
-	mem->size = npages;
-	mem->memtype = type;
+	mem->base.size = npages;
+	mem->base.memtype = type;
 
-	mem->pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL);
-	if (!mem) {
+	mem->base.pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL);
+	if (!mem->base.pages) {
 		kfree(mem);
 		return -ENOMEM;
 	}
 
-	while (npages) {
-		struct page *pages;
-		int pos = 0;
-
-		/* don't overflow in case size is not a multiple of ncmin */
-		if (ncmin > npages)
-			ncmin = npages;
-
-		pages = dma_alloc_from_contiguous(dev, ncmin, order);
-		if (!pages) {
-			gk20a_ram_put(pfb, &mem);
-			return -ENOMEM;
-		}
+	*pmem = &mem->base;
 
-		dma_addr = (dma_addr_t)(page_to_pfn(pages) << PAGE_SHIFT);
+	mem->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
+					  &mem->handle, GFP_KERNEL);
+	if (!mem->cpuaddr) {
+		nv_error(pfb, "%s: cannot allocate memory!\n", __func__);
+		gk20a_ram_put(pfb, pmem);
+		return -ENOMEM;
+	}
 
-		nv_debug(pfb, "  alloc count: %x, order: %x, addr: %pad\n", ncmin,
-			 order, &dma_addr);
+	align <<= PAGE_SHIFT;
 
-		for (i = 0; i < ncmin; i++)
-			mem->pages[pos + i] = dma_addr + (PAGE_SIZE * i);
+	/* alignment check */
+	if (unlikely(mem->handle & (align - 1)))
+		nv_warn(pfb, "memory not aligned as requested: %pad (0x%x)\n",
+			&mem->handle, align);
 
-		pos += ncmin;
-		npages -= ncmin;
-	}
+	nv_debug(pfb, "alloc size: 0x%x, align: 0x%x, paddr: %pad, vaddr: %p\n",
+		 npages << PAGE_SHIFT, align, &mem->handle, mem->cpuaddr);
 
-	mem->offset = (u64)mem->pages[0];
+	for (i = 0; i < npages; i++)
+		mem->base.pages[i] = mem->handle + (PAGE_SIZE * i);
 
-	*pmem = mem;
+	mem->base.offset = (u64)mem->base.pages[0];
 
 	return 0;
 }
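
For reviewers less familiar with the pattern, the new struct gk20a_mem
embeds the generic memory object and recovers the wrapper with
container_of(), and the per-page address array is derived from the
single DMA handle. The following is a userspace sketch of that layout
under simplified assumptions: struct base_mem, struct wrapped_mem,
to_wrapped_mem() and fill_pages() are hypothetical stand-ins, not the
Nouveau definitions, and PAGE_SIZE is assumed to be 4 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the generic memory object. */
struct base_mem {
	uint64_t offset;	/* address of the first page */
	uint32_t size;		/* size in pages */
	uint64_t *pages;	/* one bus address per page */
};

/* Driver-private wrapper: the generic object plus allocation data. */
struct wrapped_mem {
	struct base_mem base;
	void *cpuaddr;		/* CPU mapping of the buffer */
	uint64_t handle;	/* bus address of the buffer */
};

#define to_wrapped_mem(m) container_of(m, struct wrapped_mem, base)

/*
 * Derive the per-page addresses from one contiguous buffer: page i
 * starts PAGE_SIZE * i bytes after the handle.
 */
static void fill_pages(struct wrapped_mem *mem)
{
	uint32_t i;

	assert(mem->base.pages != NULL);

	for (i = 0; i < mem->base.size; i++)
		mem->base.pages[i] = mem->handle + (uint64_t)PAGE_SIZE * i;

	mem->base.offset = mem->base.pages[0];
}
```

Because the buffer is one contiguous region, the array carries no
information beyond the handle and the page count; it is filled in only
so that code consuming the generic object keeps working unchanged.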