From patchwork Fri May 30 06:47:00 2014
X-Patchwork-Submitter: Alexandre Courbot
X-Patchwork-Id: 4268841
From: Alexandre Courbot
To: Ben Skeggs, Thierry Reding, Terje Bergstrom, Ken Adams
Cc: gnurou@gmail.com, nouveau@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-tegra@vger.kernel.org
Subject: [PATCH] drm/gk20a/fb: use dma_alloc_coherent() for VRAM
Date: Fri, 30 May 2014 15:47:00 +0900
Message-ID: <1401432420-29477-1-git-send-email-acourbot@nvidia.com>
X-Mailer: git-send-email 1.9.3

GK20A's RAM driver was using CMA functions in order to allocate VRAM.
This is wrong because these functions are not exported, which causes
compilation to fail when CMA is enabled and Nouveau is built as a
module. On top of that the driver was leaking (or rather bleeding)
memory.

dma_alloc_coherent() will also use CMA when needed but has the
advantage of being properly exported. It creates a permanent kernel
mapping, but experiment revealed that the lowmem mapping is actually
reused, and this mapping can also be taken advantage of to implement
faster instmem. We lose the ability to allocate memory at finer
granularity, but that's what CMA is here for and it also simplifies the
driver.

This driver is to be replaced by an IOMMU-based one in the future;
until then, its current form will allow it to do its job.

Signed-off-by: Alexandre Courbot
---
 drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c | 97 ++++++++++-------------
 1 file changed, 42 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c b/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c
index 7effd1a63458..10cdcf8b8a7f 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/fb/ramgk20a.c
@@ -24,32 +24,32 @@
 
 #include
 
-#include
 #include
-#include
+#include
+#include
+
+struct gk20a_mem {
+	struct nouveau_mem base;
+	void *cpuaddr;
+	dma_addr_t handle;
+};
+#define to_gk20a_mem(m) container_of(m, struct gk20a_mem, base)
 
 static void
 gk20a_ram_put(struct nouveau_fb *pfb, struct nouveau_mem **pmem)
 {
 	struct device *dev = nv_device_base(nv_device(pfb));
-	struct nouveau_mem *mem = *pmem;
-	int i;
+	struct gk20a_mem *mem = to_gk20a_mem(*pmem);
 
 	*pmem = NULL;
 	if (unlikely(mem == NULL))
 		return;
 
-	for (i = 0; i < mem->size; i++) {
-		struct page *page;
-
-		if (mem->pages[i] == 0)
-			break;
+	if (likely(mem->cpuaddr))
+		dma_free_coherent(dev, mem->base.size << PAGE_SHIFT,
+				  mem->cpuaddr, mem->handle);
 
-		page = pfn_to_page(mem->pages[i] >> PAGE_SHIFT);
-		dma_release_from_contiguous(dev, page, 1);
-	}
-
-	kfree(mem->pages);
+	kfree(mem->base.pages);
 	kfree(mem);
 }
 
@@ -58,11 +58,9 @@ gk20a_ram_get(struct nouveau_fb *pfb, u64 size, u32 align, u32 ncmin,
 	      u32 memtype, struct nouveau_mem **pmem)
 {
 	struct device *dev = nv_device_base(nv_device(pfb));
-	struct nouveau_mem *mem;
-	int type = memtype & 0xff;
-	dma_addr_t dma_addr;
-	int npages;
-	int order;
+	struct gk20a_mem *mem;
+	u32 type = memtype & 0xff;
+	u32 npages, order;
 	int i;
 
 	nv_debug(pfb, "%s: size: %llx align: %x, ncmin: %x\n", __func__, size,
@@ -80,59 +78,48 @@ gk20a_ram_get(struct nouveau_fb *pfb, u64 size, u32 align, u32 ncmin,
 	order = fls(align);
 	if ((align & (align - 1)) == 0)
 		order--;
+	align = BIT(order);
 
-	ncmin >>= PAGE_SHIFT;
-	/*
-	 * allocate pages by chunks of "align" size, otherwise we may leave
-	 * holes in the contiguous memory area.
-	 */
-	if (ncmin == 0)
-		ncmin = npages;
-	else if (align > ncmin)
-		ncmin = align;
+	/* ensure returned address is correctly aligned */
+	npages = max(align, npages);
 
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
-	mem->size = npages;
-	mem->memtype = type;
+	mem->base.size = npages;
+	mem->base.memtype = type;
 
-	mem->pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL);
-	if (!mem) {
+	mem->base.pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL);
+	if (!mem->base.pages) {
 		kfree(mem);
 		return -ENOMEM;
 	}
 
-	while (npages) {
-		struct page *pages;
-		int pos = 0;
-
-		/* don't overflow in case size is not a multiple of ncmin */
-		if (ncmin > npages)
-			ncmin = npages;
-
-		pages = dma_alloc_from_contiguous(dev, ncmin, order);
-		if (!pages) {
-			gk20a_ram_put(pfb, &mem);
-			return -ENOMEM;
-		}
+	*pmem = &mem->base;
 
-		dma_addr = (dma_addr_t)(page_to_pfn(pages) << PAGE_SHIFT);
+	mem->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
+					  &mem->handle, GFP_KERNEL);
+	if (!mem->cpuaddr) {
+		nv_error(pfb, "%s: cannot allocate memory!\n", __func__);
+		gk20a_ram_put(pfb, pmem);
+		return -ENOMEM;
+	}
 
-		nv_debug(pfb, " alloc count: %x, order: %x, addr: %pad\n", ncmin,
-			 order, &dma_addr);
+	align <<= PAGE_SHIFT;
 
-		for (i = 0; i < ncmin; i++)
-			mem->pages[pos + i] = dma_addr + (PAGE_SIZE * i);
+	/* alignment check */
+	if (unlikely(mem->handle & (align - 1)))
+		nv_warn(pfb, "memory not aligned as requested: %pad (0x%x)\n",
+			&mem->handle, align);
 
-		pos += ncmin;
-		npages -= ncmin;
-	}
+	nv_debug(pfb, "alloc size: 0x%x, align: 0x%x, paddr: %pad, vaddr: %p\n",
+		 npages << PAGE_SHIFT, align, &mem->handle, mem->cpuaddr);
 
-	mem->offset = (u64)mem->pages[0];
+	for (i = 0; i < npages; i++)
+		mem->base.pages[i] = mem->handle + (PAGE_SIZE * i);
 
-	*pmem = mem;
+	mem->base.offset = (u64)mem->base.pages[0];
 
 	return 0;
 }
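
Note on the alignment arithmetic in gk20a_ram_get(): the order/BIT(order)
sequence rounds the requested alignment (already expressed in pages at that
point) up to a power of two, and max(align, npages) then grows the request to
at least that many pages. The following is only a user-space sketch of that
arithmetic, not driver code. fls_u32(), round_align_pages() and
alloc_pages_for() are hypothetical names introduced here, and the sketch
assumes a non-zero alignment, which the part of the function outside the
hunks is presumed to guarantee.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, or 0 when x == 0.
 */
static unsigned int fls_u32(uint32_t x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Round an alignment given in pages up to the next power of two,
 * mirroring the order = fls(align) / order-- / BIT(order) steps.
 * Assumption: align is non-zero (order-- would underflow for 0).
 */
static uint32_t round_align_pages(uint32_t align)
{
	unsigned int order;

	assert(align != 0);
	order = fls_u32(align);
	if ((align & (align - 1)) == 0)	/* already a power of two */
		order--;
	return UINT32_C(1) << order;
}

/*
 * Hypothetical helper: page count actually requested, mirroring
 * npages = max(align, npages) after the rounding above.
 */
static uint32_t alloc_pages_for(uint32_t npages, uint32_t align)
{
	uint32_t rounded = round_align_pages(align);

	return rounded > npages ? rounded : npages;
}
```

For instance, a 2-page request with a 3-page alignment is rounded to a 4-page
alignment and therefore becomes a 4-page allocation, while a 16-page request
with the same alignment stays at 16 pages. The patch does not depend on this
blindly: it still checks the returned handle against the alignment and emits
a warning if the check fails.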