From patchwork Tue Jul 8 08:26:00 2014
From: Alexandre Courbot <acourbot@nvidia.com>
To: Ben Skeggs, David Airlie, David Herrmann, Lucas Stach, Thierry Reding,
 Maarten Lankhorst
Cc: gnurou@gmail.com, nouveau@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-tegra@vger.kernel.org
Subject: [PATCH v4 5/6] drm/nouveau: implement explicitly coherent BOs
Date: Tue, 8 Jul 2014 17:26:00 +0900
Message-ID: <1404807961-30530-6-git-send-email-acourbot@nvidia.com>
In-Reply-To: <1404807961-30530-1-git-send-email-acourbot@nvidia.com>
References: <1404807961-30530-1-git-send-email-acourbot@nvidia.com>

Allow nouveau_bo_new() to recognize the TTM_PL_FLAG_UNCACHED flag, which
means that we want the allocated BO to be perfectly coherent between the
CPU and GPU. This is useful on non-coherent architectures for which we
do not want to manually sync some rarely-accessed buffers: typically,
fences and pushbuffers.

A TTM BO allocated with the TTM_PL_FLAG_UNCACHED flag on a non-coherent
architecture will be populated using the DMA API, and accesses to it will
be performed through the coherent mapping returned by dma_alloc_coherent().
Two illustrative sketches follow the diff.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 76 ++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/nouveau/nouveau_bo.h |  1 +
 2 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 47e4e8886769..23a29adfabf0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -219,6 +219,9 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	nvbo->tile_flags = tile_flags;
 	nvbo->bo.bdev = &drm->ttm.bdev;
 
+	if (!nv_device_is_cpu_coherent(nouveau_dev(dev)))
+		nvbo->force_coherent = flags & TTM_PL_FLAG_UNCACHED;
+
 	nvbo->page_shift = 12;
 	if (drm->client.base.vm) {
 		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
@@ -289,8 +292,9 @@ void
 nouveau_bo_placement_set(struct nouveau_bo *nvbo, uint32_t type, uint32_t busy)
 {
 	struct ttm_placement *pl = &nvbo->placement;
-	uint32_t flags = TTM_PL_MASK_CACHING |
-			 (nvbo->pin_refcnt ? TTM_PL_FLAG_NO_EVICT : 0);
+	uint32_t flags = (nvbo->force_coherent ? TTM_PL_FLAG_UNCACHED :
+			  TTM_PL_MASK_CACHING) |
+			 (nvbo->pin_refcnt ? TTM_PL_FLAG_NO_EVICT : 0);
 
 	pl->placement = nvbo->placements;
 	set_placement_list(nvbo->placements, &pl->num_placement,
@@ -390,7 +394,14 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 	if (ret)
 		return ret;
 
-	ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages, &nvbo->kmap);
+	/*
+	 * TTM buffers allocated using the DMA API already have a mapping, let's
+	 * use it instead.
+	 */
+	if (!nvbo->force_coherent)
+		ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages,
+				  &nvbo->kmap);
+
 	ttm_bo_unreserve(&nvbo->bo);
 	return ret;
 }
@@ -398,7 +409,14 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 void
 nouveau_bo_unmap(struct nouveau_bo *nvbo)
 {
-	if (nvbo)
+	if (!nvbo)
+		return;
+
+	/*
+	 * TTM buffers allocated using the DMA API already had a coherent
+	 * mapping which we used, no need to unmap.
+	 */
+	if (!nvbo->force_coherent)
 		ttm_bo_kunmap(&nvbo->kmap);
 }
 
@@ -482,12 +500,36 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 	return 0;
 }
 
+static inline void *
+_nouveau_bo_mem_index(struct nouveau_bo *nvbo, unsigned index, void *mem, u8 sz)
+{
+	struct ttm_dma_tt *dma_tt;
+	u8 *m = mem;
+
+	index *= sz;
+
+	if (m) {
+		/* kmap'd address, return the corresponding offset */
+		m += index;
+	} else {
+		/* DMA-API mapping, lookup the right address */
+		dma_tt = (struct ttm_dma_tt *)nvbo->bo.ttm;
+		m = dma_tt->cpu_address[index / PAGE_SIZE];
+		m += index % PAGE_SIZE;
+	}
+
+	return m;
+}
+#define nouveau_bo_mem_index(o, i, m) _nouveau_bo_mem_index(o, i, m, sizeof(*m))
+
 u16
 nouveau_bo_rd16(struct nouveau_bo *nvbo, unsigned index)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		return ioread16_native((void __force __iomem *)mem);
 	else
@@ -499,7 +541,9 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		iowrite16_native(val, (void __force __iomem *)mem);
 	else
@@ -511,7 +555,9 @@ nouveau_bo_rd32(struct nouveau_bo *nvbo, unsigned index)
 {
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		return ioread32_native((void __force __iomem *)mem);
 	else
@@ -523,7 +569,9 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
 {
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		iowrite32_native(val, (void __force __iomem *)mem);
 	else
@@ -1426,6 +1474,14 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 	device = nv_device(drm->device);
 	dev = drm->dev;
 
+	/*
+	 * Objects matching this condition have been marked as force_coherent,
+	 * so use the DMA API for them.
+	 */
+	if (!nv_device_is_cpu_coherent(device) &&
+	    ttm->caching_state == tt_uncached)
+		return ttm_dma_populate(ttm_dma, dev->dev);
+
 #if __OS_HAS_AGP
 	if (drm->agp.stat == ENABLED) {
 		return ttm_agp_tt_populate(ttm);
@@ -1476,6 +1532,10 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 	device = nv_device(drm->device);
 	dev = drm->dev;
 
+	if (!nv_device_is_cpu_coherent(device) &&
+	    ttm->caching_state == tt_uncached)
+		ttm_dma_unpopulate(ttm_dma, dev->dev);
+
 #if __OS_HAS_AGP
 	if (drm->agp.stat == ENABLED) {
 		ttm_agp_tt_unpopulate(ttm);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index fa42298d2dca..9a111b92fb34 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -11,6 +11,7 @@ struct nouveau_bo {
 	u32 valid_domains;
 	u32 placements[3];
 	u32 busy_placements[3];
+	bool force_coherent;
 	struct ttm_bo_kmap_obj kmap;
 	struct list_head head;
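
[Editor's sketch 1] A minimal, hypothetical usage sketch (not part of the
patch) of how a caller might allocate and touch an explicitly coherent BO
once this lands. It assumes driver context (nouveau_bo.h included), assumes
the nouveau_bo_new()/nouveau_bo_pin() signatures of this kernel era, and
example_alloc_coherent_bo() is an invented name; error unwinding is elided.

/* Hypothetical example, not from this patch: allocate a small system-memory
 * BO as explicitly coherent. On a non-coherent device the UNCACHED flag
 * makes nouveau_bo_new() set force_coherent, so the pages are obtained
 * through the DMA API and need no manual cache maintenance afterwards. */
static int
example_alloc_coherent_bo(struct drm_device *dev, struct nouveau_bo **pnvbo)
{
	int ret;

	ret = nouveau_bo_new(dev, PAGE_SIZE, 0,
			     TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED,
			     0, 0, NULL, pnvbo);
	if (ret)
		return ret;

	ret = nouveau_bo_pin(*pnvbo, TTM_PL_FLAG_TT);
	if (ret)
		return ret;

	/* For force_coherent BOs this reuses the mapping created by
	 * dma_alloc_coherent() instead of calling ttm_bo_kmap(). */
	ret = nouveau_bo_map(*pnvbo);
	if (ret)
		return ret;

	/* The accessors resolve the per-page coherent addresses. */
	nouveau_bo_wr32(*pnvbo, 0, 0xcafebabe);
	return 0;
}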
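
[Editor's sketch 2] For reference, a standalone model of the address
arithmetic _nouveau_bo_mem_index() performs for DMA-API-backed objects.
EX_PAGE_SIZE and example_mem_index() are hypothetical stand-ins for
PAGE_SIZE and the driver helper; cpu_address[] mimics the per-page
ttm_dma_tt::cpu_address array used in the patch.

/* Hypothetical standalone illustration: a DMA-API-backed TTM object keeps
 * one CPU address per page, so a linear element index must be split into
 * a page number and a byte offset within that page. */
#include <stddef.h>

#define EX_PAGE_SIZE 4096u	/* stand-in for the kernel's PAGE_SIZE */

static void *
example_mem_index(void **cpu_address, unsigned int index, size_t elem_size)
{
	size_t byte = (size_t)index * elem_size;	/* linear byte offset */
	char *page = cpu_address[byte / EX_PAGE_SIZE];	/* coherent page base */

	return page + byte % EX_PAGE_SIZE;		/* address within page */
}

For instance, with 4 KiB pages, u32 index 1025 becomes byte offset 4100,
i.e. offset 4 into the second page's coherent mapping.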