From patchwork Mon Jan 11 10:45:12 2016
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 8001701
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Mon, 11 Jan 2016 10:45:12 +0000
Message-Id: <1452509174-16671-42-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.7.0.rc3
In-Reply-To: <1452509174-16671-1-git-send-email-chris@chris-wilson.co.uk>
References: <1452503961-14837-1-git-send-email-chris@chris-wilson.co.uk>
 <1452509174-16671-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 128/190] drm/i915: Extract i915_gem_obj_prepare_shmem_write()
List-Id: Intel graphics driver community testing & development

This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes.
It first serialises with the GPU, pins the backing storage and then
indicates what clflushes are required in order for the writes to be
coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c |   2 +-
 drivers/gpu/drm/i915/i915_drv.h        |   6 +-
 drivers/gpu/drm/i915/i915_gem.c        | 112 +++++++++++++++++++++------------
 3 files changed, 79 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894ed925..fae127166e2c 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -902,7 +902,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		       u32 batch_start_offset,
 		       u32 batch_len)
 {
-	int needs_clflush = 0;
+	unsigned needs_clflush;
 	void *src_base, *src;
 	void *dst = NULL;
 	int ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 693f472bd604..45a0da0947cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2761,7 +2761,11 @@ void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
 void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
 
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
-				    int *needs_clflush);
+				    unsigned *needs_clflush);
+int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
+				     unsigned *needs_clflush);
+#define CLFLUSH_BEFORE 0x1
+#define CLFLUSH_AFTER 0x2
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 01c20a336c04..e18c0d4d24ad 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -492,34 +492,92 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
  * flush the object from the CPU cache.
  */
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
-				    int *needs_clflush)
+				    unsigned *needs_clflush)
 {
 	int ret;
 
 	*needs_clflush = 0;
-
 	if (!obj->base.filp)
 		return -EINVAL;
 
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
+		ret = i915_gem_object_wait_rendering(obj, true);
+		if (ret)
+			return ret;
+
 		/* If we're not in the cpu read domain, set ourself into the gtt
 		 * read domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will dirty the data
 		 * anyway again before the next pread happens. */
 		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
 							obj->cache_level);
-		ret = i915_gem_object_wait_rendering(obj, true);
+	}
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	if (*needs_clflush && !cpu_has_clflush) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, false);
+		if (ret) {
+			i915_gem_object_unpin_pages(obj);
+			return ret;
+		}
+		*needs_clflush = 0;
+	}
+
+	return 0;
+}
+
+int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
+				     unsigned *needs_clflush)
+{
+	int ret;
+
+	*needs_clflush = 0;
+	if (!obj->base.filp)
+		return -EINVAL;
+
+	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
+
+		/* If we're not in the cpu write domain, set ourself into the
+		 * gtt write domain and manually flush cachelines (as required).
+		 * This optimizes for the case when the gpu will use the data
+		 * right away and we therefore have to clflush anyway.
+		 */
+		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
 	}
 
+	/* Same trick applies to invalidate partially written cachelines read
+	 * before writing.
+	 */
+	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
+		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
+							 obj->cache_level);
+
 	ret = i915_gem_object_get_pages(obj);
 	if (ret)
 		return ret;
 
 	i915_gem_object_pin_pages(obj);
 
-	return ret;
+	if (*needs_clflush && !cpu_has_clflush) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+		if (ret) {
+			i915_gem_object_unpin_pages(obj);
+			return ret;
+		}
+		*needs_clflush = 0;
+	}
+
+	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
+	obj->dirty = 1;
+	return 0;
 }
 
 /* Per-page copy function for the shmem pread fastpath.
@@ -916,41 +974,17 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 	int shmem_page_offset, page_length, ret = 0;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
-	int needs_clflush_after = 0;
-	int needs_clflush_before = 0;
+	unsigned needs_clflush;
 	struct sg_page_iter sg_iter;
 
-	user_data = to_user_ptr(args->data_ptr);
-	remain = args->size;
-
-	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
-
-	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		/* If we're not in the cpu write domain, set ourself into the gtt
-		 * write domain and manually flush cachelines (if required). This
-		 * optimizes for the case when the gpu will use the data
-		 * right away and we therefore have to clflush anyway. */
-		needs_clflush_after = cpu_write_needs_clflush(obj);
-		ret = i915_gem_object_wait_rendering(obj, false);
-		if (ret)
-			return ret;
-	}
-	/* Same trick applies to invalidate partially written cachelines read
-	 * before writing. */
-	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
-		needs_clflush_before =
-			!cpu_cache_is_coherent(dev, obj->cache_level);
-
-	ret = i915_gem_object_get_pages(obj);
+	ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
 	if (ret)
 		return ret;
 
-	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
-
-	i915_gem_object_pin_pages(obj);
-
+	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
+	user_data = to_user_ptr(args->data_ptr);
 	offset = args->offset;
-	obj->dirty = 1;
+	remain = args->size;
 
 	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
 			 offset >> PAGE_SHIFT) {
@@ -974,7 +1008,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		/* If we don't overwrite a cacheline completely we need to be
 		 * careful to have up-to-date data by first clflushing. Don't
 		 * overcomplicate things and flush the entire patch. */
-		partial_cacheline_write = needs_clflush_before &&
+		partial_cacheline_write = needs_clflush & CLFLUSH_BEFORE &&
 			((shmem_page_offset | page_length)
 				& (boot_cpu_data.x86_clflush_size - 1));
 
@@ -984,7 +1018,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = shmem_pwrite_fast(page, shmem_page_offset, page_length,
 					user_data, page_do_bit17_swizzling,
 					partial_cacheline_write,
-					needs_clflush_after);
+					needs_clflush & CLFLUSH_AFTER);
 		if (ret == 0)
 			goto next_page;
 
@@ -993,7 +1027,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = shmem_pwrite_slow(page, shmem_page_offset, page_length,
 					user_data, page_do_bit17_swizzling,
 					partial_cacheline_write,
-					needs_clflush_after);
+					needs_clflush & CLFLUSH_AFTER);
 
 		mutex_lock(&dev->struct_mutex);
 
@@ -1015,14 +1049,14 @@ out:
 		 * cachelines in-line while writing and the object moved
 		 * out of the cpu write domain while we've dropped the lock.
 		 */
-		if (!needs_clflush_after &&
+		if (!(needs_clflush & CLFLUSH_AFTER) &&
 		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 			if (i915_gem_clflush_object(obj, obj->pin_display))
-				needs_clflush_after = true;
+				needs_clflush |= CLFLUSH_AFTER;
 		}
 	}
 
-	if (needs_clflush_after)
+	if (needs_clflush & CLFLUSH_AFTER)
 		i915_gem_chipset_flush(dev);
 	else
 		obj->cache_dirty = true;
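
As a quick illustration of the new needs_clflush encoding (not part of the
patch): CLFLUSH_BEFORE (0x1) asks the caller to invalidate stale, partially
written cachelines before copying, and CLFLUSH_AFTER (0x2) asks it to flush
the freshly written data out for the GPU. The standalone sketch below mirrors
the flag computation and the partial_cacheline_write test from the pwrite
path above; the 64-byte cacheline size, the example offsets and the helper
name prepare_write_flags() are illustrative assumptions, not driver code.

/* Standalone sketch of the needs_clflush bitmask and the partial-cacheline
 * test; values below are illustrative assumptions only.
 */
#include <stdbool.h>
#include <stdio.h>

#define CLFLUSH_BEFORE 0x1
#define CLFLUSH_AFTER  0x2

static unsigned prepare_write_flags(bool coherent_reads, bool write_needs_clflush)
{
	unsigned needs_clflush = 0;

	/* flush after the write so the GPU sees the new data (bit 1) */
	needs_clflush |= write_needs_clflush << 1;
	/* invalidate partially written cachelines before reading (bit 0) */
	needs_clflush |= !coherent_reads;

	return needs_clflush;
}

int main(void)
{
	const unsigned clflush_size = 64;	 /* assumed x86 cacheline size */
	const unsigned offset = 8, length = 100; /* example partial write */
	unsigned flags = prepare_write_flags(false, true);

	/* mirrors the partial_cacheline_write test in i915_gem_shmem_pwrite() */
	bool partial = (flags & CLFLUSH_BEFORE) &&
		       ((offset | length) & (clflush_size - 1));

	printf("needs_clflush=%#x partial_cacheline_write=%d\n", flags, partial);
	return 0;
}

Built with a plain C compiler this prints "needs_clflush=0x3
partial_cacheline_write=1", i.e. a misaligned write to an uncached,
GPU-bound object needs both the pre-copy invalidate and the post-copy flush.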