From patchwork Wed Apr 19 09:41:18 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 9687145 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 34A29602C9 for ; Wed, 19 Apr 2017 09:41:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 24CC4283F1 for ; Wed, 19 Apr 2017 09:41:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 19AC6283F9; Wed, 19 Apr 2017 09:41:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 56E1D283F1 for ; Wed, 19 Apr 2017 09:41:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C748B6E2A6; Wed, 19 Apr 2017 09:41:51 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-wr0-x244.google.com (mail-wr0-x244.google.com [IPv6:2a00:1450:400c:c0c::244]) by gabe.freedesktop.org (Postfix) with ESMTPS id A15E76E1F4 for ; Wed, 19 Apr 2017 09:41:49 +0000 (UTC) Received: by mail-wr0-x244.google.com with SMTP id o21so2439092wrb.3 for ; Wed, 19 Apr 2017 02:41:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=Op6PmaEqKMDdFfkAxYiH7/n3dCp+Gvp7OoiJDctDzRw=; b=Y4dwCz3Vof3Pwx6SqolnLZgSta5A9Ks4jc2rEpNl/wkyZXlXMpFharBySKVgKCpKHF ys6NhbwP0zpN2irC6HP/l+eL1czCmjpzBx6IMD/j3Yz94l2r2opWVT+wZNgOFtGqMELK X81TRChVqxcPubcux783nYUiGh05uwwu5IW0g6bmAEFuLRN4gRQVRz3yFKMAl3uJZb6d ldhSjq+2gMB4TFu1KhkU6bGT4u4YQYow3ozCkJKWaiS78pbJpiQYj1urC2AS79BpUJur b31L1SvvXZa7BnPpCiR0xEOmT08TGk7Do5lJwSKXAKUgLEmGhQbavMBjKuDXwjwlRVet rblg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=Op6PmaEqKMDdFfkAxYiH7/n3dCp+Gvp7OoiJDctDzRw=; b=Hhetr3SISKDOI2XYmklZL7u/C1jGeOePfIGLZma/XvL9zI247w3W2yjD2rq+Dh9xqd lQecYy7y5Ez6N0cacs3uuynfsK3jbyh6UtTzcxGcz/W8+lmNS1aWSp2xFT10IRWTZslh Nv4S1C/rpeYbs5z/pyOg88xzWehgjVbWjlL8hbB6JJc4qS4uNMJsOoxOliKX/W040NEI V5osoOYFlGTgr6o4UIxCq4sOzsiSz8SgCnEUU6a2rW2wuuDEy0o1kM03fikvyr0izHeo KLB5I3T3ABElJ1UIRkJ8Bw4fLeATovWlmIXsJD7SiIBXOueOSbJagg1dnzWnyRP4Dqgr rB3w== X-Gm-Message-State: AN3rC/4svoJcIrpeKLKY+XolkiYMKgpPoccxP8GXTekuihvLS6K/sllZ K6tySUzyOcTajw== X-Received: by 10.223.169.205 with SMTP id b71mr1883625wrd.80.1492594908152; Wed, 19 Apr 2017 02:41:48 -0700 (PDT) Received: from haswell.alporthouse.com ([78.156.65.138]) by smtp.gmail.com with ESMTPSA id l132sm18654478wmd.10.2017.04.19.02.41.47 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 19 Apr 2017 02:41:47 -0700 (PDT) From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Wed, 19 Apr 2017 10:41:18 +0100 Message-Id: <20170419094143.16922-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170419094143.16922-1-chris@chris-wilson.co.uk> References: <20170419094143.16922-1-chris@chris-wilson.co.uk> Cc: Dongwon Kim Subject: [Intel-gfx] [PATCH 02/27] drm/i915: Mark CPU cache as dirty on every transition for CPU writes X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Currently, we only mark the CPU cache as dirty if we skip a clflush. This leads to some confusion where we have to ask if the object is in the write domain or missed a clflush. If we always mark the cache as dirty, this becomes a much simply question to answer. The goal remains to do as few clflushes as required and to do them as late as possible, in the hope of deferring the work to a kthread and not block the caller (e.g. execbuf, flips). Reported-by: Dongwon Kim Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout") Signed-off-by: Chris Wilson Cc: Dongwon Kim Cc: Matt Roper --- drivers/gpu/drm/i915/i915_gem.c | 78 +++++++++++++++--------- drivers/gpu/drm/i915/i915_gem_clflush.c | 15 +++-- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 21 +++---- drivers/gpu/drm/i915/i915_gem_internal.c | 3 +- drivers/gpu/drm/i915/i915_gem_userptr.c | 5 +- drivers/gpu/drm/i915/selftests/huge_gem_object.c | 3 +- 6 files changed, 70 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 33fb11cc5acc..488ca7733c1e 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -49,7 +49,7 @@ static void i915_gem_flush_free_objects(struct drm_i915_private *i915); static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj) { - if (obj->base.write_domain == I915_GEM_DOMAIN_CPU) + if (obj->cache_dirty) return false; if (!i915_gem_object_is_coherent(obj)) @@ -233,6 +233,14 @@ i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj) return st; } +static void __start_cpu_write(struct drm_i915_gem_object *obj) +{ + obj->base.read_domains = I915_GEM_DOMAIN_CPU; + obj->base.write_domain = I915_GEM_DOMAIN_CPU; + if (cpu_write_needs_clflush(obj)) + obj->cache_dirty = true; +} + static void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages, @@ -248,8 +256,7 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, !i915_gem_object_is_coherent(obj)) drm_clflush_sg(pages); - obj->base.read_domains = I915_GEM_DOMAIN_CPU; - obj->base.write_domain = I915_GEM_DOMAIN_CPU; + __start_cpu_write(obj); } static void @@ -684,6 +691,12 @@ i915_gem_dumb_create(struct drm_file *file, args->size, &args->handle); } +static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) +{ + return !(obj->cache_level == I915_CACHE_NONE || + obj->cache_level == I915_CACHE_WT); +} + /** * Creates a new mm object and returns a handle to it. * @dev: drm device pointer @@ -753,6 +766,11 @@ flush_write_domain(struct drm_i915_gem_object *obj, unsigned int flush_domains) case I915_GEM_DOMAIN_CPU: i915_gem_clflush_object(obj, I915_CLFLUSH_SYNC); break; + + case I915_GEM_DOMAIN_RENDER: + if (gpu_write_needs_clflush(obj)) + obj->cache_dirty = true; + break; } obj->base.write_domain = 0; @@ -854,7 +872,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj, * optimizes for the case when the gpu will dirty the data * anyway again before the next pread happens. */ - if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) + if (!obj->cache_dirty && + !(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) *needs_clflush = CLFLUSH_BEFORE; out: @@ -906,14 +925,15 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj, * This optimizes for the case when the gpu will use the data * right away and we therefore have to clflush anyway. */ - if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) + if (!obj->cache_dirty) { *needs_clflush |= CLFLUSH_AFTER; - /* Same trick applies to invalidate partially written cachelines read - * before writing. - */ - if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) - *needs_clflush |= CLFLUSH_BEFORE; + /* Same trick applies to invalidate partially written + * cachelines read before writing. + */ + if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) + *needs_clflush |= CLFLUSH_BEFORE; + } out: intel_fb_obj_invalidate(obj, ORIGIN_CPU); @@ -3374,10 +3394,12 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags) static void __i915_gem_object_flush_for_display(struct drm_i915_gem_object *obj) { - if (obj->base.write_domain != I915_GEM_DOMAIN_CPU && !obj->cache_dirty) - return; - - i915_gem_clflush_object(obj, I915_CLFLUSH_FORCE); + /* We manually flush the CPU domain so that we can override and + * force the flush for the display, and perform it asyncrhonously. + */ + flush_write_domain(obj, ~I915_GEM_DOMAIN_CPU); + if (obj->cache_dirty) + i915_gem_clflush_object(obj, I915_CLFLUSH_FORCE); obj->base.write_domain = 0; } @@ -3636,14 +3658,17 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj, } } - if (obj->base.write_domain == I915_GEM_DOMAIN_CPU && - i915_gem_object_is_coherent(obj)) - obj->cache_dirty = true; + /* Catch any deferred obj->cache_dirty markups */ + flush_write_domain(obj, ~I915_GEM_DOMAIN_CPU); list_for_each_entry(vma, &obj->vma_list, obj_link) vma->node.color = cache_level; obj->cache_level = cache_level; + if (obj->base.write_domain & I915_GEM_DOMAIN_CPU && + cpu_write_needs_clflush(obj)) + obj->cache_dirty = true; + return 0; } @@ -3864,9 +3889,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write) if (ret) return ret; - if (obj->base.write_domain == I915_GEM_DOMAIN_CPU) - return 0; - flush_write_domain(obj, ~I915_GEM_DOMAIN_CPU); /* Flush the CPU cache if it's still invalid. */ @@ -3878,15 +3900,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write) /* It should now be out of any other write domains, and we can update * the domain values for our changes. */ - GEM_BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0); + GEM_BUG_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU); /* If we're writing through the CPU, then the GPU read domains will * need to be invalidated at next use. */ - if (write) { - obj->base.read_domains = I915_GEM_DOMAIN_CPU; - obj->base.write_domain = I915_GEM_DOMAIN_CPU; - } + if (write) + __start_cpu_write(obj); return 0; } @@ -4306,6 +4326,8 @@ i915_gem_object_create(struct drm_i915_private *dev_priv, u64 size) } else obj->cache_level = I915_CACHE_NONE; + obj->cache_dirty = !i915_gem_object_is_coherent(obj); + trace_i915_gem_object_create(obj); return obj; @@ -4968,10 +4990,8 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv) mutex_lock(&dev_priv->drm.struct_mutex); for (p = phases; *p; p++) { - list_for_each_entry(obj, *p, global_link) { - obj->base.read_domains = I915_GEM_DOMAIN_CPU; - obj->base.write_domain = I915_GEM_DOMAIN_CPU; - } + list_for_each_entry(obj, *p, global_link) + __start_cpu_write(obj); } mutex_unlock(&dev_priv->drm.struct_mutex); diff --git a/drivers/gpu/drm/i915/i915_gem_clflush.c b/drivers/gpu/drm/i915/i915_gem_clflush.c index ffd01e02fe94..a895643c4dc4 100644 --- a/drivers/gpu/drm/i915/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/i915_gem_clflush.c @@ -72,8 +72,6 @@ static const struct dma_fence_ops i915_clflush_ops = { static void __i915_do_clflush(struct drm_i915_gem_object *obj) { drm_clflush_sg(obj->mm.pages); - obj->cache_dirty = false; - intel_fb_obj_flush(obj, ORIGIN_CPU); } @@ -82,9 +80,6 @@ static void i915_clflush_work(struct work_struct *work) struct clflush *clflush = container_of(work, typeof(*clflush), work); struct drm_i915_gem_object *obj = clflush->obj; - if (!obj->cache_dirty) - goto out; - if (i915_gem_object_pin_pages(obj)) { DRM_ERROR("Failed to acquire obj->pages for clflushing\n"); goto out; @@ -132,10 +127,10 @@ void i915_gem_clflush_object(struct drm_i915_gem_object *obj, * anything not backed by physical memory we consider to be always * coherent and not need clflushing. */ - if (!i915_gem_object_has_struct_page(obj)) + if (!i915_gem_object_has_struct_page(obj)) { + obj->cache_dirty = false; return; - - obj->cache_dirty = true; + } /* If the GPU is snooping the contents of the CPU cache, * we do not need to manually clear the CPU cache lines. However, @@ -154,6 +149,8 @@ void i915_gem_clflush_object(struct drm_i915_gem_object *obj, if (!(flags & I915_CLFLUSH_SYNC)) clflush = kmalloc(sizeof(*clflush), GFP_KERNEL); if (clflush) { + GEM_BUG_ON(!obj->cache_dirty); + dma_fence_init(&clflush->dma, &i915_clflush_ops, &clflush_lock, @@ -181,6 +178,8 @@ void i915_gem_clflush_object(struct drm_i915_gem_object *obj, } else { GEM_BUG_ON(obj->base.write_domain != I915_GEM_DOMAIN_CPU); } + + obj->cache_dirty = false; } void i915_gem_clflush_init(struct drm_i915_private *i915) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index af1965774e7b..ddc011ef5480 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -291,7 +291,7 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj) return DBG_USE_CPU_RELOC > 0; return (HAS_LLC(to_i915(obj->base.dev)) || - obj->base.write_domain == I915_GEM_DOMAIN_CPU || + obj->cache_dirty || obj->cache_level != I915_CACHE_NONE); } @@ -1129,10 +1129,8 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req, if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC) continue; - if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) { + if (obj->base.write_domain & obj->cache_dirty) i915_gem_clflush_object(obj, 0); - obj->base.write_domain = 0; - } ret = i915_gem_request_await_object (req, obj, obj->base.pending_write_domain); @@ -1265,12 +1263,6 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file, return ctx; } -static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) -{ - return !(obj->cache_level == I915_CACHE_NONE || - obj->cache_level == I915_CACHE_WT); -} - void i915_vma_move_to_active(struct i915_vma *vma, struct drm_i915_gem_request *req, unsigned int flags) @@ -1294,15 +1286,16 @@ void i915_vma_move_to_active(struct i915_vma *vma, i915_gem_active_set(&vma->last_read[idx], req); list_move_tail(&vma->vm_link, &vma->vm->active_list); + obj->base.write_domain = 0; if (flags & EXEC_OBJECT_WRITE) { + obj->base.write_domain = I915_GEM_DOMAIN_RENDER; + if (intel_fb_obj_invalidate(obj, ORIGIN_CS)) i915_gem_active_set(&obj->frontbuffer_write, req); - /* update for the implicit flush after a batch */ - obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS; - if (!obj->cache_dirty && gpu_write_needs_clflush(obj)) - obj->cache_dirty = true; + obj->base.read_domains = 0; } + obj->base.read_domains |= I915_GEM_GPU_DOMAINS; if (flags & EXEC_OBJECT_NEEDS_FENCE) i915_gem_active_set(&vma->last_fence, req); diff --git a/drivers/gpu/drm/i915/i915_gem_internal.c b/drivers/gpu/drm/i915/i915_gem_internal.c index fc950abbe400..58e93e87d573 100644 --- a/drivers/gpu/drm/i915/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/i915_gem_internal.c @@ -188,9 +188,10 @@ i915_gem_object_create_internal(struct drm_i915_private *i915, drm_gem_private_object_init(&i915->drm, &obj->base, size); i915_gem_object_init(obj, &i915_gem_object_internal_ops); - obj->base.write_domain = I915_GEM_DOMAIN_CPU; obj->base.read_domains = I915_GEM_DOMAIN_CPU; + obj->base.write_domain = I915_GEM_DOMAIN_CPU; obj->cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE; + obj->cache_dirty = !i915_gem_object_is_coherent(obj); return obj; } diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c index 58ccf8b8ca1c..9f84be171ad2 100644 --- a/drivers/gpu/drm/i915/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c @@ -802,9 +802,10 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file drm_gem_private_object_init(dev, &obj->base, args->user_size); i915_gem_object_init(obj, &i915_gem_userptr_ops); - obj->cache_level = I915_CACHE_LLC; - obj->base.write_domain = I915_GEM_DOMAIN_CPU; obj->base.read_domains = I915_GEM_DOMAIN_CPU; + obj->base.write_domain = I915_GEM_DOMAIN_CPU; + obj->cache_level = I915_CACHE_LLC; + obj->cache_dirty = !i915_gem_object_is_coherent(obj); obj->userptr.ptr = args->user_ptr; obj->userptr.read_only = !!(args->flags & I915_USERPTR_READ_ONLY); diff --git a/drivers/gpu/drm/i915/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/selftests/huge_gem_object.c index 4e681fc13be4..0ca867a877b6 100644 --- a/drivers/gpu/drm/i915/selftests/huge_gem_object.c +++ b/drivers/gpu/drm/i915/selftests/huge_gem_object.c @@ -126,9 +126,10 @@ huge_gem_object(struct drm_i915_private *i915, drm_gem_private_object_init(&i915->drm, &obj->base, dma_size); i915_gem_object_init(obj, &huge_ops); - obj->base.write_domain = I915_GEM_DOMAIN_CPU; obj->base.read_domains = I915_GEM_DOMAIN_CPU; + obj->base.write_domain = I915_GEM_DOMAIN_CPU; obj->cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE; + obj->cache_dirty = !i915_gem_object_is_coherent(obj); obj->scratch = phys_size; return obj;