From patchwork Wed Sep 28 15:42:42 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Anholt X-Patchwork-Id: 9354117 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 6A3FF6077A for ; Wed, 28 Sep 2016 15:42:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B011296D1 for ; Wed, 28 Sep 2016 15:42:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4CF1C296E8; Wed, 28 Sep 2016 15:42:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5BBD0296D1 for ; Wed, 28 Sep 2016 15:42:46 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A46336E7CF; Wed, 28 Sep 2016 15:42:45 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from anholt.net (anholt.net [50.246.234.109]) by gabe.freedesktop.org (Postfix) with ESMTP id EE9266E7CF for ; Wed, 28 Sep 2016 15:42:43 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by anholt.net (Postfix) with ESMTP id 4CBBB2C9C06F; Wed, 28 Sep 2016 08:42:44 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at anholt.net Received: from anholt.net ([127.0.0.1]) by localhost (kingsolver.anholt.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id zkBLXrq804hW; Wed, 28 Sep 2016 08:42:43 -0700 (PDT) Received: from eliezer.anholt.net (localhost [127.0.0.1]) by anholt.net (Postfix) with ESMTP id 401EB2C9C06D; Wed, 28 Sep 2016 08:42:43 -0700 (PDT) Received: by eliezer.anholt.net (Postfix, from userid 1000) id 65A8F2E730F; Wed, 28 Sep 2016 08:42:42 -0700 (PDT) From: Eric Anholt To: dri-devel@lists.freedesktop.org Subject: [PATCH] drm/vc4: Fix races when the CS reads from render targets. Date: Wed, 28 Sep 2016 08:42:42 -0700 Message-Id: <20160928154242.19151-1-eric@anholt.net> X-Mailer: git-send-email 2.9.3 Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP With the introduction of bin/render pipelining, the previous job may not be completed when we start binning the next one. If the previous job wrote our VBO, IB, or CS textures, then the binning stage might get stale or uninitialized results. Fixes the major rendering failure in glmark2 -b terrain. Signed-off-by: Eric Anholt Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs") Cc: stable@vger.kernel.org --- drivers/gpu/drm/vc4/vc4_drv.h | 19 ++++++++++++++++++- drivers/gpu/drm/vc4/vc4_gem.c | 13 +++++++++++++ drivers/gpu/drm/vc4/vc4_render_cl.c | 21 +++++++++++++++++---- drivers/gpu/drm/vc4/vc4_validate.c | 17 ++++++++++++++--- 4 files changed, 62 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h index 428e24919ef1..f696b752886b 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.h +++ b/drivers/gpu/drm/vc4/vc4_drv.h @@ -122,9 +122,16 @@ to_vc4_dev(struct drm_device *dev) struct vc4_bo { struct drm_gem_cma_object base; - /* seqno of the last job to render to this BO. */ + /* seqno of the last job to render using this BO. */ uint64_t seqno; + /* seqno of the last job to use the RCL to write to this BO. + * + * Note that this doesn't include binner overflow memory + * writes. + */ + uint64_t write_seqno; + /* List entry for the BO's position in either * vc4_exec_info->unref_list or vc4_dev->bo_cache.time_list */ @@ -216,6 +223,9 @@ struct vc4_exec_info { /* Sequence number for this bin/render job. */ uint64_t seqno; + /* Latest write_seqno of any BO that binning depends on. */ + uint64_t bin_dep_seqno; + /* Last current addresses the hardware was processing when the * hangcheck timer checked on us. */ @@ -230,6 +240,13 @@ struct vc4_exec_info { struct drm_gem_cma_object **bo; uint32_t bo_count; + /* List of BOs that are being written by the RCL. Other than + * the binner temporary storage, this is all the BOs written + * by the job. + */ + struct drm_gem_cma_object *rcl_write_bo[4]; + uint32_t rcl_write_bo_count; + /* Pointers for our position in vc4->job_list */ struct list_head head; diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c index b262c5c26f10..ae1609e739ef 100644 --- a/drivers/gpu/drm/vc4/vc4_gem.c +++ b/drivers/gpu/drm/vc4/vc4_gem.c @@ -471,6 +471,11 @@ vc4_update_bo_seqnos(struct vc4_exec_info *exec, uint64_t seqno) list_for_each_entry(bo, &exec->unref_list, unref_head) { bo->seqno = seqno; } + + for (i = 0; i < exec->rcl_write_bo_count; i++) { + bo = to_vc4_bo(&exec->rcl_write_bo[i]->base); + bo->write_seqno = seqno; + } } /* Queues a struct vc4_exec_info for execution. If no job is @@ -673,6 +678,14 @@ vc4_get_bcl(struct drm_device *dev, struct vc4_exec_info *exec) goto fail; ret = vc4_validate_shader_recs(dev, exec); + if (ret) + goto fail; + + /* Block waiting on any previous rendering into the CS's VBO, + * IB, or textures, so that pixels are actually written by the + * time we try to read them. + */ + ret = vc4_wait_for_seqno(dev, exec->bin_dep_seqno, ~0ull, true); fail: drm_free_large(temp); diff --git a/drivers/gpu/drm/vc4/vc4_render_cl.c b/drivers/gpu/drm/vc4/vc4_render_cl.c index 0f12418725e5..08886a309757 100644 --- a/drivers/gpu/drm/vc4/vc4_render_cl.c +++ b/drivers/gpu/drm/vc4/vc4_render_cl.c @@ -45,6 +45,8 @@ struct vc4_rcl_setup { struct drm_gem_cma_object *rcl; u32 next_offset; + + u32 next_write_bo_index; }; static inline void rcl_u8(struct vc4_rcl_setup *setup, u8 val) @@ -407,6 +409,8 @@ static int vc4_rcl_msaa_surface_setup(struct vc4_exec_info *exec, if (!*obj) return -EINVAL; + exec->rcl_write_bo[exec->rcl_write_bo_count++] = *obj; + if (surf->offset & 0xf) { DRM_ERROR("MSAA write must be 16b aligned.\n"); return -EINVAL; @@ -417,7 +421,8 @@ static int vc4_rcl_msaa_surface_setup(struct vc4_exec_info *exec, static int vc4_rcl_surface_setup(struct vc4_exec_info *exec, struct drm_gem_cma_object **obj, - struct drm_vc4_submit_rcl_surface *surf) + struct drm_vc4_submit_rcl_surface *surf, + bool is_write) { uint8_t tiling = VC4_GET_FIELD(surf->bits, VC4_LOADSTORE_TILE_BUFFER_TILING); @@ -440,6 +445,9 @@ static int vc4_rcl_surface_setup(struct vc4_exec_info *exec, if (!*obj) return -EINVAL; + if (is_write) + exec->rcl_write_bo[exec->rcl_write_bo_count++] = *obj; + if (surf->flags & VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES) { if (surf == &exec->args->zs_write) { DRM_ERROR("general zs write may not be a full-res.\n"); @@ -542,6 +550,8 @@ vc4_rcl_render_config_surface_setup(struct vc4_exec_info *exec, if (!*obj) return -EINVAL; + exec->rcl_write_bo[exec->rcl_write_bo_count++] = *obj; + if (tiling > VC4_TILING_FORMAT_LT) { DRM_ERROR("Bad tiling format\n"); return -EINVAL; @@ -599,15 +609,18 @@ int vc4_get_rcl(struct drm_device *dev, struct vc4_exec_info *exec) if (ret) return ret; - ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read); + ret = vc4_rcl_surface_setup(exec, &setup.color_read, &args->color_read, + false); if (ret) return ret; - ret = vc4_rcl_surface_setup(exec, &setup.zs_read, &args->zs_read); + ret = vc4_rcl_surface_setup(exec, &setup.zs_read, &args->zs_read, + false); if (ret) return ret; - ret = vc4_rcl_surface_setup(exec, &setup.zs_write, &args->zs_write); + ret = vc4_rcl_surface_setup(exec, &setup.zs_write, &args->zs_write, + true); if (ret) return ret; diff --git a/drivers/gpu/drm/vc4/vc4_validate.c b/drivers/gpu/drm/vc4/vc4_validate.c index 9ce1d0adf882..26503e307438 100644 --- a/drivers/gpu/drm/vc4/vc4_validate.c +++ b/drivers/gpu/drm/vc4/vc4_validate.c @@ -267,6 +267,9 @@ validate_indexed_prim_list(VALIDATE_ARGS) if (!ib) return -EINVAL; + exec->bin_dep_seqno = max(exec->bin_dep_seqno, + to_vc4_bo(&ib->base)->write_seqno); + if (offset > ib->base.size || (ib->base.size - offset) / index_size < length) { DRM_ERROR("IB access overflow (%d + %d*%d > %zd)\n", @@ -555,8 +558,7 @@ static bool reloc_tex(struct vc4_exec_info *exec, void *uniform_data_u, struct vc4_texture_sample_info *sample, - uint32_t texture_handle_index) - + uint32_t texture_handle_index, bool is_cs) { struct drm_gem_cma_object *tex; uint32_t p0 = *(uint32_t *)(uniform_data_u + sample->p_offset[0]); @@ -714,6 +716,11 @@ reloc_tex(struct vc4_exec_info *exec, *validated_p0 = tex->paddr + p0; + if (is_cs) { + exec->bin_dep_seqno = max(exec->bin_dep_seqno, + to_vc4_bo(&tex->base)->write_seqno); + } + return true; fail: DRM_INFO("Texture p0 at %d: 0x%08x\n", sample->p_offset[0], p0); @@ -835,7 +842,8 @@ validate_gl_shader_rec(struct drm_device *dev, if (!reloc_tex(exec, uniform_data_u, &validated_shader->texture_samples[tex], - texture_handles_u[tex])) { + texture_handles_u[tex], + i == 2)) { return -EINVAL; } } @@ -867,6 +875,9 @@ validate_gl_shader_rec(struct drm_device *dev, uint32_t stride = *(uint8_t *)(pkt_u + o + 5); uint32_t max_index; + exec->bin_dep_seqno = max(exec->bin_dep_seqno, + to_vc4_bo(&vbo->base)->write_seqno); + if (state->addr & 0x8) stride |= (*(uint32_t *)(pkt_u + 100 + i * 4)) & ~0xff;