From patchwork Tue Dec 21 20:00:47 2021
X-Patchwork-Submitter: Thomas Hellström
X-Patchwork-Id: 12690523
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, matthew.auld@intel.com
Date: Tue, 21 Dec 2021 21:00:47 +0100
Message-Id: <20211221200050.436316-2-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
References: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 1/4] drm/i915: Avoid using the i915_fence_array
 when collecting dependencies

Since the gt migration code was using only a single fence for
dependencies, these were collected in a dma_fence_array. However, it
turns out that it's illegal to use some dma_fences in a dma_fence_array,
in particular other dma_fence_arrays and dma_fence_chains, and this
causes trouble for us moving forward.

Have the gt migration code instead take a const struct i915_deps for
dependencies. This means we can skip the dma_fence_array creation and
instead pass the struct i915_deps to circumvent the problem.
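A minimal sketch of the container nesting this change avoids (illustration
only, not part of the patch; num_fences, fences and context are placeholder
locals, error handling omitted):

    /* Collecting deps into a dma_fence_array, as the old code did: */
    struct dma_fence_array *array;

    array = dma_fence_array_create(num_fences, fences, context, 1, false);
    /*
     * Illegal if any fences[i] is itself a dma_fence_array or a
     * dma_fence_chain. The flat struct i915_deps collection introduced
     * below has no such restriction, since each collected fence is
     * awaited individually.
     */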
v2:
- Make the prev_deps() function static. (kernel test robot)
- Update the struct i915_deps kerneldoc.
v4:
- Rebase.

Fixes: 5652df829b3c ("drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous")
Signed-off-by: Thomas Hellström
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 129 +++++--------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h |  17 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c      |  24 ++--
 drivers/gpu/drm/i915/gt/intel_migrate.h      |   9 +-
 drivers/gpu/drm/i915/i915_request.c          |  22 ++++
 drivers/gpu/drm/i915/i915_request.h          |   2 +
 6 files changed, 91 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 8ad09fcf3698..62a6fd2d127d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -3,8 +3,6 @@
  * Copyright © 2021 Intel Corporation
  */
 
-#include <linux/dma-fence-array.h>
-
 #include <drm/ttm/ttm_bo_driver.h>
 
 #include "i915_drv.h"
@@ -44,17 +42,12 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 #endif
 
 /**
- * DOC: Set of utilities to dynamically collect dependencies and
- * eventually coalesce them into a single fence which is fed into
- * the GT migration code, since it only accepts a single dependency
- * fence.
- * The single fence returned from these utilities, in the case of
- * dependencies from multiple fence contexts, a struct dma_fence_array,
- * since the i915 request code can break that up and await the individual
- * fences.
+ * DOC: Set of utilities to dynamically collect dependencies into a
+ * structure which is fed into the GT migration code.
  *
  * Once we can do async unbinding, this is also needed to coalesce
- * the migration fence with the unbind fences.
+ * the migration fence with the unbind fences if these are coalesced
+ * post-migration.
  *
  * While collecting the individual dependencies, we store the refcounted
  * struct dma_fence pointers in a realloc-managed pointer array, since
@@ -65,32 +58,13 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 * A struct i915_deps need to be initialized using i915_deps_init().
 * If i915_deps_add_dependency() or i915_deps_add_resv() return an
 * error code they will internally call i915_deps_fini(), which frees
- * all internal references and allocations. After a call to
- * i915_deps_to_fence(), or i915_deps_sync(), the struct should similarly
- * be viewed as uninitialized.
+ * all internal references and allocations.
 *
 * We might want to break this out into a separate file as a utility.
 */
 
 #define I915_DEPS_MIN_ALLOC_CHUNK 8U
 
-/**
- * struct i915_deps - Collect dependencies into a single dma-fence
- * @single: Storage for pointer if the collection is a single fence.
- * @fence: Allocated array of fence pointers if more than a single fence;
- *	   otherwise points to the address of @single.
- * @num_deps: Current number of dependency fences.
- * @fences_size: Size of the @fences array in number of pointers.
- * @gfp: Allocation mode.
- */
-struct i915_deps {
-	struct dma_fence *single;
-	struct dma_fence **fences;
-	unsigned int num_deps;
-	unsigned int fences_size;
-	gfp_t gfp;
-};
-
 static void i915_deps_reset_fences(struct i915_deps *deps)
 {
 	if (deps->fences != &deps->single)
@@ -163,7 +137,7 @@ static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
 	return ret;
 }
 
-static int i915_deps_sync(struct i915_deps *deps,
+static int i915_deps_sync(const struct i915_deps *deps,
 			  const struct ttm_operation_ctx *ctx)
 {
 	struct dma_fence **fences = deps->fences;
@@ -183,7 +157,6 @@ static int i915_deps_sync(struct i915_deps *deps,
 			break;
 	}
 
-	i915_deps_fini(deps);
 	return ret;
 }
@@ -221,34 +194,6 @@ static int i915_deps_add_dependency(struct i915_deps *deps,
 	return i915_deps_grow(deps, fence, ctx);
 }
 
-static struct dma_fence *i915_deps_to_fence(struct i915_deps *deps,
-					    const struct ttm_operation_ctx *ctx)
-{
-	struct dma_fence_array *array;
-
-	if (deps->num_deps == 0)
-		return NULL;
-
-	if (deps->num_deps == 1) {
-		deps->num_deps = 0;
-		return deps->fences[0];
-	}
-
-	/*
-	 * TODO: Alter the allocation mode here to not try too hard to
-	 * make things async.
-	 */
-	array = dma_fence_array_create(deps->num_deps, deps->fences, 0, 0,
-				       false);
-	if (!array)
-		return ERR_PTR(i915_deps_sync(deps, ctx));
-
-	deps->fences = NULL;
-	i915_deps_reset_fences(deps);
-
-	return &array->base;
-}
-
 static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
 			      bool all, const bool no_excl,
 			      const struct ttm_operation_ctx *ctx)
@@ -387,7 +332,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 					     struct ttm_resource *dst_mem,
 					     struct ttm_tt *dst_ttm,
 					     struct sg_table *dst_st,
-					     struct dma_fence *dep)
+					     const struct i915_deps *deps)
 {
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
@@ -411,7 +356,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 			return ERR_PTR(-EINVAL);
 
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
-		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, dep,
+		ret = intel_context_migrate_clear(to_gt(i915)->migrate.context, deps,
 						  dst_st->sgl, dst_level,
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
@@ -425,7 +370,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
-						 dep, src_rsgt->table.sgl,
+						 deps, src_rsgt->table.sgl,
 						 src_level,
 						 i915_ttm_gtt_binds_lmem(bo->resource),
 						 dst_st->sgl, dst_level,
@@ -610,10 +555,11 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work,
 }
 
 static struct dma_fence *
-__i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
+__i915_ttm_move(struct ttm_buffer_object *bo,
+		const struct ttm_operation_ctx *ctx, bool clear,
 		struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm,
 		struct i915_refct_sgt *dst_rsgt, bool allow_accel,
-		struct dma_fence *move_dep)
+		const struct i915_deps *move_deps)
 {
 	struct i915_ttm_memcpy_work *copy_work = NULL;
 	struct i915_ttm_memcpy_arg _arg, *arg = &_arg;
@@ -621,7 +567,7 @@ __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 
 	if (allow_accel) {
 		fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm,
-					    &dst_rsgt->table, move_dep);
+					    &dst_rsgt->table, move_deps);
 
 		/*
 		 * We only need to intercept the error when moving to lmem.
@@ -655,8 +601,8 @@ __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 		if (!IS_ERR(fence))
 			goto out;
 
-	} else if (move_dep) {
-		int err = dma_fence_wait(move_dep, true);
+	} else if (move_deps) {
+		int err = i915_deps_sync(move_deps, ctx);
 
 		if (err)
 			return ERR_PTR(err);
@@ -680,29 +626,21 @@ __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
 	return fence;
 }
 
-static struct dma_fence *prev_fence(struct ttm_buffer_object *bo,
-				    struct ttm_operation_ctx *ctx)
+static int
+prev_deps(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
+	  struct i915_deps *deps)
 {
-	struct i915_deps deps;
 	int ret;
 
-	/*
-	 * Instead of trying hard with GFP_KERNEL to allocate memory,
-	 * the dependency collection will just sync if it doesn't
-	 * succeed.
-	 */
-	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-	ret = i915_deps_add_dependency(&deps, bo->moving, ctx);
+	ret = i915_deps_add_dependency(deps, bo->moving, ctx);
 	if (!ret)
 		/*
 		 * TODO: Only await excl fence here, and shared fences before
 		 * signaling the migration fence.
 		 */
-		ret = i915_deps_add_resv(&deps, bo->base.resv, true, false, ctx);
-	if (ret)
-		return ERR_PTR(ret);
+		ret = i915_deps_add_resv(deps, bo->base.resv, true, false, ctx);
 
-	return i915_deps_to_fence(&deps, ctx);
+	return ret;
 }
 
 /**
@@ -756,16 +694,18 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
 	clear = !i915_ttm_cpu_maps_iomem(bo->resource) &&
 		(!ttm || !ttm_tt_is_populated(ttm));
 	if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) {
-		struct dma_fence *dep = prev_fence(bo, ctx);
+		struct i915_deps deps;
 
-		if (IS_ERR(dep)) {
+		i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+		ret = prev_deps(bo, ctx, &deps);
+		if (ret) {
 			i915_refct_sgt_put(dst_rsgt);
-			return PTR_ERR(dep);
+			return ret;
 		}
 
-		migration_fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm,
-						  dst_rsgt, true, dep);
-		dma_fence_put(dep);
+		migration_fence = __i915_ttm_move(bo, ctx, clear, dst_mem, bo->ttm,
+						  dst_rsgt, true, &deps);
+		i915_deps_fini(&deps);
 	}
 
 	/* We can possibly get an -ERESTARTSYS here */
@@ -826,7 +766,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		.interruptible = intr,
 	};
 	struct i915_refct_sgt *dst_rsgt;
-	struct dma_fence *copy_fence, *dep_fence;
+	struct dma_fence *copy_fence;
 	struct i915_deps deps;
 	int ret, shared_err;
@@ -847,15 +787,12 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 	if (ret)
 		return ret;
 
-	dep_fence = i915_deps_to_fence(&deps, &ctx);
-	if (IS_ERR(dep_fence))
-		return PTR_ERR(dep_fence);
-
 	dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource);
-	copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource,
+	copy_fence = __i915_ttm_move(src_bo, &ctx, false, dst_bo->resource,
 				     dst_bo->ttm, dst_rsgt, allow_accel,
-				     dep_fence);
+				     &deps);
 
+	i915_deps_fini(&deps);
 	i915_refct_sgt_put(dst_rsgt);
 	if (IS_ERR_OR_NULL(copy_fence))
 		return PTR_ERR_OR_ZERO(copy_fence);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
index d2e7f149e05c..138b7647a558 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
@@ -18,6 +18,23 @@ struct ttm_tt;
 struct drm_i915_gem_object;
 struct i915_refct_sgt;
 
+/**
+ * struct i915_deps - Collect dependencies into a single dma-fence
+ * @single: Storage for pointer if the collection is a single fence.
+ * @fences: Allocated array of fence pointers if more than a single fence;
+ *	    otherwise points to the address of @single.
+ * @num_deps: Current number of dependency fences.
+ * @fences_size: Size of the @fences array in number of pointers.
+ * @gfp: Allocation mode.
+ */
+struct i915_deps {
+	struct dma_fence *single;
+	struct dma_fence **fences;
+	unsigned int num_deps;
+	unsigned int fences_size;
+	gfp_t gfp;
+};
+
 int i915_ttm_move_notify(struct ttm_buffer_object *bo);
 
 I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 19a01878fee3..18b44af56969 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -404,7 +404,7 @@ static int emit_copy(struct i915_request *rq, int size)
 
 int
 intel_context_migrate_copy(struct intel_context *ce,
-			   struct dma_fence *await,
+			   const struct i915_deps *deps,
 			   struct scatterlist *src,
 			   enum i915_cache_level src_cache_level,
 			   bool src_is_lmem,
@@ -431,8 +431,8 @@ intel_context_migrate_copy(struct intel_context *ce,
 			goto out_ce;
 		}
 
-		if (await) {
-			err = i915_request_await_dma_fence(rq, await);
+		if (deps) {
+			err = i915_request_await_deps(rq, deps);
 			if (err)
 				goto out_rq;
 
@@ -442,7 +442,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 				goto out_rq;
 			}
 
-			await = NULL;
+			deps = NULL;
 		}
 
 		/* The PTE updates + copy must not be interrupted. */
@@ -525,7 +525,7 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
 
 int
 intel_context_migrate_clear(struct intel_context *ce,
-			    struct dma_fence *await,
+			    const struct i915_deps *deps,
 			    struct scatterlist *sg,
 			    enum i915_cache_level cache_level,
 			    bool is_lmem,
@@ -550,8 +550,8 @@ intel_context_migrate_clear(struct intel_context *ce,
 			goto out_ce;
 		}
 
-		if (await) {
-			err = i915_request_await_dma_fence(rq, await);
+		if (deps) {
+			err = i915_request_await_deps(rq, deps);
 			if (err)
 				goto out_rq;
 
@@ -561,7 +561,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 				goto out_rq;
 			}
 
-			await = NULL;
+			deps = NULL;
 		}
 
 		/* The PTE updates + clear must not be interrupted. */
@@ -599,7 +599,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 
 int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
-		       struct dma_fence *await,
+		       const struct i915_deps *deps,
 		       struct scatterlist *src,
 		       enum i915_cache_level src_cache_level,
 		       bool src_is_lmem,
@@ -624,7 +624,7 @@ int intel_migrate_copy(struct intel_migrate *m,
 	if (err)
 		goto out;
 
-	err = intel_context_migrate_copy(ce, await,
+	err = intel_context_migrate_copy(ce, deps,
 					 src, src_cache_level, src_is_lmem,
 					 dst, dst_cache_level, dst_is_lmem,
 					 out);
@@ -638,7 +638,7 @@ int intel_migrate_copy(struct intel_migrate *m,
 
 int
 intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_gem_ww_ctx *ww,
-		    struct dma_fence *await,
+		    const struct i915_deps *deps,
 		    struct scatterlist *sg,
 		    enum i915_cache_level cache_level,
 		    bool is_lmem,
@@ -661,7 +661,7 @@ intel_migrate_clear(struct intel_migrate *m,
 	if (err)
 		goto out;
 
-	err = intel_context_migrate_clear(ce, await, sg, cache_level,
+	err = intel_context_migrate_clear(ce, deps, sg, cache_level,
 					  is_lmem, value, out);
 
 	intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
index 4e18e755a00b..ccc677ec4aa3 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.h
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
@@ -11,6 +11,7 @@
 #include "intel_migrate_types.h"
 
 struct dma_fence;
+struct i915_deps;
 struct i915_request;
 struct i915_gem_ww_ctx;
 struct intel_gt;
@@ -23,7 +24,7 @@ struct intel_context *intel_migrate_create_context(struct intel_migrate *m);
 
 int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_gem_ww_ctx *ww,
-		       struct dma_fence *await,
+		       const struct i915_deps *deps,
 		       struct scatterlist *src,
 		       enum i915_cache_level src_cache_level,
 		       bool src_is_lmem,
@@ -33,7 +34,7 @@ int intel_migrate_copy(struct intel_migrate *m,
 		       struct i915_request **out);
 
 int intel_context_migrate_copy(struct intel_context *ce,
-			       struct dma_fence *await,
+			       const struct i915_deps *deps,
 			       struct scatterlist *src,
 			       enum i915_cache_level src_cache_level,
 			       bool src_is_lmem,
@@ -45,7 +46,7 @@ int intel_context_migrate_copy(struct intel_context *ce,
 
 int
 intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_gem_ww_ctx *ww,
-		    struct dma_fence *await,
+		    const struct i915_deps *deps,
 		    struct scatterlist *sg,
 		    enum i915_cache_level cache_level,
 		    bool is_lmem,
@@ -53,7 +54,7 @@ intel_migrate_clear(struct intel_migrate *m,
 		    struct i915_request **out);
 
 int intel_context_migrate_clear(struct intel_context *ce,
-				struct dma_fence *await,
+				const struct i915_deps *deps,
 				struct scatterlist *sg,
 				enum i915_cache_level cache_level,
 				bool is_lmem,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 40cf8ec0b9fc..7d804df27546 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -32,6 +32,7 @@
 #include <linux/sched/signal.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_ttm_move.h"
 #include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
 #include "gt/intel_engine.h"
@@ -1543,6 +1544,27 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	return 0;
 }
 
+/**
+ * i915_request_await_deps - set this request to (async) wait upon a struct
+ * i915_deps dma_fence collection
+ * @rq: request we are wishing to use
+ * @deps: The struct i915_deps containing the dependencies.
+ *
+ * Returns 0 if successful, negative error code on error.
+ */
+int i915_request_await_deps(struct i915_request *rq, const struct i915_deps *deps)
+{
+	int i, err;
+
+	for (i = 0; i < deps->num_deps; ++i) {
+		err = i915_request_await_dma_fence(rq, deps->fences[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 /**
  * i915_request_await_object - set this request to (async) wait upon a bo
  * @to: request we are wishing to use
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index ce7714210697..170ee78c2858 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -47,6 +47,7 @@ struct drm_file;
 struct drm_i915_gem_object;
 struct drm_printer;
 
+struct i915_deps;
 struct i915_request;
 
 #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
@@ -411,6 +412,7 @@ int i915_request_await_object(struct i915_request *to,
 			      bool write);
 int i915_request_await_dma_fence(struct i915_request *rq,
 				 struct dma_fence *fence);
+int i915_request_await_deps(struct i915_request *rq, const struct i915_deps *deps);
 int i915_request_await_execution(struct i915_request *rq,
 				 struct dma_fence *fence);
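As a usage sketch of the new interface (caller-side illustration built from
the functions in this patch; ce, bo, sg, ctx and friends are placeholder
locals, not code from the series):

    struct i915_deps deps;
    struct i915_request *rq;
    int err;

    /* Collection falls back to synchronous waits if allocation fails. */
    i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
    err = i915_deps_add_resv(&deps, bo->base.resv, true, false, ctx);
    if (err)
        return err; /* the add_* helpers call i915_deps_fini() on error */

    err = intel_context_migrate_clear(ce, &deps, sg, cache_level,
                                      is_lmem, 0, &rq);
    i915_deps_fini(&deps);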
From patchwork Tue Dec 21 20:00:48 2021
X-Patchwork-Submitter: Thomas Hellström
X-Patchwork-Id: 12690527
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, matthew.auld@intel.com, Christian König
Date: Tue, 21 Dec 2021 21:00:48 +0100
Message-Id: <20211221200050.436316-3-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
References: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 2/4] drm/i915: remove questionable fence
 optimization during copy

From: Christian König

As discussed multiple times now, kernel copies *must* always wait for
all fences in a BO before actually doing the copy; this is mandatory.

In addition, drop the special handling for when a shared slot can't be
allocated on the source BO and just return an error code instead.
Otherwise this code path would only ever be exercised under
out-of-memory conditions.

Signed-off-by: Christian König
Reviewed-by: Thomas Hellström
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 43 +++++++-------------
 1 file changed, 14 insertions(+), 29 deletions(-)
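For reference, the shape of "wait for all fences" with the dma_resv
iterator (a sketch that mirrors the hunk below rather than adding anything
new; resv, deps and ctx are placeholder locals):

    struct dma_resv_iter iter;
    struct dma_fence *fence;
    int ret;

    dma_resv_assert_held(resv);
    /* Third argument true: visit shared fences as well as the exclusive one. */
    dma_resv_for_each_fence(&iter, resv, true, fence) {
        ret = i915_deps_add_dependency(deps, fence, ctx);
        if (ret)
            return ret;
    }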
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 62a6fd2d127d..5dbaccf3f201 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -195,19 +195,14 @@ static int i915_deps_add_dependency(struct i915_deps *deps,
 }
 
 static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
-			      bool all, const bool no_excl,
 			      const struct ttm_operation_ctx *ctx)
 {
 	struct dma_resv_iter iter;
 	struct dma_fence *fence;
+	int ret;
 
 	dma_resv_assert_held(resv);
-	dma_resv_for_each_fence(&iter, resv, all, fence) {
-		int ret;
-
-		if (no_excl && dma_resv_iter_is_exclusive(&iter))
-			continue;
-
+	dma_resv_for_each_fence(&iter, resv, true, fence) {
 		ret = i915_deps_add_dependency(deps, fence, ctx);
 		if (ret)
 			return ret;
@@ -634,11 +629,7 @@ prev_deps(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 
 	ret = i915_deps_add_dependency(deps, bo->moving, ctx);
 	if (!ret)
-		/*
-		 * TODO: Only await excl fence here, and shared fences before
-		 * signaling the migration fence.
-		 */
-		ret = i915_deps_add_resv(deps, bo->base.resv, true, false, ctx);
+		ret = i915_deps_add_resv(deps, bo->base.resv, ctx);
 
 	return ret;
 }
@@ -768,22 +759,21 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 	struct i915_refct_sgt *dst_rsgt;
 	struct dma_fence *copy_fence;
 	struct i915_deps deps;
-	int ret, shared_err;
+	int ret;
 
 	assert_object_held(dst);
 	assert_object_held(src);
 	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 
-	/*
-	 * We plan to add a shared fence only for the source. If that
-	 * fails, we await all source fences before commencing
-	 * the copy instead of only the exclusive.
-	 */
-	shared_err = dma_resv_reserve_shared(src_bo->base.resv, 1);
-	ret = i915_deps_add_resv(&deps, dst_bo->base.resv, true, false, &ctx);
-	if (!ret)
-		ret = i915_deps_add_resv(&deps, src_bo->base.resv,
-					 !!shared_err, false, &ctx);
+	ret = dma_resv_reserve_shared(src_bo->base.resv, 1);
+	if (ret)
+		return ret;
+
+	ret = i915_deps_add_resv(&deps, dst_bo->base.resv, &ctx);
+	if (ret)
+		return ret;
+
+	ret = i915_deps_add_resv(&deps, src_bo->base.resv, &ctx);
 	if (ret)
 		return ret;
 
@@ -798,12 +788,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
 		return PTR_ERR_OR_ZERO(copy_fence);
 
 	dma_resv_add_excl_fence(dst_bo->base.resv, copy_fence);
-
-	/* If we failed to reserve a shared slot, add an exclusive fence */
-	if (shared_err)
-		dma_resv_add_excl_fence(src_bo->base.resv, copy_fence);
-	else
-		dma_resv_add_shared_fence(src_bo->base.resv, copy_fence);
+	dma_resv_add_shared_fence(src_bo->base.resv, copy_fence);
 
 	dma_fence_put(copy_fence);
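The error handling introduced above follows the usual reservation-object
pattern, sketched here with the names from the patch (the copy queueing in
the middle is elided):

    /* Reserve the shared slot up front; fail the copy if we can't. */
    ret = dma_resv_reserve_shared(src_bo->base.resv, 1);
    if (ret)
        return ret;

    /* ...collect dependencies and queue the copy, obtaining copy_fence... */

    dma_resv_add_excl_fence(dst_bo->base.resv, copy_fence);
    dma_resv_add_shared_fence(src_bo->base.resv, copy_fence);
    dma_fence_put(copy_fence);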
From patchwork Tue Dec 21 20:00:49 2021
X-Patchwork-Submitter: Thomas Hellström
X-Patchwork-Id: 12690525
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, matthew.auld@intel.com
Date: Tue, 21 Dec 2021 21:00:49 +0100
Message-Id: <20211221200050.436316-4-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
References: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 3/4] drm/i915: Break out the i915_deps utility

Since it's starting to be used outside the i915 TTM move code, move it
to a separate set of files.

v2:
- Update the documentation.
v4:
- Rebase.

Signed-off-by: Thomas Hellström
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/Makefile                |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 171 +-------------
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h |  17 --
 drivers/gpu/drm/i915/i915_deps.c             | 237 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_deps.h             |  45 ++++
 drivers/gpu/drm/i915/i915_request.c          |   2 +-
 6 files changed, 285 insertions(+), 188 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_deps.c
 create mode 100644 drivers/gpu/drm/i915/i915_deps.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6ddd2d2bbaaf..1b62b9f65196 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -163,6 +163,7 @@ i915-y += \
 	i915_active.o \
 	i915_buddy.o \
 	i915_cmd_parser.o \
+	i915_deps.o \
 	i915_gem_evict.o \
 	i915_gem_gtt.o \
 	i915_gem_ww.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 5dbaccf3f201..ee9612a3ee5e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -5,6 +5,7 @@
 
 #include <drm/ttm/ttm_bo_driver.h>
 
+#include "i915_deps.h"
 #include "i915_drv.h"
 #include "intel_memory_region.h"
 #include "intel_region_ttm.h"
@@ -41,176 +42,6 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
 }
 #endif
 
-/**
- * DOC: Set of utilities to dynamically collect dependencies into a
- * structure which is fed into the GT migration code.
- *
- * Once we can do async unbinding, this is also needed to coalesce
- * the migration fence with the unbind fences if these are coalesced
- * post-migration.
- *
- * While collecting the individual dependencies, we store the refcounted
- * struct dma_fence pointers in a realloc-managed pointer array, since
- * that can be easily fed into a dma_fence_array. Other options are
- * available, like for example an xarray for similarity with drm/sched.
- * Can be changed easily if needed.
- *
- * A struct i915_deps need to be initialized using i915_deps_init().
- * If i915_deps_add_dependency() or i915_deps_add_resv() return an
- * error code they will internally call i915_deps_fini(), which frees
- * all internal references and allocations.
- *
- * We might want to break this out into a separate file as a utility.
- */
-
-#define I915_DEPS_MIN_ALLOC_CHUNK 8U
-
-static void i915_deps_reset_fences(struct i915_deps *deps)
-{
-	if (deps->fences != &deps->single)
-		kfree(deps->fences);
-	deps->num_deps = 0;
-	deps->fences_size = 1;
-	deps->fences = &deps->single;
-}
-
-static void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
-{
-	deps->fences = NULL;
-	deps->gfp = gfp;
-	i915_deps_reset_fences(deps);
-}
-
-static void i915_deps_fini(struct i915_deps *deps)
-{
-	unsigned int i;
-
-	for (i = 0; i < deps->num_deps; ++i)
-		dma_fence_put(deps->fences[i]);
-
-	if (deps->fences != &deps->single)
-		kfree(deps->fences);
-}
-
-static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
-			  const struct ttm_operation_ctx *ctx)
-{
-	int ret;
-
-	if (deps->num_deps >= deps->fences_size) {
-		unsigned int new_size = 2 * deps->fences_size;
-		struct dma_fence **new_fences;
-
-		new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK);
-		new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp);
-		if (!new_fences)
-			goto sync;
-
-		memcpy(new_fences, deps->fences,
-		       deps->fences_size * sizeof(*new_fences));
-		swap(new_fences, deps->fences);
-		if (new_fences != &deps->single)
-			kfree(new_fences);
-		deps->fences_size = new_size;
-	}
-	deps->fences[deps->num_deps++] = dma_fence_get(fence);
-	return 0;
-
-sync:
-	if (ctx->no_wait_gpu && !dma_fence_is_signaled(fence)) {
-		ret = -EBUSY;
-		goto unref;
-	}
-
-	ret = dma_fence_wait(fence, ctx->interruptible);
-	if (ret)
-		goto unref;
-
-	ret = fence->error;
-	if (ret)
-		goto unref;
-
-	return 0;
-
-unref:
-	i915_deps_fini(deps);
-	return ret;
-}
-
-static int i915_deps_sync(const struct i915_deps *deps,
-			  const struct ttm_operation_ctx *ctx)
-{
-	struct dma_fence **fences = deps->fences;
-	unsigned int i;
-	int ret = 0;
-
-	for (i = 0; i < deps->num_deps; ++i, ++fences) {
-		if (ctx->no_wait_gpu && !dma_fence_is_signaled(*fences)) {
-			ret = -EBUSY;
-			break;
-		}
-
-		ret = dma_fence_wait(*fences, ctx->interruptible);
-		if (!ret)
-			ret = (*fences)->error;
-		if (ret)
-			break;
-	}
-
-	return ret;
-}
-
-static int i915_deps_add_dependency(struct i915_deps *deps,
-				    struct dma_fence *fence,
-				    const struct ttm_operation_ctx *ctx)
-{
-	unsigned int i;
-	int ret;
-
-	if (!fence)
-		return 0;
-
-	if (dma_fence_is_signaled(fence)) {
-		ret = fence->error;
-		if (ret)
-			i915_deps_fini(deps);
-		return ret;
-	}
-
-	for (i = 0; i < deps->num_deps; ++i) {
-		struct dma_fence *entry = deps->fences[i];
-
-		if (!entry->context || entry->context != fence->context)
-			continue;
-
-		if (dma_fence_is_later(fence, entry)) {
-			dma_fence_put(entry);
-			deps->fences[i] = dma_fence_get(fence);
-		}
-
-		return 0;
-	}
-
-	return i915_deps_grow(deps, fence, ctx);
-}
-
-static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
-			      const struct ttm_operation_ctx *ctx)
-{
-	struct dma_resv_iter iter;
-	struct dma_fence *fence;
-	int ret;
-
-	dma_resv_assert_held(resv);
-	dma_resv_for_each_fence(&iter, resv, true, fence) {
-		ret = i915_deps_add_dependency(deps, fence, ctx);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 static enum i915_cache_level
 i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
 		     struct ttm_tt *ttm)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
index 138b7647a558..d2e7f149e05c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h
@@ -18,23 +18,6 @@ struct ttm_tt;
 struct drm_i915_gem_object;
 struct i915_refct_sgt;
 
-/**
- * struct i915_deps - Collect dependencies into a single dma-fence
- * @single: Storage for pointer if the collection is a single fence.
- * @fences: Allocated array of fence pointers if more than a single fence;
- *	    otherwise points to the address of @single.
- * @num_deps: Current number of dependency fences.
- * @fences_size: Size of the @fences array in number of pointers.
- * @gfp: Allocation mode.
- */
-struct i915_deps {
-	struct dma_fence *single;
-	struct dma_fence **fences;
-	unsigned int num_deps;
-	unsigned int fences_size;
-	gfp_t gfp;
-};
-
 int i915_ttm_move_notify(struct ttm_buffer_object *bo);
 
 I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
diff --git a/drivers/gpu/drm/i915/i915_deps.c b/drivers/gpu/drm/i915/i915_deps.c
new file mode 100644
index 000000000000..999210b37325
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_deps.c
@@ -0,0 +1,237 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/slab.h>
+
+#include <drm/ttm/ttm_bo_api.h>
+
+#include "i915_deps.h"
+
+/**
+ * DOC: Set of utilities to dynamically collect dependencies into a
+ * structure which is fed into the GT migration code.
+ *
+ * Once we can do async unbinding, this is also needed to coalesce
+ * the migration fence with the unbind fences if these are coalesced
+ * post-migration.
+ *
+ * While collecting the individual dependencies, we store the refcounted
+ * struct dma_fence pointers in a realloc-managed pointer array, since
+ * that can be easily fed into a dma_fence_array. Other options are
+ * available, like for example an xarray for similarity with drm/sched.
+ * Can be changed easily if needed.
+ *
+ * A struct i915_deps need to be initialized using i915_deps_init().
+ * If i915_deps_add_dependency() or i915_deps_add_resv() return an
+ * error code they will internally call i915_deps_fini(), which frees
+ * all internal references and allocations.
+ */
+
+/* Min number of fence pointers in the array when an allocation occurs. */
+#define I915_DEPS_MIN_ALLOC_CHUNK 8U
+
+static void i915_deps_reset_fences(struct i915_deps *deps)
+{
+	if (deps->fences != &deps->single)
+		kfree(deps->fences);
+	deps->num_deps = 0;
+	deps->fences_size = 1;
+	deps->fences = &deps->single;
+}
+
+/**
+ * i915_deps_init - Initialize an i915_deps structure
+ * @deps: Pointer to the i915_deps structure to initialize.
+ * @gfp: The allocation mode for subsequent allocations.
+ */
+void i915_deps_init(struct i915_deps *deps, gfp_t gfp)
+{
+	deps->fences = NULL;
+	deps->gfp = gfp;
+	i915_deps_reset_fences(deps);
+}
+
+/**
+ * i915_deps_fini - Finalize an i915_deps structure
+ * @deps: Pointer to the i915_deps structure to finalize.
+ *
+ * This function drops all fence references taken, conditionally frees and
+ * then resets the fences array.
+ */
+void i915_deps_fini(struct i915_deps *deps)
+{
+	unsigned int i;
+
+	for (i = 0; i < deps->num_deps; ++i)
+		dma_fence_put(deps->fences[i]);
+
+	if (deps->fences != &deps->single)
+		kfree(deps->fences);
+}
+
+static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence,
+			  const struct ttm_operation_ctx *ctx)
+{
+	int ret;
+
+	if (deps->num_deps >= deps->fences_size) {
+		unsigned int new_size = 2 * deps->fences_size;
+		struct dma_fence **new_fences;
+
+		new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK);
+		new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp);
+		if (!new_fences)
+			goto sync;
+
+		memcpy(new_fences, deps->fences,
+		       deps->fences_size * sizeof(*new_fences));
+		swap(new_fences, deps->fences);
+		if (new_fences != &deps->single)
+			kfree(new_fences);
+		deps->fences_size = new_size;
+	}
+	deps->fences[deps->num_deps++] = dma_fence_get(fence);
+	return 0;
+
+sync:
+	if (ctx->no_wait_gpu && !dma_fence_is_signaled(fence)) {
+		ret = -EBUSY;
+		goto unref;
+	}
+
+	ret = dma_fence_wait(fence, ctx->interruptible);
+	if (ret)
+		goto unref;
+
+	ret = fence->error;
+	if (ret)
+		goto unref;
+
+	return 0;
+
+unref:
+	i915_deps_fini(deps);
+	return ret;
+}
+
+/**
+ * i915_deps_sync - Wait for all the fences in the dependency collection
+ * @deps: Pointer to the i915_deps structure the fences of which to wait for.
+ * @ctx: Pointer to a struct ttm_operation_ctx indicating how the waits
+ * should be performed.
+ *
+ * This function waits for fences in the dependency collection. If it
+ * encounters an error during the wait or a fence error, the wait for
+ * further fences is aborted and the error returned.
+ *
+ * Return: Zero if successful, negative error code on error.
+ */
+int i915_deps_sync(const struct i915_deps *deps, const struct ttm_operation_ctx *ctx)
+{
+	struct dma_fence **fences = deps->fences;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i < deps->num_deps; ++i, ++fences) {
+		if (ctx->no_wait_gpu && !dma_fence_is_signaled(*fences)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		ret = dma_fence_wait(*fences, ctx->interruptible);
+		if (!ret)
+			ret = (*fences)->error;
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+/**
+ * i915_deps_add_dependency - Add a fence to the dependency collection
+ * @deps: Pointer to the i915_deps structure a fence is to be added to.
+ * @fence: The fence to add.
+ * @ctx: Pointer to a struct ttm_operation_ctx indicating how waits are to
+ * be performed if waiting.
+ *
+ * Adds a fence to the dependency collection, and takes a reference on it.
+ * If the fence context is not zero and there was a later fence from the
+ * same fence context already added, then the fence is not added to the
+ * dependency collection. If the fence context is not zero and there was
+ * an earlier fence already added, then the fence will replace the older
+ * fence from the same context and the reference on the earlier fence will
+ * be dropped.
+ * If there is a failure to allocate memory to accommodate the new fence to
+ * be added, the new fence will instead be waited for and an error may
+ * be returned; depending on the value of @ctx, or if there was a fence
+ * error. If an error was returned, the dependency collection will be
+ * finalized and all fence references dropped.
+ *
+ * Return: 0 if success. Negative error code on error.
+ */
+int i915_deps_add_dependency(struct i915_deps *deps,
+			     struct dma_fence *fence,
+			     const struct ttm_operation_ctx *ctx)
+{
+	unsigned int i;
+	int ret;
+
+	if (!fence)
+		return 0;
+
+	if (dma_fence_is_signaled(fence)) {
+		ret = fence->error;
+		if (ret)
+			i915_deps_fini(deps);
+		return ret;
+	}
+
+	for (i = 0; i < deps->num_deps; ++i) {
+		struct dma_fence *entry = deps->fences[i];
+
+		if (!entry->context || entry->context != fence->context)
+			continue;
+
+		if (dma_fence_is_later(fence, entry)) {
+			dma_fence_put(entry);
+			deps->fences[i] = dma_fence_get(fence);
+		}
+
+		return 0;
+	}
+
+	return i915_deps_grow(deps, fence, ctx);
+}
+
+/**
+ * i915_deps_add_resv - Add the fences of a reservation object to a dependency
+ * collection.
+ * @deps: Pointer to the i915_deps structure a fence is to be added to.
+ * @resv: The reservation object, the fences of which to add.
+ * @ctx: Pointer to a struct ttm_operation_ctx indicating how waits are to
+ * be performed if waiting.
+ *
+ * Calls i915_deps_add_dependency() on the indicated fences of @resv.
+ *
+ * Return: Zero on success. Negative error code on error.
+ */
+int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
+		       const struct ttm_operation_ctx *ctx)
+{
+	struct dma_resv_iter iter;
+	struct dma_fence *fence;
+
+	dma_resv_assert_held(resv);
+	dma_resv_for_each_fence(&iter, resv, true, fence) {
+		int ret = i915_deps_add_dependency(deps, fence, ctx);
+
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_deps.h b/drivers/gpu/drm/i915/i915_deps.h
new file mode 100644
index 000000000000..d76c0106c910
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_deps.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#ifndef _I915_DEPS_H_
+#define _I915_DEPS_H_
+
+#include <linux/types.h>
+
+struct ttm_operation_ctx;
+struct dma_fence;
+struct dma_resv;
+
+/**
+ * struct i915_deps - Collect dependencies into a single dma-fence
+ * @single: Storage for pointer if the collection is a single fence.
+ * @fences: Allocated array of fence pointers if more than a single fence;
+ *	    otherwise points to the address of @single.
+ * @num_deps: Current number of dependency fences.
+ * @fences_size: Size of the @fences array in number of pointers.
+ * @gfp: Allocation mode.
+ */
+struct i915_deps {
+	struct dma_fence *single;
+	struct dma_fence **fences;
+	unsigned int num_deps;
+	unsigned int fences_size;
+	gfp_t gfp;
+};
+
+void i915_deps_init(struct i915_deps *deps, gfp_t gfp);
+
+void i915_deps_fini(struct i915_deps *deps);
+
+int i915_deps_add_dependency(struct i915_deps *deps,
+			     struct dma_fence *fence,
+			     const struct ttm_operation_ctx *ctx);
+
+int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv,
+		       const struct ttm_operation_ctx *ctx);
+
+int i915_deps_sync(const struct i915_deps *deps,
+		   const struct ttm_operation_ctx *ctx);
+#endif
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7d804df27546..76cf5ac91e94 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -32,7 +32,6 @@
 #include <linux/sched/signal.h>
 
 #include "gem/i915_gem_context.h"
-#include "gem/i915_gem_ttm_move.h"
 #include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
 #include "gt/intel_engine.h"
@@ -43,6 +42,7 @@
 #include "gt/intel_rps.h"
 
 #include "i915_active.h"
+#include "i915_deps.h"
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_pm.h"
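With the utility in its own files, a user outside the TTM move code only
needs the new header. A sketch under that assumption (await_bo_deps() is a
hypothetical caller, not part of the patch):

    #include "i915_deps.h"

    static int await_bo_deps(struct i915_request *rq, struct dma_resv *resv,
                             const struct ttm_operation_ctx *ctx)
    {
        struct i915_deps deps;
        int err;

        i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
        err = i915_deps_add_resv(&deps, resv, ctx);
        if (err)
            return err; /* i915_deps_add_resv() already finalized deps */

        err = i915_request_await_deps(rq, &deps);
        i915_deps_fini(&deps);
        return err;
    }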
From patchwork Tue Dec 21 20:00:50 2021
X-Patchwork-Submitter: Thomas Hellström
X-Patchwork-Id: 12690529
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, matthew.auld@intel.com
Date: Tue, 21 Dec 2021 21:00:50 +0100
Message-Id: <20211221200050.436316-5-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
References: <20211221200050.436316-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 4/4] drm/i915: Require the vm mutex for
 i915_vma_bind()

Protect updates of struct i915_vma flags and async binding / unbinding
with the vm::mutex. This means that i915_vma_bind() needs to assert
vm::mutex held. In order to make that possible, drop the caching of
kmap_atomic() maps around i915_vma_bind().

An alternative would be to use kmap_local(), but since we block cpu
unplugging during sleeps inside kmap_local() sections, this may have
unwanted side-effects, particularly since we might wait for the gpu
while holding the vm mutex.

This change may theoretically increase execbuf cpu-usage on snb, but
at least on non-highmem systems that increase should be very small.

Signed-off-by: Thomas Hellström
Reviewed-by: Matthew Auld
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 50 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_vma.c            |  1 +
 2 files changed, 50 insertions(+), 1 deletion(-)
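The ordering that matters in the relocation path, as a sketch (the helpers
are the ones added in the diff below; the surrounding context is
illustrative):

    reloc_cache_unmap(&eb->reloc_cache);  /* leave the kmap_atomic section */
    mutex_lock(&vma->vm->mutex);          /* may sleep and/or wait for gpu */
    err = i915_vma_bind(vma, vma->obj->cache_level, PIN_GLOBAL, NULL);
    mutex_unlock(&vma->vm->mutex);
    reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);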
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index ec7c4a29a720..b8330f0bf652 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1109,6 +1109,47 @@ static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache)
 	return &i915->ggtt;
 }
 
+static void reloc_cache_unmap(struct reloc_cache *cache)
+{
+	void *vaddr;
+
+	if (!cache->vaddr)
+		return;
+
+	vaddr = unmask_page(cache->vaddr);
+	if (cache->vaddr & KMAP)
+		kunmap_atomic(vaddr);
+	else
+		io_mapping_unmap_atomic((void __iomem *)vaddr);
+}
+
+static void reloc_cache_remap(struct reloc_cache *cache,
+			      struct drm_i915_gem_object *obj)
+{
+	void *vaddr;
+
+	if (!cache->vaddr)
+		return;
+
+	if (cache->vaddr & KMAP) {
+		struct page *page = i915_gem_object_get_page(obj, cache->page);
+
+		vaddr = kmap_atomic(page);
+		cache->vaddr = unmask_flags(cache->vaddr) |
+			(unsigned long)vaddr;
+	} else {
+		struct i915_ggtt *ggtt = cache_to_ggtt(cache);
+		unsigned long offset;
+
+		offset = cache->node.start;
+		if (!drm_mm_node_allocated(&cache->node))
+			offset += cache->page << PAGE_SHIFT;
+
+		cache->vaddr = (unsigned long)
+			io_mapping_map_atomic_wc(&ggtt->iomap, offset);
+	}
+}
+
 static void reloc_cache_reset(struct reloc_cache *cache, struct i915_execbuffer *eb)
 {
 	void *vaddr;
@@ -1373,10 +1414,17 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 		 * batchbuffers.
 		 */
 		if (reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
-		    GRAPHICS_VER(eb->i915) == 6) {
+		    GRAPHICS_VER(eb->i915) == 6 &&
+		    !i915_vma_is_bound(target->vma, I915_VMA_GLOBAL_BIND)) {
+			struct i915_vma *vma = target->vma;
+
+			reloc_cache_unmap(&eb->reloc_cache);
+			mutex_lock(&vma->vm->mutex);
 			err = i915_vma_bind(target->vma,
 					    target->vma->obj->cache_level,
 					    PIN_GLOBAL, NULL);
+			mutex_unlock(&vma->vm->mutex);
+			reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
 			if (err)
 				return err;
 		}
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index de24e4b3b19b..be208a8f1ed0 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -393,6 +393,7 @@ int i915_vma_bind(struct i915_vma *vma,
 	u32 bind_flags;
 	u32 vma_flags;
 
+	lockdep_assert_held(&vma->vm->mutex);
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(vma->size > vma->node.size);