From patchwork Fri Aug 9 22:26:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Auld X-Patchwork-Id: 11087793 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDD1914D5 for ; Fri, 9 Aug 2019 22:27:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA172205FD for ; Fri, 9 Aug 2019 22:27:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AD9B721FAC; Fri, 9 Aug 2019 22:27:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=unavailable version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0661121C9A for ; Fri, 9 Aug 2019 22:27:38 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 662176EF05; Fri, 9 Aug 2019 22:27:13 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7C6876EEE3; Fri, 9 Aug 2019 22:27:02 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Aug 2019 15:27:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,366,1559545200"; d="scan'208";a="176927017" Received: from jmath3-mobl1.ger.corp.intel.com (HELO mwahaha-bdw.ger.corp.intel.com) ([10.252.5.86]) by fmsmga007.fm.intel.com with ESMTP; 09 Aug 2019 15:27:00 -0700 From: Matthew Auld To: intel-gfx@lists.freedesktop.org Date: Fri, 9 Aug 2019 23:26:17 +0100 Message-Id: <20190809222643.23142-12-matthew.auld@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190809222643.23142-1-matthew.auld@intel.com> References: <20190809222643.23142-1-matthew.auld@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 11/37] drm/i915/blt: bump size restriction X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: dri-devel@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Reported-by: Chris Wilson Signed-off-by: Matthew Auld --- .../gpu/drm/i915/gem/i915_gem_client_blt.c | 31 +++- .../gpu/drm/i915/gem/i915_gem_object_blt.c | 139 ++++++++++++++---- .../gpu/drm/i915/gem/i915_gem_object_blt.h | 9 +- .../i915/gem/selftests/i915_gem_client_blt.c | 16 +- .../i915/gem/selftests/i915_gem_object_blt.c | 22 ++- 5 files changed, 170 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c index 08a84c940d8d..4b096309a97e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c @@ -5,6 +5,8 @@ #include "i915_drv.h" #include "gt/intel_context.h" +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_pool.h" #include "i915_gem_client_blt.h" #include "i915_gem_object_blt.h" @@ -156,7 +158,9 @@ static void clear_pages_worker(struct work_struct *work) struct drm_i915_private *i915 = w->ce->engine->i915; struct drm_i915_gem_object *obj = w->sleeve->vma->obj; struct i915_vma *vma = w->sleeve->vma; + struct intel_engine_pool_node *pool; struct i915_request *rq; + struct i915_vma *batch; int err = w->dma.error; if (unlikely(err)) @@ -176,10 +180,17 @@ static void clear_pages_worker(struct work_struct *work) if (unlikely(err)) goto out_unlock; + intel_engine_pm_get(w->ce->engine); + batch = intel_emit_vma_fill_blt(&pool, w->ce, vma, w->value); + if (IS_ERR(batch)) { + err = PTR_ERR(batch); + goto out_unpin; + } + rq = intel_context_create_request(w->ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto out_unpin; + goto out_batch; } /* There's no way the fence has signalled */ @@ -187,6 +198,16 @@ static void clear_pages_worker(struct work_struct *work) clear_pages_dma_fence_cb)) GEM_BUG_ON(1); + i915_vma_lock(batch); + err = i915_vma_move_to_active(batch, rq, 0); + i915_vma_unlock(batch); + if (unlikely(err)) + goto out_request; + + err = intel_engine_pool_mark_active(pool, rq); + if (unlikely(err)) + goto out_request; + if (w->ce->engine->emit_init_breadcrumb) { err = w->ce->engine->emit_init_breadcrumb(rq); if (unlikely(err)) @@ -202,7 +223,9 @@ static void clear_pages_worker(struct work_struct *work) if (err) goto out_request; - err = intel_emit_vma_fill_blt(rq, vma, w->value); + err = w->ce->engine->emit_bb_start(rq, + batch->node.start, batch->node.size, + 0); out_request: if (unlikely(err)) { i915_request_skip(rq, err); @@ -210,7 +233,11 @@ static void clear_pages_worker(struct work_struct *work) } i915_request_add(rq); +out_batch: + i915_vma_unpin(batch); + intel_engine_pool_put(pool); out_unpin: + intel_engine_pm_put(w->ce->engine); i915_vma_unpin(vma); out_unlock: mutex_unlock(&i915->drm.struct_mutex); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c index fa90c38c8b07..c1e5edd1e359 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c @@ -5,49 +5,103 @@ #include "i915_drv.h" #include "gt/intel_context.h" +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_pool.h" +#include "gt/intel_gt.h" #include "i915_gem_clflush.h" #include "i915_gem_object_blt.h" -int intel_emit_vma_fill_blt(struct i915_request *rq, - struct i915_vma *vma, - u32 value) +struct i915_vma *intel_emit_vma_fill_blt(struct intel_engine_pool_node **p, + struct intel_context *ce, + struct i915_vma *vma, + u32 value) { - u32 *cs; - - cs = intel_ring_begin(rq, 8); - if (IS_ERR(cs)) - return PTR_ERR(cs); - - if (INTEL_GEN(rq->i915) >= 8) { - *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2); - *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; - *cs++ = 0; - *cs++ = vma->size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = lower_32_bits(vma->node.start); - *cs++ = upper_32_bits(vma->node.start); - *cs++ = value; - *cs++ = MI_NOOP; - } else { - *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); - *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; - *cs++ = 0; - *cs++ = vma->size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = vma->node.start; - *cs++ = value; - *cs++ = MI_NOOP; - *cs++ = MI_NOOP; + struct drm_i915_private *i915 = ce->vm->i915; + const u32 block_size = S16_MAX * PAGE_SIZE; + struct intel_engine_pool_node *pool; + struct i915_vma *batch; + u64 offset; + u64 count; + u64 rem; + u32 size; + u32 *cmd; + int err; + + count = div_u64(vma->size, block_size); + size = (1 + 8 * count) * sizeof(u32); + size = round_up(size, PAGE_SIZE); + pool = intel_engine_pool_get(&ce->engine->pool, size); + if (IS_ERR(pool)) + return ERR_CAST(pool); + + cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_WC); + if (IS_ERR(cmd)) { + err = PTR_ERR(cmd); + goto out_put; + } + + rem = vma->size; + offset = vma->node.start; + + do { + u32 size = min_t(u64, rem, block_size); + + GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX); + + if (INTEL_GEN(i915) >= 8) { + *cmd++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2); + *cmd++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; + *cmd++ = 0; + *cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; + *cmd++ = lower_32_bits(offset); + *cmd++ = upper_32_bits(offset); + *cmd++ = value; + } else { + *cmd++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); + *cmd++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; + *cmd++ = 0; + *cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; + *cmd++ = offset; + *cmd++ = value; + } + + /* Allow ourselves to be preempted in between blocks. */ + *cmd++ = MI_ARB_CHECK; + + offset += size; + rem -= size; + } while (rem); + + *cmd = MI_BATCH_BUFFER_END; + intel_gt_chipset_flush(ce->vm->gt); + + i915_gem_object_unpin_map(pool->obj); + + batch = i915_vma_instance(pool->obj, ce->vm, NULL); + if (IS_ERR(batch)) { + err = PTR_ERR(batch); + goto out_put; } - intel_ring_advance(rq, cs); + err = i915_vma_pin(batch, 0, 0, PIN_USER); + if (unlikely(err)) + goto out_put; + + *p = pool; + return batch; - return 0; +out_put: + intel_engine_pool_put(pool); + return ERR_PTR(err); } int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, struct intel_context *ce, u32 value) { + struct intel_engine_pool_node *pool; struct i915_request *rq; + struct i915_vma *batch; struct i915_vma *vma; int err; @@ -65,12 +119,29 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, i915_gem_object_unlock(obj); } + intel_engine_pm_get(ce->engine); + batch = intel_emit_vma_fill_blt(&pool, ce, vma, value); + if (IS_ERR(batch)) { + err = PTR_ERR(batch); + goto out_unpin; + } + rq = intel_context_create_request(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto out_unpin; + goto out_batch; } + i915_vma_lock(batch); + err = i915_vma_move_to_active(batch, rq, 0); + i915_vma_unlock(batch); + if (unlikely(err)) + goto out_request; + + err = intel_engine_pool_mark_active(pool, rq); + if (unlikely(err)) + goto out_request; + err = i915_request_await_object(rq, obj, true); if (unlikely(err)) goto out_request; @@ -87,13 +158,19 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, if (unlikely(err)) goto out_request; - err = intel_emit_vma_fill_blt(rq, vma, value); + err = ce->engine->emit_bb_start(rq, + batch->node.start, batch->node.size, + 0); out_request: if (unlikely(err)) i915_request_skip(rq, err); i915_request_add(rq); +out_batch: + i915_vma_unpin(batch); + intel_engine_pool_put(pool); out_unpin: + intel_engine_pm_put(ce->engine); i915_vma_unpin(vma); return err; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h index 7ec7de6ac0c0..a7425c234d50 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h @@ -9,13 +9,14 @@ #include struct drm_i915_gem_object; +struct intel_engine_pool_node; struct intel_context; -struct i915_request; struct i915_vma; -int intel_emit_vma_fill_blt(struct i915_request *rq, - struct i915_vma *vma, - u32 value); +struct i915_vma *intel_emit_vma_fill_blt(struct intel_engine_pool_node **p, + struct intel_context *ce, + struct i915_vma *vma, + u32 value); int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, struct intel_context *ce, diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index 275c28926067..d8804a847945 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -9,6 +9,7 @@ #include "selftests/igt_flush_test.h" #include "selftests/mock_drm.h" +#include "huge_gem_object.h" #include "mock_context.h" static int igt_client_fill(void *arg) @@ -24,15 +25,19 @@ static int igt_client_fill(void *arg) prandom_seed_state(&prng, i915_selftest.random_seed); do { - u32 sz = prandom_u32_state(&prng) % SZ_32M; + const u32 max_block_size = S16_MAX * PAGE_SIZE; + u32 sz = min_t(u64, ce->vm->total >> 4, prandom_u32_state(&prng)); + u32 phys_sz = sz % (max_block_size + 1); u32 val = prandom_u32_state(&prng); u32 i; sz = round_up(sz, PAGE_SIZE); + phys_sz = round_up(phys_sz, PAGE_SIZE); - pr_debug("%s with sz=%x, val=%x\n", __func__, sz, val); + pr_debug("%s with phys_sz= %x, sz=%x, val=%x\n", __func__, + phys_sz, sz, val); - obj = i915_gem_object_create_internal(i915, sz); + obj = huge_gem_object(i915, phys_sz, sz); if (IS_ERR(obj)) { err = PTR_ERR(obj); goto err_flush; @@ -54,7 +59,8 @@ static int igt_client_fill(void *arg) * values after we do the set_to_cpu_domain and pick it up as a * test failure. */ - memset32(vaddr, val ^ 0xdeadbeaf, obj->base.size / sizeof(u32)); + memset32(vaddr, val ^ 0xdeadbeaf, + huge_gem_object_phys_size(obj) / sizeof(u32)); if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE)) obj->cache_dirty = true; @@ -71,7 +77,7 @@ static int igt_client_fill(void *arg) if (err) goto err_unpin; - for (i = 0; i < obj->base.size / sizeof(u32); ++i) { + for (i = 0; i < huge_gem_object_phys_size(obj) / sizeof(u32); ++i) { if (vaddr[i] != val) { pr_err("vaddr[%u]=%x, expected=%x\n", i, vaddr[i], val); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c index 19843acc84d3..c6e1eebe53f5 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c @@ -9,6 +9,7 @@ #include "selftests/igt_flush_test.h" #include "selftests/mock_drm.h" +#include "huge_gem_object.h" #include "mock_context.h" static int igt_fill_blt(void *arg) @@ -23,16 +24,26 @@ static int igt_fill_blt(void *arg) prandom_seed_state(&prng, i915_selftest.random_seed); + /* + * XXX: needs some threads to scale all these tests, also maybe throw + * in submission from higher priority context to see if we are + * preempted for very large objects... + */ + do { - u32 sz = prandom_u32_state(&prng) % SZ_32M; + const u32 max_block_size = S16_MAX * PAGE_SIZE; + u32 sz = min_t(u64, ce->vm->total >> 4, prandom_u32_state(&prng)); + u32 phys_sz = sz % (max_block_size + 1); u32 val = prandom_u32_state(&prng); u32 i; sz = round_up(sz, PAGE_SIZE); + phys_sz = round_up(phys_sz, PAGE_SIZE); - pr_debug("%s with sz=%x, val=%x\n", __func__, sz, val); + pr_debug("%s with phys_sz= %x, sz=%x, val=%x\n", __func__, + phys_sz, sz, val); - obj = i915_gem_object_create_internal(i915, sz); + obj = huge_gem_object(i915, phys_sz, sz); if (IS_ERR(obj)) { err = PTR_ERR(obj); goto err_flush; @@ -48,7 +59,8 @@ static int igt_fill_blt(void *arg) * Make sure the potentially async clflush does its job, if * required. */ - memset32(vaddr, val ^ 0xdeadbeaf, obj->base.size / sizeof(u32)); + memset32(vaddr, val ^ 0xdeadbeaf, + huge_gem_object_phys_size(obj) / sizeof(u32)); if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE)) obj->cache_dirty = true; @@ -65,7 +77,7 @@ static int igt_fill_blt(void *arg) if (err) goto err_unpin; - for (i = 0; i < obj->base.size / sizeof(u32); ++i) { + for (i = 0; i < huge_gem_object_phys_size(obj) / sizeof(u32); ++i) { if (vaddr[i] != val) { pr_err("vaddr[%u]=%x, expected=%x\n", i, vaddr[i], val);