From patchwork Mon Feb 1 08:56:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058465
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 7306FC433E0
 for ; Mon, 1 Feb 2021 08:58:09 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id 2591D64E3F
 for ; Mon, 1 Feb 2021 08:58:09 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2591D64E3F
Authentication-Results: mail.kernel.org;
 dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id 1C4636E4E6;
 Mon, 1 Feb 2021 08:57:43 +0000 (UTC)
Received: from fireflyinternet.com (unknown [77.68.26.236])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 8EC926E4B1
 for ; Mon, 1 Feb 2021 08:57:35 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
 x-ip-name=78.156.65.138;
Received: from build.alporthouse.com (unverified [78.156.65.138])
 by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757714-1500050
 for multiple; Mon, 01 Feb 2021 08:57:15 +0000
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:19 +0000
Message-Id: <20210201085715.27435-1-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 01/57] drm/i915/gt: Restrict the GT clock override to just Icelake
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Chris Wilson
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

It appears that Elkhart Lake uses the same clock for CTX_TIMESTAMP as
CS_TIMESTAMP, leaving Icelake as the odd one out.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3024
Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
index f8c79efb1a87..09b290fe0867 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
@@ -160,7 +160,7 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
 		gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);
 
 	/* Icelake appears to use another fixed frequency for CTX_TIMESTAMP */
-	if (IS_GEN(gt->i915, 11))
+	if (IS_ICELAKE(gt->i915))
 		gt->clock_period_ns = NSEC_PER_SEC / 13750000;
 
 	GT_TRACE(gt,

From patchwork Mon Feb 1 08:56:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058497
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 9E14AC433DB
 for ; Mon, 1 Feb 2021 08:58:17 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id 5476A64E40
 for ; Mon, 1 Feb 2021 08:58:17 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5476A64E40
Authentication-Results: mail.kernel.org;
 dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id 972396E519;
 Mon, 1 Feb 2021 08:57:44 +0000 (UTC)
Received: from fireflyinternet.com (unknown [77.68.26.236])
 by gabe.freedesktop.org (Postfix) with ESMTPS id F1C446E4EC
 for ; Mon, 1 Feb 2021 08:57:35 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
 x-ip-name=78.156.65.138;
Received: from build.alporthouse.com (unverified [78.156.65.138])
 by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757715-1500050
 for multiple; Mon, 01 Feb 2021 08:57:15 +0000
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:20 +0000
Message-Id: <20210201085715.27435-2-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 02/57] drm/i915/selftests: Exercise relative mmio paths to non-privileged registers
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Chris Wilson
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

Verify that context isolation is also preserved when accessing
context-local registers with relative-mmio commands.

Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 88 ++++++++++++++++++++------
 1 file changed, 67 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 7bf34c439876..0524232378e4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -910,7 +910,9 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
 }
 
 static struct i915_vma *
-store_context(struct intel_context *ce, struct i915_vma *scratch)
+store_context(struct intel_context *ce,
+	      struct i915_vma *scratch,
+	      bool relative)
 {
 	struct i915_vma *batch;
 	u32 dw, x, *cs, *hw;
@@ -939,6 +941,9 @@ store_context(struct intel_context *ce, struct i915_vma *scratch)
 	hw += LRC_STATE_OFFSET / sizeof(*hw);
 	do {
 		u32 len = hw[dw] & 0x7f;
+		u32 cmd = MI_STORE_REGISTER_MEM_GEN8;
+		u32 offset = 0;
+		u32 mask = ~0;
 
 		if (hw[dw] == 0) {
 			dw++;
@@ -950,11 +955,19 @@ store_context(struct intel_context *ce, struct i915_vma *scratch)
 			continue;
 		}
 
+		if (hw[dw] & MI_LRI_LRM_CS_MMIO) {
+			mask = 0xfff;
+			if (relative)
+				cmd |= MI_LRI_LRM_CS_MMIO;
+			else
+				offset = ce->engine->mmio_base;
+		}
+
 		dw++;
 		len = (len + 1) / 2;
 		while (len--) {
-			*cs++ = MI_STORE_REGISTER_MEM_GEN8;
-			*cs++ = hw[dw];
+			*cs++ = cmd;
+			*cs++ = (hw[dw] & mask) + offset;
 			*cs++ = lower_32_bits(scratch->node.start + x);
 			*cs++ = upper_32_bits(scratch->node.start + x);
 
@@ -993,6 +1006,7 @@ static struct i915_request *
 record_registers(struct intel_context *ce,
 		 struct i915_vma *before,
 		 struct i915_vma *after,
+		 bool relative,
 		 u32 *sema)
 {
 	struct i915_vma *b_before, *b_after;
@@ -1000,11 +1014,11 @@ record_registers(struct intel_context *ce,
 	u32 *cs;
 	int err;
 
-	b_before = store_context(ce, before);
+	b_before = store_context(ce, before, relative);
 	if (IS_ERR(b_before))
 		return ERR_CAST(b_before);
 
-	b_after = store_context(ce, after);
+	b_after = store_context(ce, after, relative);
 	if (IS_ERR(b_after)) {
 		rq = ERR_CAST(b_after);
 		goto err_before;
@@ -1074,7 +1088,8 @@ record_registers(struct intel_context *ce,
 	goto err_after;
 }
 
-static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
+static struct i915_vma *
+load_context(struct intel_context *ce, u32 poison, bool relative)
 {
 	struct i915_vma *batch;
 	u32 dw, *cs, *hw;
@@ -1101,7 +1116,10 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
 	hw = defaults;
 	hw += LRC_STATE_OFFSET / sizeof(*hw);
 	do {
+		u32 cmd = MI_INSTR(0x22, 0);
 		u32 len = hw[dw] & 0x7f;
+		u32 offset = 0;
+		u32 mask = ~0;
 
 		if (hw[dw] == 0) {
 			dw++;
@@ -1113,11 +1131,19 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
 			continue;
 		}
 
+		if (hw[dw] & MI_LRI_LRM_CS_MMIO) {
+			mask = 0xfff;
+			if (relative)
+				cmd |= MI_LRI_LRM_CS_MMIO;
+			else
+				offset = ce->engine->mmio_base;
+		}
+
 		dw++;
+		*cs++ = cmd | len;
 		len = (len + 1) / 2;
-		*cs++ = MI_LOAD_REGISTER_IMM(len);
 		while (len--) {
-			*cs++ = hw[dw];
+			*cs++ = (hw[dw] & mask) + offset;
 			*cs++ = poison;
 			dw += 2;
 		}
@@ -1134,14 +1160,18 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
 	return batch;
 }
 
-static int poison_registers(struct intel_context *ce, u32 poison, u32 *sema)
+static int
+poison_registers(struct intel_context *ce,
+		 u32 poison,
+		 bool relative,
+		 u32 *sema)
 {
 	struct i915_request *rq;
 	struct i915_vma *batch;
 	u32 *cs;
 	int err;
 
-	batch = load_context(ce, poison);
+	batch = load_context(ce, poison, relative);
 	if (IS_ERR(batch))
 		return PTR_ERR(batch);
 
@@ -1191,7 +1221,7 @@ static int compare_isolation(struct intel_engine_cs *engine,
 			     struct i915_vma *ref[2],
 			     struct i915_vma *result[2],
 			     struct intel_context *ce,
-			     u32 poison)
+			     u32 poison, bool relative)
 {
 	u32 x, dw, *hw, *lrc;
 	u32 *A[2], *B[2];
@@ -1240,6 +1270,7 @@ static int compare_isolation(struct intel_engine_cs *engine,
 	hw += LRC_STATE_OFFSET / sizeof(*hw);
 	do {
 		u32 len = hw[dw] & 0x7f;
+		bool is_relative = relative;
 
 		if (hw[dw] == 0) {
 			dw++;
@@ -1251,6 +1282,9 @@ static int compare_isolation(struct intel_engine_cs *engine,
 			continue;
 		}
 
+		if (!(hw[dw] & MI_LRI_LRM_CS_MMIO))
+			is_relative = false;
+
 		dw++;
 		len = (len + 1) / 2;
 		while (len--) {
@@ -1262,9 +1296,10 @@ static int compare_isolation(struct intel_engine_cs *engine,
 					break;
 
 				default:
-					pr_err("%s[%d]: Mismatch for register %4x, default %08x, reference %08x, result (%08x, %08x), poison %08x, context %08x\n",
-					       engine->name, dw,
-					       hw[dw], hw[dw + 1],
+					pr_err("%s[%d]: Mismatch for register %4x [using relative? %s], default %08x, reference %08x, result (%08x, %08x), poison %08x, context %08x\n",
+					       engine->name, dw, hw[dw],
+					       yesno(is_relative),
+					       hw[dw + 1],
 					       A[0][x], B[0][x], B[1][x],
 					       poison, lrc[dw + 1]);
 					err = -EINVAL;
@@ -1290,7 +1325,8 @@ static int compare_isolation(struct intel_engine_cs *engine,
 	return err;
 }
 
-static int __lrc_isolation(struct intel_engine_cs *engine, u32 poison)
+static int
+__lrc_isolation(struct intel_engine_cs *engine, u32 poison, bool relative)
 {
 	u32 *sema = memset32(engine->status_page.addr + 1000, 0, 1);
 	struct i915_vma *ref[2], *result[2];
@@ -1320,7 +1356,7 @@ static int __lrc_isolation(struct intel_engine_cs *engine, u32 poison)
 		goto err_ref0;
 	}
 
-	rq = record_registers(A, ref[0], ref[1], sema);
+	rq = record_registers(A, ref[0], ref[1], relative, sema);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto err_ref1;
@@ -1348,13 +1384,13 @@ static int __lrc_isolation(struct intel_engine_cs *engine, u32 poison)
 		goto err_result0;
 	}
 
-	rq = record_registers(A, result[0], result[1], sema);
+	rq = record_registers(A, result[0], result[1], relative, sema);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto err_result1;
 	}
 
-	err = poison_registers(B, poison, sema);
+	err = poison_registers(B, poison, relative, sema);
 	if (err) {
 		WRITE_ONCE(*sema, -1);
 		i915_request_put(rq);
@@ -1368,7 +1404,7 @@ static int __lrc_isolation(struct intel_engine_cs *engine, u32 poison)
 	}
 	i915_request_put(rq);
 
-	err = compare_isolation(engine, ref, result, A, poison);
+	err = compare_isolation(engine, ref, result, A, poison, relative);
 
 err_result1:
 	i915_vma_put(result[1]);
@@ -1430,13 +1466,23 @@ static int live_lrc_isolation(void *arg)
 		for (i = 0; i < ARRAY_SIZE(poison); i++) {
 			int result;
 
-			result = __lrc_isolation(engine, poison[i]);
+			result = __lrc_isolation(engine, poison[i], false);
 			if (result && !err)
 				err = result;
 
-			result = __lrc_isolation(engine, ~poison[i]);
+			result = __lrc_isolation(engine, ~poison[i], false);
 			if (result && !err)
 				err = result;
+
+			if (intel_engine_has_relative_mmio(engine)) {
+				result = __lrc_isolation(engine, poison[i], true);
+				if (result && !err)
+					err = result;
+
+				result = __lrc_isolation(engine, ~poison[i], true);
+				if (result && !err)
+					err = result;
+			}
 		}
 		intel_engine_pm_put(engine);
 		if (igt_flush_test(gt->i915)) {

From patchwork Mon Feb 1 08:56:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058477
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 8C80FC433E6
 for ; Mon, 1 Feb 2021 08:58:12 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id 2609064E3F
 for ; Mon, 1 Feb 2021
 08:58:12 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2609064E3F
Authentication-Results: mail.kernel.org;
 dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id B4CDC6E4C9;
 Mon, 1 Feb 2021 08:57:43 +0000 (UTC)
Received: from fireflyinternet.com (unknown [77.68.26.236])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 5E2A66E491
 for ; Mon, 1 Feb 2021 08:57:35 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
 x-ip-name=78.156.65.138;
Received: from build.alporthouse.com (unverified [78.156.65.138])
 by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757716-1500050
 for multiple; Mon, 01 Feb 2021 08:57:16 +0000
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:21 +0000
Message-Id: <20210201085715.27435-3-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 03/57] drm/i915/selftests: Exercise cross-process context isolation
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Chris Wilson
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

Verify that one context running on engine A cannot manipulate another
client's context concurrently running on engine B using unprivileged
access.

Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 275 +++++++++++++++++++++----
 1 file changed, 238 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 0524232378e4..e97adf1b7729 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -911,6 +911,7 @@ create_user_vma(struct i915_address_space *vm, unsigned long size)
 
 static struct i915_vma *
 store_context(struct intel_context *ce,
+	      struct intel_engine_cs *engine,
 	      struct i915_vma *scratch,
 	      bool relative)
 {
@@ -928,7 +929,7 @@ store_context(struct intel_context *ce,
 		return ERR_CAST(cs);
 	}
 
-	defaults = shmem_pin_map(ce->engine->default_state);
+	defaults = shmem_pin_map(engine->default_state);
 	if (!defaults) {
 		i915_gem_object_unpin_map(batch->obj);
 		i915_vma_put(batch);
@@ -960,7 +961,7 @@ store_context(struct intel_context *ce,
 			if (relative)
 				cmd |= MI_LRI_LRM_CS_MMIO;
 			else
-				offset = ce->engine->mmio_base;
+				offset = engine->mmio_base;
 		}
 
 		dw++;
@@ -979,7 +980,7 @@ store_context(struct intel_context *ce,
 
 	*cs++ = MI_BATCH_BUFFER_END;
 
-	shmem_unpin_map(ce->engine->default_state, defaults);
+	shmem_unpin_map(engine->default_state, defaults);
 
 	i915_gem_object_flush_map(batch->obj);
 	i915_gem_object_unpin_map(batch->obj);
@@ -1002,23 +1003,48 @@ static int move_to_active(struct i915_request *rq,
 	return err;
 }
 
+struct hwsp_semaphore {
+	u32 ggtt;
+	u32 *va;
+};
+
+static struct hwsp_semaphore hwsp_semaphore(struct intel_engine_cs *engine)
+{
+	struct hwsp_semaphore s;
+
+	s.va = memset32(engine->status_page.addr + 1000, 0, 1);
+	s.ggtt = (i915_ggtt_offset(engine->status_page.vma) +
+		  offset_in_page(s.va));
+
+	return s;
+}
+
+static u32 *emit_noops(u32 *cs, int count)
+{
+	while (count--)
+		*cs++ = MI_NOOP;
+
+	return cs;
+}
+
 static struct i915_request *
 record_registers(struct intel_context *ce,
+		 struct intel_engine_cs *engine,
 		 struct i915_vma *before,
 		 struct i915_vma *after,
 		 bool relative,
-		 u32 *sema)
+		 const struct hwsp_semaphore *sema)
 {
 	struct i915_vma *b_before, *b_after;
 	struct i915_request *rq;
 	u32 *cs;
 	int err;
 
-	b_before = store_context(ce, before, relative);
+	b_before = store_context(ce, engine, before, relative);
 	if (IS_ERR(b_before))
 		return ERR_CAST(b_before);
 
-	b_after = store_context(ce, after, relative);
+	b_after = store_context(ce, engine, after, relative);
 	if (IS_ERR(b_after)) {
 		rq = ERR_CAST(b_after);
 		goto err_before;
@@ -1044,7 +1070,7 @@ record_registers(struct intel_context *ce,
 	if (err)
 		goto err_rq;
 
-	cs = intel_ring_begin(rq, 14);
+	cs = intel_ring_begin(rq, 18);
 	if (IS_ERR(cs)) {
 		err = PTR_ERR(cs);
 		goto err_rq;
@@ -1055,16 +1081,28 @@ record_registers(struct intel_context *ce,
 	*cs++ = lower_32_bits(b_before->node.start);
 	*cs++ = upper_32_bits(b_before->node.start);
 
-	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	*cs++ = MI_SEMAPHORE_WAIT |
-		MI_SEMAPHORE_GLOBAL_GTT |
-		MI_SEMAPHORE_POLL |
-		MI_SEMAPHORE_SAD_NEQ_SDD;
-	*cs++ = 0;
-	*cs++ = i915_ggtt_offset(ce->engine->status_page.vma) +
-		offset_in_page(sema);
-	*cs++ = 0;
-	*cs++ = MI_NOOP;
+	if (sema) {
+		WRITE_ONCE(*sema->va, -1);
+
+		/* Signal the poisoner */
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = sema->ggtt;
+		*cs++ = 0;
+		*cs++ = 0;
+
+		/* Then wait for the poison to settle */
+		*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+		*cs++ = MI_SEMAPHORE_WAIT |
+			MI_SEMAPHORE_GLOBAL_GTT |
+			MI_SEMAPHORE_POLL |
+			MI_SEMAPHORE_SAD_NEQ_SDD;
+		*cs++ = 0;
+		*cs++ = sema->ggtt;
+		*cs++ = 0;
+		*cs++ = MI_NOOP;
+	} else {
+		cs = emit_noops(cs, 10);
+	}
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
 	*cs++ = MI_BATCH_BUFFER_START_GEN8 | BIT(8);
@@ -1073,7 +1111,6 @@ record_registers(struct intel_context *ce,
 
 	intel_ring_advance(rq, cs);
 
-	WRITE_ONCE(*sema, 0);
 	i915_request_get(rq);
 	i915_request_add(rq);
 err_after:
@@ -1089,7 +1126,9 @@ record_registers(struct intel_context *ce,
 }
 
 static struct i915_vma *
-load_context(struct intel_context *ce, u32 poison, bool relative)
+load_context(struct intel_context *ce,
+	     struct intel_engine_cs *engine,
+	     u32 poison, bool relative)
 {
 	struct i915_vma *batch;
 	u32 dw, *cs, *hw;
@@ -1105,7 +1144,7 @@ load_context(struct intel_context *ce, u32 poison, bool relative)
 		return ERR_CAST(cs);
 	}
 
-	defaults = shmem_pin_map(ce->engine->default_state);
+	defaults = shmem_pin_map(engine->default_state);
 	if (!defaults) {
 		i915_gem_object_unpin_map(batch->obj);
 		i915_vma_put(batch);
@@ -1136,7 +1175,7 @@ load_context(struct intel_context *ce, u32 poison, bool relative)
 			if (relative)
 				cmd |= MI_LRI_LRM_CS_MMIO;
 			else
-				offset = ce->engine->mmio_base;
+				offset = engine->mmio_base;
 		}
 
 		dw++;
@@ -1152,7 +1191,7 @@ load_context(struct intel_context *ce, u32 poison, bool relative)
 
 	*cs++ = MI_BATCH_BUFFER_END;
 
-	shmem_unpin_map(ce->engine->default_state, defaults);
+	shmem_unpin_map(engine->default_state, defaults);
 
 	i915_gem_object_flush_map(batch->obj);
 	i915_gem_object_unpin_map(batch->obj);
@@ -1162,16 +1201,17 @@ load_context(struct intel_context *ce, u32 poison, bool relative)
 
 static int
 poison_registers(struct intel_context *ce,
+		 struct intel_engine_cs *engine,
 		 u32 poison,
 		 bool relative,
-		 u32 *sema)
+		 const struct hwsp_semaphore *sema)
 {
 	struct i915_request *rq;
 	struct i915_vma *batch;
 	u32 *cs;
 	int err;
 
-	batch = load_context(ce, poison, relative);
+	batch = load_context(ce, engine, poison, relative);
 	if (IS_ERR(batch))
 		return PTR_ERR(batch);
 
@@ -1185,20 +1225,29 @@ poison_registers(struct intel_context *ce,
 	if (err)
 		goto err_rq;
 
-	cs = intel_ring_begin(rq, 8);
+	cs = intel_ring_begin(rq, 14);
 	if (IS_ERR(cs)) {
 		err = PTR_ERR(cs);
 		goto err_rq;
 	}
 
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	*cs++ = MI_SEMAPHORE_WAIT |
+		MI_SEMAPHORE_GLOBAL_GTT |
+		MI_SEMAPHORE_POLL |
+		MI_SEMAPHORE_SAD_EQ_SDD;
+	*cs++ = 0;
+	*cs++ = sema->ggtt;
+	*cs++ = 0;
+	*cs++ = MI_NOOP;
+
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
 	*cs++ = MI_BATCH_BUFFER_START_GEN8 | BIT(8);
 	*cs++ = lower_32_bits(batch->node.start);
 	*cs++ = upper_32_bits(batch->node.start);
 
 	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
-	*cs++ = i915_ggtt_offset(ce->engine->status_page.vma) +
-		offset_in_page(sema);
+	*cs++ = sema->ggtt;
 	*cs++ = 0;
 	*cs++ = 1;
 
@@ -1258,7 +1307,7 @@ static int compare_isolation(struct intel_engine_cs *engine,
 	}
 	lrc += LRC_STATE_OFFSET / sizeof(*hw);
 
-	defaults = shmem_pin_map(ce->engine->default_state);
+	defaults = shmem_pin_map(engine->default_state);
 	if (!defaults) {
 		err = -ENOMEM;
 		goto err_lrc;
@@ -1311,7 +1360,7 @@ static int compare_isolation(struct intel_engine_cs *engine,
 	} while (dw < PAGE_SIZE / sizeof(u32) &&
 		 (hw[dw] & ~BIT(0)) != MI_BATCH_BUFFER_END);
 
-	shmem_unpin_map(ce->engine->default_state, defaults);
+	shmem_unpin_map(engine->default_state, defaults);
 err_lrc:
 	i915_gem_object_unpin_map(ce->state->obj);
 err_B1:
@@ -1328,7 +1377,7 @@ static int compare_isolation(struct intel_engine_cs *engine,
 static int
 __lrc_isolation(struct intel_engine_cs *engine, u32 poison, bool relative)
 {
-	u32 *sema = memset32(engine->status_page.addr + 1000, 0, 1);
+	struct hwsp_semaphore sema = hwsp_semaphore(engine);
 	struct i915_vma *ref[2], *result[2];
 	struct intel_context *A, *B;
 	struct i915_request *rq;
@@ -1356,15 +1405,12 @@ __lrc_isolation(struct intel_engine_cs *engine, u32 poison, bool relative)
 		goto err_ref0;
 	}
 
-	rq = record_registers(A, ref[0], ref[1], relative, sema);
+	rq = record_registers(A, engine, ref[0], ref[1], relative, NULL);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto err_ref1;
 	}
 
-	WRITE_ONCE(*sema, 1);
-	wmb();
-
 	if (i915_request_wait(rq, 0, HZ / 2) < 0) {
 		i915_request_put(rq);
 		err = -ETIME;
@@ -1384,15 +1430,15 @@ __lrc_isolation(struct intel_engine_cs *engine, u32 poison, bool relative)
 		goto err_result0;
 	}
 
-	rq = record_registers(A, result[0], result[1], relative, sema);
+	rq = record_registers(A, engine, result[0], result[1], relative, &sema);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto err_result1;
 	}
 
-	err = poison_registers(B, poison, relative, sema);
+	err = poison_registers(B, engine, poison, relative, &sema);
 	if (err) {
-		WRITE_ONCE(*sema, -1);
+		WRITE_ONCE(*sema.va, -1);
 		i915_request_put(rq);
 		goto err_result1;
 	}
@@ -1494,6 +1540,160 @@ static int live_lrc_isolation(void *arg)
 	return err;
 }
 
+static int __lrc_cross(struct intel_engine_cs *a,
+		       struct intel_engine_cs *b,
+		       u32 poison)
+{
+	struct hwsp_semaphore sema = hwsp_semaphore(a);
+	struct i915_vma *ref[2], *result[2];
+	struct intel_context *A, *B;
+	struct i915_request *rq;
+	int err;
+
+	GEM_BUG_ON(a->gt->ggtt != b->gt->ggtt);
+
+	pr_debug("Context on %s, poisoning from %s with %08x\n",
+		 a->name, b->name, poison);
+
+	A = intel_context_create(a);
+	if (IS_ERR(A))
+		return PTR_ERR(A);
+
+	B = intel_context_create(b);
+	if (IS_ERR(B)) {
+		err = PTR_ERR(B);
+		goto err_A;
+	}
+
+	ref[0] = create_user_vma(A->vm, SZ_64K);
+	if (IS_ERR(ref[0])) {
+		err = PTR_ERR(ref[0]);
+		goto err_B;
+	}
+
+	ref[1] = create_user_vma(A->vm, SZ_64K);
+	if (IS_ERR(ref[1])) {
+		err = PTR_ERR(ref[1]);
+		goto err_ref0;
+	}
+
+	rq = record_registers(A, a, ref[0], ref[1], false, NULL);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto err_ref1;
+	}
+
+	if (i915_request_wait(rq, 0, HZ / 2) < 0) {
+		i915_request_put(rq);
+		err = -ETIME;
+		goto err_ref1;
+	}
+	i915_request_put(rq);
+
+	result[0] = create_user_vma(A->vm, SZ_64K);
+	if (IS_ERR(result[0])) {
+		err = PTR_ERR(result[0]);
+		goto err_ref1;
+	}
+
+	result[1] = create_user_vma(A->vm, SZ_64K);
+	if (IS_ERR(result[1])) {
+		err = PTR_ERR(result[1]);
+		goto err_result0;
+	}
+
+	rq = record_registers(A, a, result[0], result[1], false, &sema);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto err_result1;
+	}
+
+	err = poison_registers(B, a, poison, false, &sema);
+	if (err) {
+		WRITE_ONCE(*sema.va, -1);
+		i915_request_put(rq);
+		goto err_result1;
+	}
+
+	if (i915_request_wait(rq, 0, HZ / 2) < 0) {
+		i915_request_put(rq);
+		err = -ETIME;
+		goto err_result1;
+	}
+	i915_request_put(rq);
+
+	err = compare_isolation(a, ref, result, A, poison, false);
+
+err_result1:
+	i915_vma_put(result[1]);
+err_result0:
+	i915_vma_put(result[0]);
+err_ref1:
+	i915_vma_put(ref[1]);
+err_ref0:
+	i915_vma_put(ref[0]);
+err_B:
+	intel_context_put(B);
+err_A:
+	intel_context_put(A);
+	return err;
+}
+
+static int live_lrc_cross(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_engine_cs *a, *b;
+	enum intel_engine_id a_id, b_id;
+	const u32 poison[] = {
+		STACK_MAGIC,
+		0x3a3a3a3a,
+		0x5c5c5c5c,
+		0xffffffff,
+		0xffff0000,
+	};
+	int err = 0;
+	int i;
+
+	/*
+	 * Our goal is to try and tamper with another client's context
+	 * running concurrently. The HW's goal is to stop us.
+	 */
+
+	for_each_engine(a, gt, a_id) {
+		if (!IS_ENABLED(CONFIG_DRM_I915_SELFTEST_BROKEN) &&
+		    skip_isolation(a))
+			continue;
+
+		intel_engine_pm_get(a);
+		for_each_engine(b, gt, b_id) {
+			if (a == b)
+				continue;
+
+			intel_engine_pm_get(b);
+			for (i = 0; i < ARRAY_SIZE(poison); i++) {
+				int result;
+
+				result = __lrc_cross(a, b, poison[i]);
+				if (result && !err)
+					err = result;
+
+				result = __lrc_cross(a, b, ~poison[i]);
+				if (result && !err)
+					err = result;
+			}
+			intel_engine_pm_put(b);
+		}
+		intel_engine_pm_put(a);
+
+		if (igt_flush_test(gt->i915)) {
+			err = -EIO;
+			break;
+		}
+	}
+
+	return err;
+}
+
 static int indirect_ctx_submit_req(struct intel_context *ce)
 {
 	struct i915_request *rq;
@@ -1884,6 +2084,7 @@ int intel_lrc_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_lrc_isolation),
 		SUBTEST(live_lrc_timestamp),
 		SUBTEST(live_lrc_garbage),
+		SUBTEST(live_lrc_cross),
 		SUBTEST(live_pphwsp_runtime),
 		SUBTEST(live_lrc_indirect_ctx_bb),
 	};

From patchwork Mon Feb 1 08:56:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058511
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 5A0E2C433E0
 for ; Mon, 1 Feb 2021 08:58:22 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id 1165664E33
 for ; Mon, 1 Feb 2021 08:58:22 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1165664E33
Authentication-Results: mail.kernel.org;
 dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id A45956E542;
 Mon, 1 Feb 2021 08:57:45 +0000 (UTC)
Received: from fireflyinternet.com (unknown [77.68.26.236])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 30AB66E4AD
 for ; Mon, 1 Feb 2021 08:57:35 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
 x-ip-name=78.156.65.138;
Received: from build.alporthouse.com (unverified [78.156.65.138])
 by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757717-1500050
 for multiple; Mon, 01 Feb 2021 08:57:16 +0000
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:22 +0000
Message-Id: <20210201085715.27435-4-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 04/57] drm/i915: Protect against request freeing during cancellation on wedging
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Chris Wilson
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"

As soon as we mark a request as completed, it may be retired. So when
cancelling a request and marking it complete, make sure we first keep a
reference to the request.

Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
 .../drm/i915/gt/intel_execlists_submission.c | 19 +++++++++++--------
 drivers/gpu/drm/i915/gt/intel_reset.c | 15 ++++++---------
 .../gpu/drm/i915/gt/intel_ring_submission.c | 2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c | 8 +++++---
 drivers/gpu/drm/i915/i915_request.c | 9 +++++++--
 drivers/gpu/drm/i915/i915_request.h | 2 +-
 6 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index e7593df6777d..45a8ac152b88 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2976,7 +2976,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 
 	/* Mark all executing requests as skipped. */
 	list_for_each_entry(rq, &engine->active.requests, sched.link)
-		i915_request_mark_eio(rq);
+		i915_request_put(i915_request_mark_eio(rq));
 	intel_engine_signal_breadcrumbs(engine);
 
 	/* Flush the queued requests to the timeline list (for retiring). */
@@ -2984,8 +2984,10 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
-			i915_request_mark_eio(rq);
-			__i915_request_submit(rq);
+			if (i915_request_mark_eio(rq)) {
+				__i915_request_submit(rq);
+				i915_request_put(rq);
+			}
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
@@ -2994,7 +2996,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 
 	/* On-hold requests will be flushed to timeline upon their release */
 	list_for_each_entry(rq, &engine->active.hold, sched.link)
-		i915_request_mark_eio(rq);
+		i915_request_put(i915_request_mark_eio(rq));
 
 	/* Cancel all attached virtual engines */
 	while ((rb = rb_first_cached(&execlists->virtual))) {
@@ -3007,10 +3009,11 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 		spin_lock(&ve->base.active.lock);
 		rq = fetch_and_zero(&ve->request);
 		if (rq) {
-			i915_request_mark_eio(rq);
-
-			rq->engine = engine;
-			__i915_request_submit(rq);
+			if (i915_request_mark_eio(rq)) {
+				rq->engine = engine;
+				__i915_request_submit(rq);
+				i915_request_put(rq);
+			}
 			i915_request_put(rq);
 
 			ve->base.execlists.queue_priority_hint = INT_MIN;
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 107430e1e864..a82c4d7b23bc 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -786,18 +786,15 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 
 static void nop_submit_request(struct i915_request *request)
 {
-	struct intel_engine_cs *engine = request->engine;
-	unsigned long flags;
-
 	RQ_TRACE(request, "-EIO\n");
-	i915_request_set_error_once(request, -EIO);
 
-	spin_lock_irqsave(&engine->active.lock, flags);
-	__i915_request_submit(request);
-	i915_request_mark_complete(request);
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	request = i915_request_mark_eio(request);
+	if (request) {
+		i915_request_submit(request);
+		intel_engine_signal_breadcrumbs(request->engine);
 
-	intel_engine_signal_breadcrumbs(engine);
+		i915_request_put(request);
+	}
 }
 
 static void __intel_gt_set_wedged(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 8b7cc637c432..9c2c605d7a92 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -400,7 +400,7 @@ static void reset_cancel(struct intel_engine_cs *engine)
 
 	/* Mark all submitted requests as skipped. */
 	list_for_each_entry(request, &engine->active.requests, sched.link)
-		i915_request_mark_eio(request);
+		i915_request_put(i915_request_mark_eio(request));
 	intel_engine_signal_breadcrumbs(engine);
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index df7c1b1acc32..cf1269e74998 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -239,13 +239,15 @@ static void mock_reset_cancel(struct intel_engine_cs *engine)
 
 	/* Mark all submitted requests as skipped. */
 	list_for_each_entry(rq, &engine->active.requests, sched.link)
-		i915_request_mark_eio(rq);
+		i915_request_put(i915_request_mark_eio(rq));
 	intel_engine_signal_breadcrumbs(engine);
 
 	/* Cancel and submit all pending requests. */
 	list_for_each_entry(rq, &mock->hw_queue, mock.link) {
-		i915_request_mark_eio(rq);
-		__i915_request_submit(rq);
+		if (i915_request_mark_eio(rq)) {
+			__i915_request_submit(rq);
+			i915_request_put(rq);
+		}
 	}
 	INIT_LIST_HEAD(&mock->hw_queue);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d66981b083cd..a336d6c40d8b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -514,15 +514,20 @@ void i915_request_set_error_once(struct i915_request *rq, int error)
 	} while (!try_cmpxchg(&rq->fence.error, &old, error));
 }
 
-void i915_request_mark_eio(struct i915_request *rq)
+struct i915_request *i915_request_mark_eio(struct i915_request *rq)
 {
 	if (__i915_request_is_complete(rq))
-		return;
+		return NULL;
 
 	GEM_BUG_ON(i915_request_signaled(rq));
 
+	/* As soon as the request is completed, it may be retired */
+	rq = i915_request_get(rq);
+
 	i915_request_set_error_once(rq, -EIO);
 	i915_request_mark_complete(rq);
+
+	return rq;
 }
 
 bool __i915_request_submit(struct i915_request *request)
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 1bfe214a47e9..c0bd4cb8786a 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -311,7 +311,7 @@ i915_request_create(struct intel_context *ce);
 
 void __i915_request_skip(struct i915_request *rq);
 void i915_request_set_error_once(struct i915_request *rq, int error);
-void i915_request_mark_eio(struct i915_request *rq);
+struct i915_request *i915_request_mark_eio(struct i915_request *rq);
 
 struct i915_request *__i915_request_commit(struct i915_request *request);
 void __i915_request_queue(struct i915_request *rq,

From patchwork Mon Feb 1 08:56:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058503
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31D38C433E0 for ; Mon, 1 Feb 2021 08:58:20 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C39F864E33 for ; Mon, 1 Feb 2021 08:58:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C39F864E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 928C56E52F; Mon, 1 Feb 2021 08:57:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 685A16E4F8 for ; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757718-1500050 for multiple; Mon, 01 Feb 2021 08:57:16 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:23 +0000 Message-Id: <20210201085715.27435-5-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 05/57] drm/i915: Take 
rcu_read_lock for querying fence's driver/timeline names X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The name very often may be freed independently of the fence, with the only protection being RCU. To be safe as we read the names, hold RCU. Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/i915_sw_fence.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index 2744558f3050..dfabf291e5cd 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -430,11 +430,13 @@ static void timer_i915_sw_fence_wake(struct timer_list *t) if (!fence) return; + rcu_read_lock(); pr_notice("Asynchronous wait on fence %s:%s:%llx timed out (hint:%ps)\n", cb->dma->ops->get_driver_name(cb->dma), cb->dma->ops->get_timeline_name(cb->dma), cb->dma->seqno, i915_sw_fence_debug_hint(fence)); + rcu_read_unlock(); i915_sw_fence_set_error_once(fence, -ETIMEDOUT); i915_sw_fence_complete(fence); From patchwork Mon Feb 1 08:56:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058513 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB529C4332B for ; Mon, 1 Feb 
2021 08:58:23 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4B62064E3F for ; Mon, 1 Feb 2021 08:58:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4B62064E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 320036E525; Mon, 1 Feb 2021 08:57:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id BB33D6E4EA for ; Mon, 1 Feb 2021 08:57:35 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757719-1500050 for multiple; Mon, 01 Feb 2021 08:57:16 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:24 +0000 Message-Id: <20210201085715.27435-6-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 06/57] drm/i915/gt: Always flush the submission queue on checking for idle X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" We check for idle during debug prints and other debugging actions. 
Simplify the flow by not touching execlists state. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 0aaf0626425a..727128c0166a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1248,14 +1248,8 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) return true; /* Waiting to drain ELSP? */ - if (execlists_active(&engine->execlists)) { - synchronize_hardirq(engine->i915->drm.pdev->irq); - - intel_engine_flush_submission(engine); - - if (execlists_active(&engine->execlists)) - return false; - } + synchronize_hardirq(engine->i915->drm.pdev->irq); + intel_engine_flush_submission(engine); /* ELSP is empty, but there are ready requests? E.g. after reset */ if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) From patchwork Mon Feb 1 08:56:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058451 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61550C433E9 for ; Mon, 1 Feb 2021 08:58:05 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CF2EF64E33 for ; Mon, 1 Feb 2021 08:58:04 +0000 (UTC) DMARC-Filter: 
OpenDMARC Filter v1.3.2 mail.kernel.org CF2EF64E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7CCDA6E48C; Mon, 1 Feb 2021 08:57:41 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 86DE76E4FE for ; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757720-1500050 for multiple; Mon, 01 Feb 2021 08:57:16 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:25 +0000 Message-Id: <20210201085715.27435-7-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 07/57] drm/i915/gt: Move engine setup out of set_default_submission X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Now that we no longer switch back and forth between guc and execlists, we no longer need to restore the backend's vfunc and can leave them set after initialisation. The only catch is that we lose the submission on wedging and still need to reset the submit_request vfunc on unwedging. 
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../drm/i915/gt/intel_execlists_submission.c | 46 ++++++++--------- .../gpu/drm/i915/gt/intel_ring_submission.c | 4 -- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 50 ++++++++----------- 3 files changed, 44 insertions(+), 56 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 45a8ac152b88..5d824e1cfcba 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3089,29 +3089,6 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine) engine->submit_request = execlists_submit_request; engine->schedule = i915_schedule; engine->execlists.tasklet.callback = execlists_submission_tasklet; - - engine->reset.prepare = execlists_reset_prepare; - engine->reset.rewind = execlists_reset_rewind; - engine->reset.cancel = execlists_reset_cancel; - engine->reset.finish = execlists_reset_finish; - - engine->park = execlists_park; - engine->unpark = NULL; - - engine->flags |= I915_ENGINE_SUPPORTS_STATS; - if (!intel_vgpu_active(engine->i915)) { - engine->flags |= I915_ENGINE_HAS_SEMAPHORES; - if (can_preempt(engine)) { - engine->flags |= I915_ENGINE_HAS_PREEMPTION; - if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) - engine->flags |= I915_ENGINE_HAS_TIMESLICES; - } - } - - if (intel_engine_has_preemption(engine)) - engine->emit_bb_start = gen8_emit_bb_start; - else - engine->emit_bb_start = gen8_emit_bb_start_noarb; } static void execlists_shutdown(struct intel_engine_cs *engine) @@ -3142,6 +3119,14 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; + engine->reset.prepare = execlists_reset_prepare; + engine->reset.rewind = execlists_reset_rewind; + engine->reset.cancel = execlists_reset_cancel; + engine->reset.finish = execlists_reset_finish; + 
+ engine->park = execlists_park; + engine->unpark = NULL; + engine->emit_flush = gen8_emit_flush_xcs; engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs; @@ -3162,6 +3147,21 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) * until a more refined solution exists. */ } + + engine->flags |= I915_ENGINE_SUPPORTS_STATS; + if (!intel_vgpu_active(engine->i915)) { + engine->flags |= I915_ENGINE_HAS_SEMAPHORES; + if (can_preempt(engine)) { + engine->flags |= I915_ENGINE_HAS_PREEMPTION; + if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) + engine->flags |= I915_ENGINE_HAS_TIMESLICES; + } + } + + if (intel_engine_has_preemption(engine)) + engine->emit_bb_start = gen8_emit_bb_start; + else + engine->emit_bb_start = gen8_emit_bb_start_noarb; } static void logical_ring_default_irqs(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 9c2c605d7a92..3cb2ce503544 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -969,14 +969,10 @@ static void gen6_bsd_submit_request(struct i915_request *request) static void i9xx_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = i9xx_submit_request; - - engine->park = NULL; - engine->unpark = NULL; } static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine) { - i9xx_set_default_submission(engine); engine->submit_request = gen6_bsd_submit_request; } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 92688a9b6717..f72faa0b8339 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -608,35 +608,6 @@ static int guc_resume(struct intel_engine_cs *engine) static void guc_set_default_submission(struct intel_engine_cs *engine) { 
engine->submit_request = guc_submit_request; - engine->schedule = i915_schedule; - engine->execlists.tasklet.callback = guc_submission_tasklet; - - engine->reset.prepare = guc_reset_prepare; - engine->reset.rewind = guc_reset_rewind; - engine->reset.cancel = guc_reset_cancel; - engine->reset.finish = guc_reset_finish; - - engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; - engine->flags |= I915_ENGINE_HAS_PREEMPTION; - - /* - * TODO: GuC supports timeslicing and semaphores as well, but they're - * handled by the firmware so some minor tweaks are required before - * enabling. - * - * engine->flags |= I915_ENGINE_HAS_TIMESLICES; - * engine->flags |= I915_ENGINE_HAS_SEMAPHORES; - */ - - engine->emit_bb_start = gen8_emit_bb_start; - - /* - * For the breadcrumb irq to work we need the interrupts to stay - * enabled. However, on all platforms on which we'll have support for - * GuC submission we don't allow disabling the interrupts at runtime, so - * we're always safe with the current flow. 
- */ - GEM_BUG_ON(engine->irq_enable || engine->irq_disable); } static void guc_release(struct intel_engine_cs *engine) @@ -658,6 +629,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc; + engine->schedule = i915_schedule; + + engine->reset.prepare = guc_reset_prepare; + engine->reset.rewind = guc_reset_rewind; + engine->reset.cancel = guc_reset_cancel; + engine->reset.finish = guc_reset_finish; + engine->emit_flush = gen8_emit_flush_xcs; engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs; @@ -666,6 +644,20 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->emit_flush = gen12_emit_flush_xcs; } engine->set_default_submission = guc_set_default_submission; + + engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; + engine->flags |= I915_ENGINE_HAS_PREEMPTION; + + /* + * TODO: GuC supports timeslicing and semaphores as well, but they're + * handled by the firmware so some minor tweaks are required before + * enabling. 
+ * + * engine->flags |= I915_ENGINE_HAS_TIMESLICES; + * engine->flags |= I915_ENGINE_HAS_SEMAPHORES; + */ + + engine->emit_bb_start = gen8_emit_bb_start; } static void rcs_submission_override(struct intel_engine_cs *engine) From patchwork Mon Feb 1 08:56:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058517 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35EC1C433E6 for ; Mon, 1 Feb 2021 08:58:24 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C7EE264E3F for ; Mon, 1 Feb 2021 08:58:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C7EE264E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5EEB36E56A; Mon, 1 Feb 2021 08:57:46 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 09B936E49B for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by 
fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757721-1500050 for multiple; Mon, 01 Feb 2021 08:57:17 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:26 +0000 Message-Id: <20210201085715.27435-8-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 08/57] drm/i915/gt: Move submission_method into intel_gt X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since we set up the submission method for the engines once, it is easy to assign an enum and use that instead of probing into the backends. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 8 +++++++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 12 ++++++++---- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 8 -------- drivers/gpu/drm/i915/gt/intel_execlists_submission.h | 3 --- drivers/gpu/drm/i915/gt/intel_gt_types.h | 7 +++++++ drivers/gpu/drm/i915/gt/intel_reset.c | 7 +++---- drivers/gpu/drm/i915/gt/selftest_execlists.c | 2 +- drivers/gpu/drm/i915/gt/selftest_ring_submission.c | 2 +- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 ----- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 1 - drivers/gpu/drm/i915/i915_perf.c | 10 +++++----- 11 files changed, 32 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 47ee8578e511..8d9184920c51 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -13,8 +13,9 @@ #include "i915_reg.h" #include 
"i915_request.h" #include "i915_selftest.h" -#include "gt/intel_timeline.h" #include "intel_engine_types.h" +#include "intel_gt_types.h" +#include "intel_timeline.h" #include "intel_workarounds.h" struct drm_printer; @@ -262,6 +263,11 @@ void intel_engine_init_active(struct intel_engine_cs *engine, #define ENGINE_MOCK 1 #define ENGINE_VIRTUAL 2 +static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine) +{ + return engine->gt->submission_method >= INTEL_SUBMISSION_GUC; +} + static inline bool intel_engine_has_preempt_reset(const struct intel_engine_cs *engine) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 727128c0166a..3d1bf6b3c3bf 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -891,12 +891,16 @@ int intel_engines_init(struct intel_gt *gt) enum intel_engine_id id; int err; - if (intel_uc_uses_guc_submission(&gt->uc)) + if (intel_uc_uses_guc_submission(&gt->uc)) { + gt->submission_method = INTEL_SUBMISSION_GUC; setup = intel_guc_submission_setup; - else if (HAS_EXECLISTS(gt->i915)) + } else if (HAS_EXECLISTS(gt->i915)) { + gt->submission_method = INTEL_SUBMISSION_ELSP; setup = intel_execlists_submission_setup; - else + } else { + gt->submission_method = INTEL_SUBMISSION_RING; setup = intel_ring_submission_setup; + } for_each_engine(engine, gt, id) { err = engine_setup_common(engine); @@ -1461,7 +1465,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR)); } - if (intel_engine_in_guc_submission_mode(engine)) { + if (intel_engine_uses_guc(engine)) { /* nothing to print yet */ } else if (HAS_EXECLISTS(dev_priv)) { struct i915_request * const *port, *rq; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 5d824e1cfcba..4ddd2099a931 100644 --- 
a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -1757,7 +1757,6 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) */ GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) && !reset_in_progress(execlists)); - GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine)); /* * Note that csb_write, csb_status may be either in HWSP or mmio. @@ -3897,13 +3896,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, spin_unlock_irqrestore(&engine->active.lock, flags); } -bool -intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine) -{ - return engine->set_default_submission == - execlists_set_default_submission; -} - #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_execlists.c" #endif diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h index a8fd7adefd82..f7bd3fccfee8 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h @@ -41,7 +41,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, const struct intel_engine_cs *master, const struct intel_engine_cs *sibling); -bool -intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine); - #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index 91d20daca536..626af37c7790 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -29,6 +29,12 @@ struct i915_ggtt; struct intel_engine_cs; struct intel_uncore; +enum intel_submission_method { + INTEL_SUBMISSION_RING, + INTEL_SUBMISSION_ELSP, + INTEL_SUBMISSION_GUC, +}; + struct intel_gt { struct drm_i915_private *i915; struct intel_uncore *uncore; @@ -108,6 +114,7 @@ struct intel_gt { struct intel_engine_cs 
*engine[I915_NUM_ENGINES]; struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1] [MAX_ENGINE_INSTANCE + 1]; + enum intel_submission_method submission_method; /* * Default address space (either GGTT or ppGTT depending on arch). diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index a82c4d7b23bc..4a8f982a1a4f 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -1113,7 +1113,6 @@ static int intel_gt_reset_engine(struct intel_engine_cs *engine) int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) { struct intel_gt *gt = engine->gt; - bool uses_guc = intel_engine_in_guc_submission_mode(engine); int ret; ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags); @@ -1129,10 +1128,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) "Resetting %s for %s\n", engine->name, msg); atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]); - if (!uses_guc) - ret = intel_gt_reset_engine(engine); - else + if (intel_engine_uses_guc(engine)) ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine); + else + ret = intel_gt_reset_engine(engine); if (ret) { /* If we fail here, we expect to fallback to a global reset */ ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret); diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 5d7fac383add..9304a35384aa 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -4715,7 +4715,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915) SUBTEST(live_virtual_reset), }; - if (!HAS_EXECLISTS(i915)) + if (i915->gt.submission_method != INTEL_SUBMISSION_ELSP) return 0; if (intel_gt_is_wedged(&i915->gt)) diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c index 3350e7c995bc..6cd9f6bc240c 100644 --- 
a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c @@ -291,7 +291,7 @@ int intel_ring_submission_live_selftests(struct drm_i915_private *i915) SUBTEST(live_ctx_switch_wa), }; - if (HAS_EXECLISTS(i915)) + if (i915->gt.submission_method > INTEL_SUBMISSION_RING) return 0; return intel_gt_live_subtests(tests, &i915->gt); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index f72faa0b8339..17b551a0c89f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -745,8 +745,3 @@ void intel_guc_submission_init_early(struct intel_guc *guc) { guc->submission_selected = __guc_submission_selected(guc); } - -bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine) -{ - return engine->set_default_submission == guc_set_default_submission; -} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 5f7b9e6347d0..3f7005018939 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -20,7 +20,6 @@ void intel_guc_submission_fini(struct intel_guc *guc); int intel_guc_preempt_work_create(struct intel_guc *guc); void intel_guc_preempt_work_destroy(struct intel_guc *guc); int intel_guc_submission_setup(struct intel_engine_cs *engine); -bool intel_engine_in_guc_submission_mode(const struct intel_engine_cs *engine); static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 112ba5f2ce90..89665e14ab01 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1273,11 +1273,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) case 8: case 9: case 10: - if 
(intel_engine_in_execlists_submission_mode(ce->engine)) { - stream->specific_ctx_id_mask = - (1U << GEN8_CTX_ID_WIDTH) - 1; - stream->specific_ctx_id = stream->specific_ctx_id_mask; - } else { + if (intel_engine_uses_guc(ce->engine)) { /* * When using GuC, the context descriptor we write in * i915 is read by GuC and rewritten before it's @@ -1296,6 +1292,10 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) */ stream->specific_ctx_id_mask = (1U << (GEN8_CTX_ID_WIDTH - 1)) - 1; + } else { + stream->specific_ctx_id_mask = + (1U << GEN8_CTX_ID_WIDTH) - 1; + stream->specific_ctx_id = stream->specific_ctx_id_mask; } break; From patchwork Mon Feb 1 08:56:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058411
From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:27 +0000 Message-Id: <20210201085715.27435-9-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 09/57] drm/i915: Replace engine->schedule() with a known request operation List-Id: Intel graphics driver community testing & development Cc: Chris Wilson Sender: "Intel-gfx" Looking to the future, we want to set the scheduling attributes explicitly and so replace the generic engine->schedule() with the more direct i915_request_set_priority() What it loses in removing the 'schedule' name from the function, it gains in having an explicit entry point with a stated goal.
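For illustration only, the shape of the change can be sketched in plain userspace C: the optional per-engine function pointer is replaced by a capability flag plus one entry point that does the check itself. This is a minimal sketch, not the driver code; the names engine, request, ENGINE_HAS_SCHEDULER, engine_has_scheduler() and request_set_priority() are all hypothetical stand-ins, and the locking and dependency walk of the real scheduler are left out.

```c
/* Userspace sketch only; every name here is hypothetical, not the i915 API. */
#include <stdbool.h>

#define ENGINE_HAS_SCHEDULER (1u << 2)	/* assumed capability bit */

struct engine {
	unsigned int flags;	/* capability bits stand in for the old vfunc */
};

struct request {
	struct engine *engine;
	int priority;
};

static bool engine_has_scheduler(const struct engine *engine)
{
	return engine->flags & ENGINE_HAS_SCHEDULER;
}

/* Single entry point: callers no longer test a function pointer first. */
static void request_set_priority(struct request *rq, int prio)
{
	if (!engine_has_scheduler(rq->engine))
		return;	/* no scheduler: requests run in submission order */

	/* A priority is only ever raised, like the max() kept by the patch. */
	if (prio > rq->priority)
		rq->priority = prio;
}
```

With this shape a caller can invoke request_set_priority() unconditionally; on an engine without the flag the call is a no-op, which is why the "if (engine->schedule)" checks at the call sites can go away.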
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/display/intel_display.c | 5 ++- drivers/gpu/drm/i915/gem/i915_gem_object.h | 5 ++- drivers/gpu/drm/i915/gem/i915_gem_wait.c | 29 +++++----------- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 -- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 4 +-- drivers/gpu/drm/i915/gt/intel_engine_types.h | 29 ++++++++-------- drivers/gpu/drm/i915/gt/intel_engine_user.c | 2 +- .../drm/i915/gt/intel_execlists_submission.c | 3 +- drivers/gpu/drm/i915/gt/selftest_execlists.c | 33 +++++-------------- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 11 +++---- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +- drivers/gpu/drm/i915/i915_request.c | 10 +++--- drivers/gpu/drm/i915/i915_request.h | 5 +++ drivers/gpu/drm/i915/i915_scheduler.c | 15 +++++---- drivers/gpu/drm/i915/i915_scheduler.h | 3 +- 15 files changed, 65 insertions(+), 95 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index d8f10589e09e..aca964f7ba72 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -13662,7 +13662,6 @@ int intel_prepare_plane_fb(struct drm_plane *_plane, struct drm_plane_state *_new_plane_state) { - struct i915_sched_attr attr = { .priority = I915_PRIORITY_DISPLAY }; struct intel_plane *plane = to_intel_plane(_plane); struct intel_plane_state *new_plane_state = to_intel_plane_state(_new_plane_state); @@ -13703,7 +13702,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane, if (new_plane_state->uapi.fence) { /* explicit fencing */ i915_gem_fence_wait_priority(new_plane_state->uapi.fence, - &attr); + I915_PRIORITY_DISPLAY); ret = i915_sw_fence_await_dma_fence(&state->commit_ready, new_plane_state->uapi.fence, i915_fence_timeout(dev_priv), @@ -13725,7 +13724,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane, if (ret) return ret; - i915_gem_object_wait_priority(obj, 0, &attr); + 
i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY); i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB); if (!new_plane_state->uapi.fence) { /* implicit fencing */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 3411ad197fa6..325766abca21 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -549,15 +549,14 @@ static inline void __start_cpu_write(struct drm_i915_gem_object *obj) obj->cache_dirty = true; } -void i915_gem_fence_wait_priority(struct dma_fence *fence, - const struct i915_sched_attr *attr); +void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio); int i915_gem_object_wait(struct drm_i915_gem_object *obj, unsigned int flags, long timeout); int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, unsigned int flags, - const struct i915_sched_attr *attr); + int prio); void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, enum fb_op_origin origin); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c index 4b9856d5ba14..d79bf16083bd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -91,22 +91,12 @@ i915_gem_object_wait_reservation(struct dma_resv *resv, return timeout; } -static void fence_set_priority(struct dma_fence *fence, - const struct i915_sched_attr *attr) +static void fence_set_priority(struct dma_fence *fence, int prio) { - struct i915_request *rq; - struct intel_engine_cs *engine; - if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence)) return; - rq = to_request(fence); - engine = rq->engine; - - rcu_read_lock(); /* RCU serialisation for set-wedged protection */ - if (engine->schedule) - engine->schedule(rq, attr); - rcu_read_unlock(); + i915_request_set_priority(to_request(fence), prio); } static inline bool __dma_fence_is_chain(const struct dma_fence *fence) @@ -114,8 +104,7 
@@ static inline bool __dma_fence_is_chain(const struct dma_fence *fence) return fence->ops == &dma_fence_chain_ops; } -void i915_gem_fence_wait_priority(struct dma_fence *fence, - const struct i915_sched_attr *attr) +void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio) { if (dma_fence_is_signaled(fence)) return; @@ -128,19 +117,19 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence, int i; for (i = 0; i < array->num_fences; i++) - fence_set_priority(array->fences[i], attr); + fence_set_priority(array->fences[i], prio); } else if (__dma_fence_is_chain(fence)) { struct dma_fence *iter; /* The chain is ordered; if we boost the last, we boost all */ dma_fence_chain_for_each(iter, fence) { fence_set_priority(to_dma_fence_chain(iter)->fence, - attr); + prio); break; } dma_fence_put(iter); } else { - fence_set_priority(fence, attr); + fence_set_priority(fence, prio); } local_bh_enable(); /* kick the tasklets if queues were reprioritised */ @@ -149,7 +138,7 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence, int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, unsigned int flags, - const struct i915_sched_attr *attr) + int prio) { struct dma_fence *excl; @@ -164,7 +153,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, return ret; for (i = 0; i < count; i++) { - i915_gem_fence_wait_priority(shared[i], attr); + i915_gem_fence_wait_priority(shared[i], prio); dma_fence_put(shared[i]); } @@ -174,7 +163,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, } if (excl) { - i915_gem_fence_wait_priority(excl, attr); + i915_gem_fence_wait_priority(excl, prio); dma_fence_put(excl); } return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 3d1bf6b3c3bf..44b6e8e646ed 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -319,9 +319,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id) if (engine->context_size) DRIVER_CAPS(i915)->has_logical_contexts = true; - /* Nothing to do here, execute in order of dependencies */ - engine->schedule = NULL; - ewma__engine_latency_init(&engine->latency); ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 778bcae5ef2c..0b026cde9f09 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -114,7 +114,7 @@ static void heartbeat(struct work_struct *wrk) * but all other contexts, including the kernel * context are stuck waiting for the signal. */ - } else if (engine->schedule && + } else if (intel_engine_has_scheduler(engine) && rq->sched.attr.priority < I915_PRIORITY_BARRIER) { /* * Gradually raise the priority of the heartbeat to @@ -129,7 +129,7 @@ static void heartbeat(struct work_struct *wrk) attr.priority = I915_PRIORITY_BARRIER; local_bh_disable(); - engine->schedule(rq, &attr); + i915_request_set_priority(rq, attr.priority); local_bh_enable(); } else { if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 9d59de5c559a..175695c73d5f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -453,14 +453,6 @@ struct intel_engine_cs { void (*bond_execute)(struct i915_request *rq, struct dma_fence *signal); - /* - * Call when the priority on a request has changed and it and its - * dependencies may need rescheduling. Note the request itself may - * not be ready to run! 
- */ - void (*schedule)(struct i915_request *request, - const struct i915_sched_attr *attr); - void (*release)(struct intel_engine_cs *engine); struct intel_engine_execlists execlists; @@ -478,13 +470,14 @@ struct intel_engine_cs { #define I915_ENGINE_USING_CMD_PARSER BIT(0) #define I915_ENGINE_SUPPORTS_STATS BIT(1) -#define I915_ENGINE_HAS_PREEMPTION BIT(2) -#define I915_ENGINE_HAS_SEMAPHORES BIT(3) -#define I915_ENGINE_HAS_TIMESLICES BIT(4) -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5) -#define I915_ENGINE_IS_VIRTUAL BIT(6) -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7) -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8) +#define I915_ENGINE_HAS_SCHEDULER BIT(2) +#define I915_ENGINE_HAS_PREEMPTION BIT(3) +#define I915_ENGINE_HAS_SEMAPHORES BIT(4) +#define I915_ENGINE_HAS_TIMESLICES BIT(5) +#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(6) +#define I915_ENGINE_IS_VIRTUAL BIT(7) +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(8) +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(9) unsigned int flags; /* @@ -567,6 +560,12 @@ intel_engine_supports_stats(const struct intel_engine_cs *engine) return engine->flags & I915_ENGINE_SUPPORTS_STATS; } +static inline bool +intel_engine_has_scheduler(const struct intel_engine_cs *engine) +{ + return engine->flags & I915_ENGINE_HAS_SCHEDULER; +} + static inline bool intel_engine_has_preemption(const struct intel_engine_cs *engine) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 1cbd84eb24e4..64eccdf32a22 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -107,7 +107,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915) for_each_uabi_engine(engine, i915) { /* all engines must agree! 
*/ int i; - if (engine->schedule) + if (intel_engine_has_scheduler(engine)) enabled |= (I915_SCHEDULER_CAP_ENABLED | I915_SCHEDULER_CAP_PRIORITY); else diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 4ddd2099a931..ea449fee8148 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3086,7 +3086,6 @@ static bool can_preempt(struct intel_engine_cs *engine) static void execlists_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = execlists_submit_request; - engine->schedule = i915_schedule; engine->execlists.tasklet.callback = execlists_submission_tasklet; } @@ -3147,6 +3146,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) */ } + engine->flags |= I915_ENGINE_HAS_SCHEDULER; engine->flags |= I915_ENGINE_SUPPORTS_STATS; if (!intel_vgpu_active(engine->i915)) { engine->flags |= I915_ENGINE_HAS_SEMAPHORES; @@ -3659,7 +3659,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.cops = &virtual_context_ops; ve->base.request_alloc = execlists_request_alloc; - ve->base.schedule = i915_schedule; ve->base.submit_request = virtual_submit_request; ve->base.bond_execute = virtual_bond_execute; diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 9304a35384aa..951e2bf867e1 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -268,12 +268,8 @@ static int live_unlite_restore(struct intel_gt *gt, int prio) i915_request_put(rq[0]); if (prio) { - struct i915_sched_attr attr = { - .priority = prio, - }; - /* Alternatively preempt the spinner with ce[1] */ - engine->schedule(rq[1], &attr); + i915_request_set_priority(rq[1], prio); } /* And switch back to ce[0] for good measure */ @@ -873,9 +869,6 @@ release_queue(struct intel_engine_cs *engine, struct 
i915_vma *vma, int idx, int prio) { - struct i915_sched_attr attr = { - .priority = prio, - }; struct i915_request *rq; u32 *cs; @@ -900,7 +893,7 @@ release_queue(struct intel_engine_cs *engine, i915_request_add(rq); local_bh_disable(); - engine->schedule(rq, &attr); + i915_request_set_priority(rq, prio); local_bh_enable(); /* kick tasklet */ i915_request_put(rq); @@ -1310,7 +1303,6 @@ static int live_timeslice_queue(void *arg) goto err_pin; for_each_engine(engine, gt, id) { - struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX }; struct i915_request *rq, *nop; if (!intel_engine_has_preemption(engine)) @@ -1325,7 +1317,7 @@ static int live_timeslice_queue(void *arg) err = PTR_ERR(rq); goto err_heartbeat; } - engine->schedule(rq, &attr); + i915_request_set_priority(rq, I915_PRIORITY_MAX); err = wait_for_submit(engine, rq, HZ / 2); if (err) { pr_err("%s: Timed out trying to submit semaphores\n", @@ -1806,7 +1798,6 @@ static int live_late_preempt(void *arg) struct i915_gem_context *ctx_hi, *ctx_lo; struct igt_spinner spin_hi, spin_lo; struct intel_engine_cs *engine; - struct i915_sched_attr attr = {}; enum intel_engine_id id; int err = -ENOMEM; @@ -1866,8 +1857,7 @@ static int live_late_preempt(void *arg) goto err_wedged; } - attr.priority = I915_PRIORITY_MAX; - engine->schedule(rq, &attr); + i915_request_set_priority(rq, I915_PRIORITY_MAX); if (!igt_wait_for_spinner(&spin_hi, rq)) { pr_err("High priority context failed to preempt the low priority context\n"); @@ -2412,7 +2402,6 @@ static int live_preempt_cancel(void *arg) static int live_suppress_self_preempt(void *arg) { - struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX }; struct intel_gt *gt = arg; struct intel_engine_cs *engine; struct preempt_client a, b; @@ -2480,7 +2469,7 @@ static int live_suppress_self_preempt(void *arg) i915_request_add(rq_b); GEM_BUG_ON(i915_request_completed(rq_a)); - engine->schedule(rq_a, &attr); + i915_request_set_priority(rq_a, I915_PRIORITY_MAX); 
igt_spinner_end(&a.spin); if (!igt_wait_for_spinner(&b.spin, rq_b)) { @@ -2545,7 +2534,6 @@ static int live_chain_preempt(void *arg) goto err_client_hi; for_each_engine(engine, gt, id) { - struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX }; struct igt_live_test t; struct i915_request *rq; int ring_size, count, i; @@ -2612,7 +2600,7 @@ static int live_chain_preempt(void *arg) i915_request_get(rq); i915_request_add(rq); - engine->schedule(rq, &attr); + i915_request_set_priority(rq, I915_PRIORITY_MAX); igt_spinner_end(&hi.spin); if (i915_request_wait(rq, 0, HZ / 5) < 0) { @@ -2964,14 +2952,12 @@ static int live_preempt_gang(void *arg) return -EIO; do { - struct i915_sched_attr attr = { .priority = prio++ }; - err = create_gang(engine, &rq); if (err) break; /* Submit each spinner at increasing priority */ - engine->schedule(rq, &attr); + i915_request_set_priority(rq, prio++); } while (prio <= I915_PRIORITY_MAX && !__igt_timeout(end_time, NULL)); pr_debug("%s: Preempt chain of %d requests\n", @@ -3192,9 +3178,6 @@ static int preempt_user(struct intel_engine_cs *engine, struct i915_vma *global, int id) { - struct i915_sched_attr attr = { - .priority = I915_PRIORITY_MAX - }; struct i915_request *rq; int err = 0; u32 *cs; @@ -3219,7 +3202,7 @@ static int preempt_user(struct intel_engine_cs *engine, i915_request_get(rq); i915_request_add(rq); - engine->schedule(rq, &attr); + i915_request_set_priority(rq, I915_PRIORITY_MAX); if (i915_request_wait(rq, 0, HZ / 2) < 0) err = -ETIME; diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index d6ce4075602c..8cad102922e7 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -858,12 +858,11 @@ static int active_engine(void *data) rq[idx] = i915_request_get(new); i915_request_add(new); - if (engine->schedule && arg->flags & TEST_PRIORITY) { - struct i915_sched_attr attr = { - .priority = - 
i915_prandom_u32_max_state(512, &prng), - }; - engine->schedule(rq[idx], &attr); + if (intel_engine_has_scheduler(engine) && + arg->flags & TEST_PRIORITY) { + int prio = i915_prandom_u32_max_state(512, &prng); + + i915_request_set_priority(rq[idx], prio); } err = active_request_put(old); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 17b551a0c89f..06fe95250ba2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -629,8 +629,6 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc; - engine->schedule = i915_schedule; - engine->reset.prepare = guc_reset_prepare; engine->reset.rewind = guc_reset_rewind; engine->reset.cancel = guc_reset_cancel; @@ -645,6 +643,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) } engine->set_default_submission = guc_set_default_submission; + engine->flags |= I915_ENGINE_HAS_SCHEDULER; engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; engine->flags |= I915_ENGINE_HAS_PREEMPTION; diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index a336d6c40d8b..916e74fbab6c 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1223,7 +1223,7 @@ __i915_request_await_execution(struct i915_request *to, } /* Couple the dependency tree for PI on this exposed to->fence */ - if (to->engine->schedule) { + if (i915_request_use_scheduler(to)) { err = i915_sched_node_add_dependency(&to->sched, &from->sched, I915_DEPENDENCY_WEAK); @@ -1364,7 +1364,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from) return 0; } - if (to->engine->schedule) { + if (i915_request_use_scheduler(to)) { ret = i915_sched_node_add_dependency(&to->sched, &from->sched, I915_DEPENDENCY_EXTERNAL); @@ -1551,7 +1551,7 @@ 
__i915_request_add_to_timeline(struct i915_request *rq) __i915_sw_fence_await_dma_fence(&rq->submit, &prev->fence, &rq->dmaq); - if (rq->engine->schedule) + if (i915_request_use_scheduler(rq)) __i915_sched_node_add_dependency(&rq->sched, &prev->sched, &rq->dep, @@ -1623,8 +1623,8 @@ void __i915_request_queue(struct i915_request *rq, * decide whether to preempt the entire chain so that it is ready to * run at the earliest possible convenience. */ - if (attr && rq->engine->schedule) - rq->engine->schedule(rq, attr); + if (attr) + i915_request_set_priority(rq, attr->priority); local_bh_disable(); __i915_request_queue_bh(rq); diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index c0bd4cb8786a..9ce074ffc1dd 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -616,4 +616,9 @@ i915_request_active_timeline(const struct i915_request *rq) lockdep_is_held(&rq->engine->active.lock)); } +static inline bool i915_request_use_scheduler(const struct i915_request *rq) +{ + return intel_engine_has_scheduler(rq->engine); +} + #endif /* I915_REQUEST_H */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 85d18037a915..84a55df88687 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -227,10 +227,8 @@ static void kick_submission(struct intel_engine_cs *engine, rcu_read_unlock(); } -static void __i915_schedule(struct i915_sched_node *node, - const struct i915_sched_attr *attr) +static void __i915_schedule(struct i915_sched_node *node, int prio) { - const int prio = max(attr->priority, node->attr.priority); struct intel_engine_cs *engine; struct i915_dependency *dep, *p; struct i915_dependency stack; @@ -244,6 +242,8 @@ static void __i915_schedule(struct i915_sched_node *node, if (node_signaled(node)) return; + prio = max(prio, node->attr.priority); + stack.signaler = node; list_add(&stack.dfs_link, &dfs); @@ -297,7 
+297,7 @@ static void __i915_schedule(struct i915_sched_node *node, */ if (node->attr.priority == I915_PRIORITY_INVALID) { GEM_BUG_ON(!list_empty(&node->link)); - node->attr = *attr; + node->attr.priority = prio; if (stack.dfs_link.next == stack.dfs_link.prev) return; @@ -352,10 +352,13 @@ static void __i915_schedule(struct i915_sched_node *node, spin_unlock(&engine->active.lock); } -void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr) +void i915_request_set_priority(struct i915_request *rq, int prio) { + if (!i915_request_use_scheduler(rq)) + return; + spin_lock_irq(&schedule_lock); - __i915_schedule(&rq->sched, attr); + __i915_schedule(&rq->sched, prio); spin_unlock_irq(&schedule_lock); } diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 8c5ed6fe0994..a045be784c67 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -35,8 +35,7 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node, void i915_sched_node_retire(struct i915_sched_node *node); -void i915_schedule(struct i915_request *request, - const struct i915_sched_attr *attr); +void i915_request_set_priority(struct i915_request *request, int prio); struct list_head * i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio); From patchwork Mon Feb 1 08:56:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058453
From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:28 +0000 Message-Id: <20210201085715.27435-10-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 10/57] drm/i915: Restructure priority inheritance List-Id: Intel graphics driver community testing & development Cc: Chris Wilson Sender: "Intel-gfx" In anticipation of wanting to be able to call pi from
underneath an engine's active.lock, rework the priority inheritance to primarily work along an engine's priority queue, delegating any other engine that the chain may traverse to a worker. This reduces the global spinlock from governing the entire multi-engine priority inheritance depth-first search, to a smaller lock on each engine around a single list on that engine. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 + .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 3 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 + drivers/gpu/drm/i915/i915_scheduler.c | 358 +++++++++++------- drivers/gpu/drm/i915/i915_scheduler.h | 3 + drivers/gpu/drm/i915/i915_scheduler_types.h | 23 +- 6 files changed, 250 insertions(+), 142 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 44b6e8e646ed..e55e57b6edf6 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -575,6 +575,8 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine) execlists->queue_priority_hint = INT_MIN; execlists->queue = RB_ROOT_CACHED; + + i915_sched_init_ipi(&execlists->ipi); } static void cleanup_status_page(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 0b026cde9f09..48a91c0dbad6 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -114,8 +114,7 @@ static void heartbeat(struct work_struct *wrk) * but all other contexts, including the kernel * context are stuck waiting for the signal. 
*/ - } else if (intel_engine_has_scheduler(engine) && - rq->sched.attr.priority < I915_PRIORITY_BARRIER) { + } else if (rq->sched.attr.priority < I915_PRIORITY_BARRIER) { /* * Gradually raise the priority of the heartbeat to * give high priority work [which presumably desires diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 175695c73d5f..71ceaa5dcf40 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -20,6 +20,7 @@ #include "i915_gem.h" #include "i915_pmu.h" #include "i915_priolist_types.h" +#include "i915_scheduler_types.h" #include "i915_selftest.h" #include "intel_breadcrumbs_types.h" #include "intel_sseu.h" @@ -257,6 +258,8 @@ struct intel_engine_execlists { struct rb_root_cached queue; struct rb_root_cached virtual; + struct i915_sched_ipi ipi; + /** * @csb_write: control register for Context Switch buffer * diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 84a55df88687..ec9da9109dc3 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -17,8 +17,6 @@ static struct i915_global_scheduler { struct kmem_cache *slab_priorities; } global; -static DEFINE_SPINLOCK(schedule_lock); - static struct i915_sched_node *node_get(struct i915_sched_node *node) { i915_request_get(container_of(node, struct i915_request, sched)); @@ -30,17 +28,124 @@ static void node_put(struct i915_sched_node *node) i915_request_put(container_of(node, struct i915_request, sched)); } +static inline int rq_prio(const struct i915_request *rq) +{ + return READ_ONCE(rq->sched.attr.priority); +} + +static int ipi_get_prio(struct i915_request *rq) +{ + if (READ_ONCE(rq->sched.ipi_priority) == I915_PRIORITY_INVALID) + return I915_PRIORITY_INVALID; + + return xchg(&rq->sched.ipi_priority, I915_PRIORITY_INVALID); +} + +static void ipi_schedule(struct work_struct *wrk) +{ + struct i915_sched_ipi 
*ipi = container_of(wrk, typeof(*ipi), work); + struct i915_request *rq = xchg(&ipi->list, NULL); + + do { + struct i915_request *rn = xchg(&rq->sched.ipi_link, NULL); + int prio; + + prio = ipi_get_prio(rq); + + /* + * For cross-engine scheduling to work we rely on one of two + * things: + * + * a) The requests are using dma-fence fences and so will not + * be scheduled until the previous engine is completed, and + * so we cannot cross back onto the original engine and end up + * queuing an earlier request after the first (due to the + * interrupted DFS). + * + * b) The requests are using semaphores and so may already + * be in flight, in which case if we cross back onto the same + * engine, we will already have put the interrupted DFS into + * the priolist, and the continuation will now be queued + * afterwards [out-of-order]. However, since we are using + * semaphores in this case, we also perform yield on semaphore + * waits and so will reorder the requests back into the correct + * sequence. This occurrence (of promoting a request chain + * that crosses the engines using semaphores back onto itself) + * should be unlikely enough that it probably does not matter... + */ + local_bh_disable(); + i915_request_set_priority(rq, prio); + local_bh_enable(); + + i915_request_put(rq); + rq = ptr_mask_bits(rn, 1); + } while (rq); +} + +void i915_sched_init_ipi(struct i915_sched_ipi *ipi) +{ + INIT_WORK(&ipi->work, ipi_schedule); + ipi->list = NULL; +} + +static void __ipi_add(struct i915_request *rq) +{ +#define STUB ((struct i915_request *)1) + struct intel_engine_cs *engine = READ_ONCE(rq->engine); + struct i915_request *first; + + if (!i915_request_get_rcu(rq)) + return; + + /* + * We only want to add the request once into the ipi.list (or else + * the chain will be broken). The worker must be guaranteed to run + * at least once for every call to ipi_add, but it is allowed to + * coalesce multiple ipi_add into a single pass using the final + * property value.
+ */ + if (__i915_request_is_complete(rq) || + cmpxchg(&rq->sched.ipi_link, NULL, STUB)) { /* already queued */ + i915_request_put(rq); + return; + } + + /* Carefully insert ourselves into the head of the llist */ + first = READ_ONCE(engine->execlists.ipi.list); + do { + rq->sched.ipi_link = ptr_pack_bits(first, 1, 1); + } while (!try_cmpxchg(&engine->execlists.ipi.list, &first, rq)); + + if (!first) + queue_work(system_unbound_wq, &engine->execlists.ipi.work); +} + +/* + * Virtual engines complicate acquiring the engine timeline lock, + * as their rq->engine pointer is not stable until under that + * engine lock. The simple ploy we use is to take the lock then + * check that the rq still belongs to the newly locked engine. + */ +#define lock_engine_irqsave(rq, flags) ({ \ + struct i915_request * const rq__ = (rq); \ + struct intel_engine_cs *engine__ = READ_ONCE(rq__->engine); \ +\ + spin_lock_irqsave(&engine__->active.lock, (flags)); \ + while (engine__ != READ_ONCE((rq__)->engine)) { \ + spin_unlock(&engine__->active.lock); \ + engine__ = READ_ONCE(rq__->engine); \ + spin_lock(&engine__->active.lock); \ + } \ +\ + engine__; \ +}) + static const struct i915_request * node_to_request(const struct i915_sched_node *node) { return container_of(node, const struct i915_request, sched); } -static inline bool node_started(const struct i915_sched_node *node) -{ - return i915_request_started(node_to_request(node)); -} - static inline bool node_signaled(const struct i915_sched_node *node) { return i915_request_completed(node_to_request(node)); @@ -137,42 +242,6 @@ void __i915_priolist_free(struct i915_priolist *p) kmem_cache_free(global.slab_priorities, p); } -struct sched_cache { - struct list_head *priolist; -}; - -static struct intel_engine_cs * -sched_lock_engine(const struct i915_sched_node *node, - struct intel_engine_cs *locked, - struct sched_cache *cache) -{ - const struct i915_request *rq = node_to_request(node); - struct intel_engine_cs *engine; - - 
GEM_BUG_ON(!locked); - - /* - * Virtual engines complicate acquiring the engine timeline lock, - * as their rq->engine pointer is not stable until under that - * engine lock. The simple ploy we use is to take the lock then - * check that the rq still belongs to the newly locked engine. - */ - while (locked != (engine = READ_ONCE(rq->engine))) { - spin_unlock(&locked->active.lock); - memset(cache, 0, sizeof(*cache)); - spin_lock(&engine->active.lock); - locked = engine; - } - - GEM_BUG_ON(locked != engine); - return locked; -} - -static inline int rq_prio(const struct i915_request *rq) -{ - return rq->sched.attr.priority; -} - static inline bool need_preempt(int prio, int active) { /* @@ -198,19 +267,17 @@ static void kick_submission(struct intel_engine_cs *engine, if (prio <= engine->execlists.queue_priority_hint) return; - rcu_read_lock(); - /* Nothing currently active? We're overdue for a submission! */ inflight = execlists_active(&engine->execlists); if (!inflight) - goto unlock; + return; /* * If we are already the currently executing context, don't * bother evaluating if we should preempt ourselves. 
*/ if (inflight->context == rq->context) - goto unlock; + return; ENGINE_TRACE(engine, "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n", @@ -222,30 +289,28 @@ static void kick_submission(struct intel_engine_cs *engine, engine->execlists.queue_priority_hint = prio; if (need_preempt(prio, rq_prio(inflight))) tasklet_hi_schedule(&engine->execlists.tasklet); - -unlock: - rcu_read_unlock(); } -static void __i915_schedule(struct i915_sched_node *node, int prio) +static void ipi_priority(struct i915_request *rq, int prio) { - struct intel_engine_cs *engine; - struct i915_dependency *dep, *p; - struct i915_dependency stack; - struct sched_cache cache; + int old = READ_ONCE(rq->sched.ipi_priority); + + do { + if (prio <= old) + return; + } while (!try_cmpxchg(&rq->sched.ipi_priority, &old, prio)); + + __ipi_add(rq); +} + +static void __i915_request_set_priority(struct i915_request *rq, int prio) +{ + struct intel_engine_cs *engine = rq->engine; + struct i915_request *rn; + struct list_head *plist; LIST_HEAD(dfs); - /* Needed in order to use the temporary link inside i915_dependency */ - lockdep_assert_held(&schedule_lock); - GEM_BUG_ON(prio == I915_PRIORITY_INVALID); - - if (node_signaled(node)) - return; - - prio = max(prio, node->attr.priority); - - stack.signaler = node; - list_add(&stack.dfs_link, &dfs); + list_add(&rq->sched.dfs, &dfs); /* * Recursively bump all dependent priorities to match the new request. @@ -265,66 +330,41 @@ static void __i915_schedule(struct i915_sched_node *node, int prio) * end result is a topological list of requests in reverse order, the * last element in the list is the request we must execute first. 
*/ - list_for_each_entry(dep, &dfs, dfs_link) { - struct i915_sched_node *node = dep->signaler; + list_for_each_entry(rq, &dfs, sched.dfs) { + struct i915_dependency *p; - /* If we are already flying, we know we have no signalers */ - if (node_started(node)) - continue; + /* Also release any children on this engine that are ready */ + GEM_BUG_ON(rq->engine != engine); - /* - * Within an engine, there can be no cycle, but we may - * refer to the same dependency chain multiple times - * (redundant dependencies are not eliminated) and across - * engines. - */ - list_for_each_entry(p, &node->signalers_list, signal_link) { - GEM_BUG_ON(p == dep); /* no cycles! */ + for_each_signaler(p, rq) { + struct i915_request *s = + container_of(p->signaler, typeof(*s), sched); - if (node_signaled(p->signaler)) + GEM_BUG_ON(s == rq); + + if (rq_prio(s) >= prio) continue; - if (prio > READ_ONCE(p->signaler->attr.priority)) - list_move_tail(&p->dfs_link, &dfs); + if (__i915_request_is_complete(s)) + continue; + + if (s->engine != rq->engine) { + ipi_priority(s, prio); + continue; + } + + list_move_tail(&s->sched.dfs, &dfs); } } - /* - * If we didn't need to bump any existing priorities, and we haven't - * yet submitted this request (i.e. there is no potential race with - * execlists_submit_request()), we can set our own priority and skip - * acquiring the engine locks. 
- */ - if (node->attr.priority == I915_PRIORITY_INVALID) { - GEM_BUG_ON(!list_empty(&node->link)); - node->attr.priority = prio; + plist = i915_sched_lookup_priolist(engine, prio); - if (stack.dfs_link.next == stack.dfs_link.prev) - return; + /* Fifo and depth-first replacement ensure our deps execute first */ + list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) { + GEM_BUG_ON(rq->engine != engine); - __list_del_entry(&stack.dfs_link); - } - - memset(&cache, 0, sizeof(cache)); - engine = node_to_request(node)->engine; - spin_lock(&engine->active.lock); - - /* Fifo and depth-first replacement ensure our deps execute before us */ - engine = sched_lock_engine(node, engine, &cache); - list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) { - INIT_LIST_HEAD(&dep->dfs_link); - - node = dep->signaler; - engine = sched_lock_engine(node, engine, &cache); - lockdep_assert_held(&engine->active.lock); - - /* Recheck after acquiring the engine->timeline.lock */ - if (prio <= node->attr.priority || node_signaled(node)) - continue; - - GEM_BUG_ON(node_to_request(node)->engine != engine); - - WRITE_ONCE(node->attr.priority, prio); + INIT_LIST_HEAD(&rq->sched.dfs); + WRITE_ONCE(rq->sched.attr.priority, prio); /* * Once the request is ready, it will be placed into the @@ -334,32 +374,79 @@ static void __i915_schedule(struct i915_sched_node *node, int prio) * any preemption required, be dealt with upon submission. * See engine->submit_request() */ - if (list_empty(&node->link)) + if (!i915_request_is_ready(rq)) continue; - if (i915_request_in_priority_queue(node_to_request(node))) { - if (!cache.priolist) - cache.priolist = - i915_sched_lookup_priolist(engine, - prio); - list_move_tail(&node->link, cache.priolist); - } + if (i915_request_in_priority_queue(rq)) + list_move_tail(&rq->sched.link, plist); - /* Defer (tasklet) submission until after all of our updates. 
*/ - kick_submission(engine, node_to_request(node), prio); + /* Defer (tasklet) submission until after all updates. */ + kick_submission(engine, rq, prio); } - - spin_unlock(&engine->active.lock); } +#define all_signalers_checked(p, rq) \ + list_entry_is_head(p, &(rq)->sched.signalers_list, signal_link) + void i915_request_set_priority(struct i915_request *rq, int prio) { - if (!i915_request_use_scheduler(rq)) + struct intel_engine_cs *engine; + unsigned long flags; + + if (prio <= rq_prio(rq)) return; - spin_lock_irq(&schedule_lock); - __i915_schedule(&rq->sched, prio); - spin_unlock_irq(&schedule_lock); + /* + * If we are setting the priority before being submitted, see if we + * can quickly adjust our own priority in-situ and avoid taking + * the contended engine->active.lock. If we need priority inheritance, + * take the slow route. + */ + if (rq_prio(rq) == I915_PRIORITY_INVALID) { + struct i915_dependency *p; + + rcu_read_lock(); + for_each_signaler(p, rq) { + struct i915_request *s = + container_of(p->signaler, typeof(*s), sched); + + if (rq_prio(s) >= prio) + continue; + + if (__i915_request_is_complete(s)) + continue; + + break; + } + rcu_read_unlock(); + + /* Update priority in place if no PI required */ + if (all_signalers_checked(p, rq) && + cmpxchg(&rq->sched.attr.priority, + I915_PRIORITY_INVALID, + prio) == I915_PRIORITY_INVALID) + return; + } + + engine = lock_engine_irqsave(rq, flags); + if (prio <= rq_prio(rq)) + goto unlock; + + if (__i915_request_is_complete(rq)) + goto unlock; + + if (!intel_engine_has_scheduler(engine)) { + rq->sched.attr.priority = prio; + goto unlock; + } + + rcu_read_lock(); + __i915_request_set_priority(rq, prio); + rcu_read_unlock(); + GEM_BUG_ON(rq_prio(rq) != prio); + +unlock: + spin_unlock_irqrestore(&engine->active.lock, flags); } void i915_sched_node_init(struct i915_sched_node *node) @@ -369,6 +456,9 @@ void i915_sched_node_init(struct i915_sched_node *node) INIT_LIST_HEAD(&node->signalers_list); 
INIT_LIST_HEAD(&node->waiters_list); INIT_LIST_HEAD(&node->link); + INIT_LIST_HEAD(&node->dfs); + + node->ipi_link = NULL; i915_sched_node_reinit(node); } @@ -379,6 +469,9 @@ void i915_sched_node_reinit(struct i915_sched_node *node) node->semaphores = 0; node->flags = 0; + GEM_BUG_ON(node->ipi_link); + node->ipi_priority = I915_PRIORITY_INVALID; + GEM_BUG_ON(!list_empty(&node->signalers_list)); GEM_BUG_ON(!list_empty(&node->waiters_list)); GEM_BUG_ON(!list_empty(&node->link)); @@ -414,7 +507,6 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node, spin_lock(&signal->lock); if (!node_signaled(signal)) { - INIT_LIST_HEAD(&dep->dfs_link); dep->signaler = signal; dep->waiter = node_get(node); dep->flags = flags; diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index a045be784c67..2870fa3e089e 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -14,6 +14,7 @@ #include "i915_scheduler_types.h" struct drm_printer; +struct intel_engine_cs; #define priolist_for_each_request(it, plist) \ list_for_each_entry(it, &(plist)->requests, sched.link) @@ -35,6 +36,8 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node, void i915_sched_node_retire(struct i915_sched_node *node); +void i915_sched_init_ipi(struct i915_sched_ipi *ipi); + void i915_request_set_priority(struct i915_request *request, int prio); struct list_head * diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 623bf41fcf35..2a5265d9aff1 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -8,13 +8,17 @@ #define _I915_SCHEDULER_TYPES_H_ #include +#include -#include "gt/intel_engine_types.h" #include "i915_priolist_types.h" -struct drm_i915_private; struct i915_request; -struct intel_engine_cs; + +/* Inter-engine scheduling delegation */ +struct i915_sched_ipi { + struct i915_request *list; + 
struct work_struct work; +}; struct i915_sched_attr { /** @@ -61,13 +65,19 @@ struct i915_sched_attr { */ struct i915_sched_node { spinlock_t lock; /* protect the lists */ + struct list_head signalers_list; /* those before us, we depend upon */ struct list_head waiters_list; /* those after us, they depend upon us */ - struct list_head link; + struct list_head link; /* guarded by engine->active.lock */ + struct list_head dfs; /* guarded by engine->active.lock */ struct i915_sched_attr attr; - unsigned int flags; + unsigned long flags; #define I915_SCHED_HAS_EXTERNAL_CHAIN BIT(0) - intel_engine_mask_t semaphores; + unsigned long semaphores; + + /* handle being scheduled for PI from outside of our active.lock */ + struct i915_request *ipi_link; + int ipi_priority; }; struct i915_dependency { @@ -75,7 +85,6 @@ struct i915_dependency { struct i915_sched_node *waiter; struct list_head signal_link; struct list_head wait_link; - struct list_head dfs_link; struct rcu_head rcu; unsigned long flags; #define I915_DEPENDENCY_ALLOC BIT(0)
From patchwork Mon Feb 1 08:56:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058493 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:29 +0000 Message-Id: <20210201085715.27435-11-chris@chris-wilson.co.uk> In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 11/57] drm/i915/selftests: Measure set-priority duration Cc: Chris Wilson As a topological sort, we expect it to run in linear graph time, O(V+E). In removing the recursion, it is no longer a DFS but rather a BFS, and performs as O(VE). Let's demonstrate how bad this is with a few examples, and build a few test cases to verify a potential fix.
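To make the O(V+E) versus O(VE) distinction concrete, here is a minimal standalone sketch. It is not the driver code: `struct node`, `build_wide()`, `bump_per_node()` and `bump_per_edge()` are hypothetical names invented for this illustration. The graph mirrors the shape built by __wide_chain (request i waits upon every request j < i). One walk marks a node as soon as it is raised, so each node's signalers are scanned once. The other only deduplicates per edge and applies the priority in a second pass, so a node is rescanned once per incoming edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 64

/* Hypothetical stand-in for a request; not the driver's i915_request. */
struct node {
	int prio;
	size_t nr_signalers;
	size_t signalers[MAX_NODES];	/* indices of the nodes we wait upon */
};

/* Build the "wide" graph: node i waits upon every node j < i. */
static void build_wide(struct node *g, size_t count)
{
	assert(count <= MAX_NODES);
	for (size_t i = 0; i < count; i++) {
		g[i].prio = 0;
		g[i].nr_signalers = i;
		for (size_t j = 0; j < i; j++)
			g[i].signalers[j] = j;
	}
}

/*
 * Raise a node the moment we first reach it. A raised node is never
 * queued again, so every signaler list is scanned once: O(V+E).
 * Returns the number of edges inspected.
 */
static unsigned long bump_per_node(struct node *g, size_t root, int prio)
{
	size_t stack[MAX_NODES];
	size_t top = 0;
	unsigned long inspected = 0;

	if (g[root].prio >= prio)
		return 0;

	g[root].prio = prio;
	stack[top++] = root;
	while (top) {
		struct node *n = &g[stack[--top]];

		for (size_t k = 0; k < n->nr_signalers; k++) {
			size_t s = n->signalers[k];

			inspected++;
			if (g[s].prio >= prio)
				continue;

			g[s].prio = prio;
			assert(top < MAX_NODES);
			stack[top++] = s;
		}
	}

	return inspected;
}

/*
 * Deduplicate per edge only and defer the priority update to a second
 * pass. A node is then rescanned once per incoming edge, so the work
 * grows as the sum over edges of the signaler's degree, up to O(VE).
 * Returns the number of edges inspected.
 */
static unsigned long bump_per_edge(struct node *g, size_t root, int prio)
{
	static size_t queue[MAX_NODES * MAX_NODES];
	static bool queued[MAX_NODES][MAX_NODES];
	size_t head = 0, tail = 0;
	unsigned long inspected = 0;

	memset(queued, 0, sizeof(queued));
	queue[tail++] = root;
	while (head < tail) {
		size_t i = queue[head++];

		for (size_t k = 0; k < g[i].nr_signalers; k++) {
			size_t s = g[i].signalers[k];

			inspected++;
			if (g[s].prio >= prio || queued[i][s])
				continue;

			queued[i][s] = true;
			queue[tail++] = s;
		}
	}

	/* Second pass: apply the new priority to everything we reached. */
	for (head = 0; head < tail; head++)
		if (g[queue[head]].prio < prio)
			g[queue[head]].prio = prio;

	return inspected;
}
```

For 17 nodes (the width used by igt_schedule_chains) the wide graph has 136 edges. In this model the per-node walk inspects 136 edges while the per-edge walk inspects 696, and the gap widens cubically with the width.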
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_scheduler.c | 4 + .../drm/i915/selftests/i915_live_selftests.h | 1 + .../drm/i915/selftests/i915_perf_selftests.h | 1 + .../gpu/drm/i915/selftests/i915_scheduler.c | 672 ++++++++++++++++++ 4 files changed, 678 insertions(+) create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index ec9da9109dc3..a56a812cbf29 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -609,6 +609,10 @@ void i915_request_show_with_schedule(struct drm_printer *m, rcu_read_unlock(); } +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +#include "selftests/i915_scheduler.c" +#endif + static void i915_global_scheduler_shrink(void) { kmem_cache_shrink(global.slab_dependencies); diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h index a92c0e9b7e6b..2200a5baa68e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h @@ -26,6 +26,7 @@ selftest(gt_mocs, intel_mocs_live_selftests) selftest(gt_pm, intel_gt_pm_live_selftests) selftest(gt_heartbeat, intel_heartbeat_live_selftests) selftest(requests, i915_request_live_selftests) +selftest(scheduler, i915_scheduler_live_selftests) selftest(active, i915_active_live_selftests) selftest(objects, i915_gem_object_live_selftests) selftest(mman, i915_gem_mman_live_selftests) diff --git a/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h b/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h index c2389f8a257d..137e35283fee 100644 --- a/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h @@ -17,5 +17,6 @@ */ selftest(engine_cs, intel_engine_cs_perf_selftests) selftest(request, i915_request_perf_selftests) +selftest(scheduler, i915_scheduler_perf_selftests) 
selftest(blt, i915_gem_object_blt_perf_selftests) selftest(region, intel_memory_region_perf_selftests) diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c new file mode 100644 index 000000000000..d095fab2ccec --- /dev/null +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -0,0 +1,672 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +#include "i915_selftest.h" + +#include "gt/intel_context.h" +#include "gt/intel_gpu_commands.h" +#include "gt/selftest_engine_heartbeat.h" +#include "selftests/igt_spinner.h" +#include "selftests/i915_random.h" + +static void scheduling_disable(struct intel_engine_cs *engine) +{ + engine->props.preempt_timeout_ms = 0; + engine->props.timeslice_duration_ms = 0; + + st_engine_heartbeat_disable(engine); +} + +static void scheduling_enable(struct intel_engine_cs *engine) +{ + st_engine_heartbeat_enable(engine); + + engine->props.preempt_timeout_ms = + engine->defaults.preempt_timeout_ms; + engine->props.timeslice_duration_ms = + engine->defaults.timeslice_duration_ms; +} + +static int first_engine(struct drm_i915_private *i915, + int (*chain)(struct intel_engine_cs *engine, + unsigned long param, + bool (*fn)(struct i915_request *rq, + unsigned long v, + unsigned long e)), + unsigned long param, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_engine_cs *engine; + + for_each_uabi_engine(engine, i915) { + if (!intel_engine_has_scheduler(engine)) + continue; + + return chain(engine, param, fn); + } + + return 0; +} + +static int all_engines(struct drm_i915_private *i915, + int (*chain)(struct intel_engine_cs *engine, + unsigned long param, + bool (*fn)(struct i915_request *rq, + unsigned long v, + unsigned long e)), + unsigned long param, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_engine_cs *engine; + int err; + + for_each_uabi_engine(engine, i915) 
{ + if (!intel_engine_has_scheduler(engine)) + continue; + + err = chain(engine, param, fn); + if (err) + return err; + } + + return 0; +} + +static bool check_context_order(struct intel_engine_cs *engine) +{ + u64 last_seqno, last_context; + unsigned long count; + bool result = false; + struct rb_node *rb; + int last_prio; + + /* We expect the execution order to follow ascending fence-context */ + spin_lock_irq(&engine->active.lock); + + count = 0; + last_context = 0; + last_seqno = 0; + last_prio = 0; + for (rb = rb_first_cached(&engine->execlists.queue); rb; rb = rb_next(rb)) { + struct i915_priolist *p = rb_entry(rb, typeof(*p), node); + struct i915_request *rq; + + priolist_for_each_request(rq, p) { + if (rq->fence.context < last_context || + (rq->fence.context == last_context && + rq->fence.seqno < last_seqno)) { + pr_err("[%lu] %llx:%lld [prio:%d] after %llx:%lld [prio:%d]\n", + count, + rq->fence.context, + rq->fence.seqno, + rq_prio(rq), + last_context, + last_seqno, + last_prio); + goto out_unlock; + } + + last_context = rq->fence.context; + last_seqno = rq->fence.seqno; + last_prio = rq_prio(rq); + count++; + } + } + result = true; +out_unlock: + spin_unlock_irq(&engine->active.lock); + + return result; +} + +static int __single_chain(struct intel_engine_cs *engine, unsigned long length, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_context *ce; + struct igt_spinner spin; + struct i915_request *rq; + unsigned long count; + unsigned long min; + int err = 0; + + if (!intel_engine_can_store_dword(engine)) + return 0; + + scheduling_disable(engine); + + if (igt_spinner_init(&spin, engine->gt)) { + err = -ENOMEM; + goto err_heartbeat; + } + + ce = intel_context_create(engine); + if (IS_ERR(ce)) { + err = PTR_ERR(ce); + goto err_spin; + } + ce->ring = __intel_context_ring_size(SZ_512K); + + rq = igt_spinner_create_request(&spin, ce, MI_NOOP); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto err_context; + } + 
i915_request_add(rq); + min = ce->ring->size - ce->ring->space; + + count = 1; + while (count < length && ce->ring->space > min) { + rq = intel_context_create_request(ce); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + break; + } + i915_request_add(rq); + count++; + } + intel_engine_flush_submission(engine); + + execlists_active_lock_bh(&engine->execlists); + if (fn(rq, count, count - 1) && !check_context_order(engine)) + err = -EINVAL; + execlists_active_unlock_bh(&engine->execlists); + + igt_spinner_end(&spin); +err_context: + intel_context_put(ce); +err_spin: + igt_spinner_fini(&spin); +err_heartbeat: + scheduling_enable(engine); + return err; +} + +static int __wide_chain(struct intel_engine_cs *engine, unsigned long width, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_context **ce; + struct i915_request **rq; + struct igt_spinner spin; + unsigned long count; + unsigned long i, j; + int err = 0; + + if (!intel_engine_can_store_dword(engine)) + return 0; + + scheduling_disable(engine); + + if (igt_spinner_init(&spin, engine->gt)) { + err = -ENOMEM; + goto err_heartbeat; + } + + ce = kmalloc_array(width, sizeof(*ce), GFP_KERNEL); + if (!ce) { + err = -ENOMEM; + goto err_spin; + } + + for (i = 0; i < width; i++) { + ce[i] = intel_context_create(engine); + if (IS_ERR(ce[i])) { + err = PTR_ERR(ce[i]); + width = i; + goto err_context; + } + } + + rq = kmalloc_array(width, sizeof(*rq), GFP_KERNEL); + if (!rq) { + err = -ENOMEM; + goto err_context; + } + + rq[0] = igt_spinner_create_request(&spin, ce[0], MI_NOOP); + if (IS_ERR(rq[0])) { + err = PTR_ERR(rq[0]); + goto err_free; + } + i915_request_add(rq[0]); + + count = 0; + for (i = 1; i < width; i++) { + GEM_BUG_ON(i915_request_completed(rq[0])); + + rq[i] = intel_context_create_request(ce[i]); + if (IS_ERR(rq[i])) { + err = PTR_ERR(rq[i]); + break; + } + for (j = 0; j < i; j++) { + err = i915_request_await_dma_fence(rq[i], + &rq[j]->fence); + if (err) + break; + count++; + } 
+ i915_request_add(rq[i]); + } + intel_engine_flush_submission(engine); + + execlists_active_lock_bh(&engine->execlists); + if (fn(rq[i - 1], i, count) && !check_context_order(engine)) + err = -EINVAL; + execlists_active_unlock_bh(&engine->execlists); + + igt_spinner_end(&spin); +err_free: + kfree(rq); +err_context: + for (i = 0; i < width; i++) + intel_context_put(ce[i]); + kfree(ce); +err_spin: + igt_spinner_fini(&spin); +err_heartbeat: + scheduling_enable(engine); + return err; +} + +static int __inv_chain(struct intel_engine_cs *engine, unsigned long width, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_context **ce; + struct i915_request **rq; + struct igt_spinner spin; + unsigned long count; + unsigned long i, j; + int err = 0; + + if (!intel_engine_can_store_dword(engine)) + return 0; + + scheduling_disable(engine); + + if (igt_spinner_init(&spin, engine->gt)) { + err = -ENOMEM; + goto err_heartbeat; + } + + ce = kmalloc_array(width, sizeof(*ce), GFP_KERNEL); + if (!ce) { + err = -ENOMEM; + goto err_spin; + } + + for (i = 0; i < width; i++) { + ce[i] = intel_context_create(engine); + if (IS_ERR(ce[i])) { + err = PTR_ERR(ce[i]); + width = i; + goto err_context; + } + } + + rq = kmalloc_array(width, sizeof(*rq), GFP_KERNEL); + if (!rq) { + err = -ENOMEM; + goto err_context; + } + + rq[0] = igt_spinner_create_request(&spin, ce[0], MI_NOOP); + if (IS_ERR(rq[0])) { + err = PTR_ERR(rq[0]); + goto err_free; + } + i915_request_add(rq[0]); + + count = 0; + for (i = 1; i < width; i++) { + GEM_BUG_ON(i915_request_completed(rq[0])); + + rq[i] = intel_context_create_request(ce[i]); + if (IS_ERR(rq[i])) { + err = PTR_ERR(rq[i]); + break; + } + for (j = i; j > 0; j--) { + err = i915_request_await_dma_fence(rq[i], + &rq[j - 1]->fence); + if (err) + break; + count++; + } + i915_request_add(rq[i]); + } + intel_engine_flush_submission(engine); + + execlists_active_lock_bh(&engine->execlists); + if (fn(rq[i - 1], i, count) && 
!check_context_order(engine)) + err = -EINVAL; + execlists_active_unlock_bh(&engine->execlists); + + igt_spinner_end(&spin); +err_free: + kfree(rq); +err_context: + for (i = 0; i < width; i++) + intel_context_put(ce[i]); + kfree(ce); +err_spin: + igt_spinner_fini(&spin); +err_heartbeat: + scheduling_enable(engine); + return err; +} + +static int __sparse_chain(struct intel_engine_cs *engine, unsigned long width, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_context **ce; + struct i915_request **rq; + struct igt_spinner spin; + I915_RND_STATE(prng); + unsigned long count; + unsigned long i, j; + int err = 0; + + if (!intel_engine_can_store_dword(engine)) + return 0; + + scheduling_disable(engine); + + if (igt_spinner_init(&spin, engine->gt)) { + err = -ENOMEM; + goto err_heartbeat; + } + + ce = kmalloc_array(width, sizeof(*ce), GFP_KERNEL); + if (!ce) { + err = -ENOMEM; + goto err_spin; + } + + for (i = 0; i < width; i++) { + ce[i] = intel_context_create(engine); + if (IS_ERR(ce[i])) { + err = PTR_ERR(ce[i]); + width = i; + goto err_context; + } + } + + rq = kmalloc_array(width, sizeof(*rq), GFP_KERNEL); + if (!rq) { + err = -ENOMEM; + goto err_context; + } + + rq[0] = igt_spinner_create_request(&spin, ce[0], MI_NOOP); + if (IS_ERR(rq[0])) { + err = PTR_ERR(rq[0]); + goto err_free; + } + i915_request_add(rq[0]); + + count = 0; + for (i = 1; i < width; i++) { + GEM_BUG_ON(i915_request_completed(rq[0])); + + rq[i] = intel_context_create_request(ce[i]); + if (IS_ERR(rq[i])) { + err = PTR_ERR(rq[i]); + break; + } + + if (err == 0 && i > 1) { + j = i915_prandom_u32_max_state(i - 1, &prng); + err = i915_request_await_dma_fence(rq[i], + &rq[j]->fence); + count++; + } + + if (err == 0) { + err = i915_request_await_dma_fence(rq[i], + &rq[i - 1]->fence); + count++; + } + + if (err == 0 && i > 2) { + j = i915_prandom_u32_max_state(i - 2, &prng); + err = i915_request_await_dma_fence(rq[i], + &rq[j]->fence); + count++; + } + + 
i915_request_add(rq[i]); + if (err) + break; + } + intel_engine_flush_submission(engine); + + execlists_active_lock_bh(&engine->execlists); + if (fn(rq[i - 1], i, count) && !check_context_order(engine)) + err = -EINVAL; + execlists_active_unlock_bh(&engine->execlists); + + igt_spinner_end(&spin); +err_free: + kfree(rq); +err_context: + for (i = 0; i < width; i++) + intel_context_put(ce[i]); + kfree(ce); +err_spin: + igt_spinner_fini(&spin); +err_heartbeat: + scheduling_enable(engine); + return err; +} + +static int igt_schedule_chains(struct drm_i915_private *i915, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + static int (* const chains[])(struct intel_engine_cs *engine, + unsigned long length, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) = { + __single_chain, + __wide_chain, + __inv_chain, + __sparse_chain, + }; + int n, err; + + for (n = 0; n < ARRAY_SIZE(chains); n++) { + err = all_engines(i915, chains[n], 17, fn); + if (err) + return err; + } + + return 0; +} + +static bool igt_priority(struct i915_request *rq, + unsigned long v, unsigned long e) +{ + i915_request_set_priority(rq, I915_PRIORITY_BARRIER); + GEM_BUG_ON(rq_prio(rq) != I915_PRIORITY_BARRIER); + return true; +} + +static int igt_priority_chains(void *arg) +{ + return igt_schedule_chains(arg, igt_priority); +} + +int i915_scheduler_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(igt_priority_chains), + }; + + return i915_subtests(tests, i915); +} + +static int chains(struct drm_i915_private *i915, + int (*chain)(struct drm_i915_private *i915, + unsigned long length, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)), + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + unsigned long x[] = { 1, 4, 16, 64, 128, 256, 512, 1024, 4096 }; + int i, err; + + for (i = 0; i < ARRAY_SIZE(x); i++) { + IGT_TIMEOUT(end_time); + + err = 
chain(i915, x[i], fn); + if (err) + return err; + + if (__igt_timeout(end_time, NULL)) + break; + } + + return 0; +} + +static int single_chain(struct drm_i915_private *i915, + unsigned long length, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return first_engine(i915, __single_chain, length, fn); +} + +static int single(struct drm_i915_private *i915, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return chains(i915, single_chain, fn); +} + +static int wide_chain(struct drm_i915_private *i915, + unsigned long width, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return first_engine(i915, __wide_chain, width, fn); +} + +static int wide(struct drm_i915_private *i915, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return chains(i915, wide_chain, fn); +} + +static int inv_chain(struct drm_i915_private *i915, + unsigned long width, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return first_engine(i915, __inv_chain, width, fn); +} + +static int inv(struct drm_i915_private *i915, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return chains(i915, inv_chain, fn); +} + +static int sparse_chain(struct drm_i915_private *i915, + unsigned long width, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return first_engine(i915, __sparse_chain, width, fn); +} + +static int sparse(struct drm_i915_private *i915, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + return chains(i915, sparse_chain, fn); +} + +static void report(const char *what, unsigned long v, unsigned long e, u64 dt) +{ + pr_info("(%4lu, %7lu), %s:%10lluns\n", v, e, what, dt); +} + +static u64 __set_priority(struct i915_request *rq, int prio) +{ + u64 dt; + + preempt_disable(); + dt = ktime_get_raw_fast_ns(); + i915_request_set_priority(rq, prio); + dt = 
ktime_get_raw_fast_ns() - dt; + preempt_enable(); + + return dt; +} + +static bool set_priority(struct i915_request *rq, + unsigned long v, unsigned long e) +{ + report("set-priority", v, e, __set_priority(rq, I915_PRIORITY_BARRIER)); + return true; +} + +static int single_priority(void *arg) +{ + return single(arg, set_priority); +} + +static int wide_priority(void *arg) +{ + return wide(arg, set_priority); +} + +static int inv_priority(void *arg) +{ + return inv(arg, set_priority); +} + +static int sparse_priority(void *arg) +{ + return sparse(arg, set_priority); +} + +int i915_scheduler_perf_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(single_priority), + SUBTEST(wide_priority), + SUBTEST(inv_priority), + SUBTEST(sparse_priority), + }; + static const struct { + const char *name; + size_t sz; + } types[] = { +#define T(t) { #t, sizeof(struct t) } + T(i915_priolist), + T(i915_sched_attr), + T(i915_sched_node), + T(i915_dependency), +#undef T + {} + }; + typeof(*types) *t; + + for (t = types; t->name; t++) + pr_info("sizeof(%s): %zd\n", t->name, t->sz); + + return i915_subtests(tests, i915); +}
From patchwork Mon Feb 1 08:56:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058483 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:30 +0000 Message-Id: <20210201085715.27435-12-chris@chris-wilson.co.uk> In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 12/57] drm/i915/selftests: Exercise priority inheritance around an engine loop Cc: Chris Wilson Exercise rescheduling priority inheritance around a sequence of requests that wrap around all the engines.
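As a rough picture of the shape being exercised, the sketch below models a chain of requests laid round-robin across the engines, each waiting on the previous one, and counts how often priority inheritance has to hop to a different engine. It is a simplified standalone model, not the selftest: `struct model_request`, `build_engine_loop()` and `inherit_priority()` are hypothetical names, and real requests may have many signalers rather than one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a request; not the driver's struct i915_request. */
struct model_request {
	int engine;	/* engine the request runs on */
	int prio;	/* current priority */
	long signaler;	/* index of the request we wait upon, -1 if none */
};

/* Lay the requests round-robin over the engines, each awaiting the last. */
static void build_engine_loop(struct model_request *rq, size_t count,
			      int nr_engines)
{
	assert(nr_engines > 0);
	for (size_t i = 0; i < count; i++) {
		rq[i].engine = (int)(i % (size_t)nr_engines);
		rq[i].prio = 0;
		rq[i].signaler = (long)i - 1;
	}
}

/*
 * Raise the last request and walk back along its signalers, raising
 * each in turn. Returns how many times the walk moved to a request on
 * a different engine, i.e. how often the work had to be handed over.
 */
static size_t inherit_priority(struct model_request *rq, size_t last, int prio)
{
	size_t crossings = 0;
	long i = (long)last;

	while (i >= 0 && rq[i].prio < prio) {
		long s = rq[i].signaler;

		rq[i].prio = prio;
		if (s >= 0 && rq[s].engine != rq[i].engine)
			crossings++;
		i = s;
	}

	return crossings;
}
```

With 8 requests over 4 engines (two laps of the loop), raising the final request crosses engines 7 times in this model, once per dependency, which is the worst case for any scheme that must hand work over between engines.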
Signed-off-by: Chris Wilson --- .../gpu/drm/i915/selftests/i915_scheduler.c | 225 ++++++++++++++++++ 1 file changed, 225 insertions(+) diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index d095fab2ccec..acc666f755d7 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -7,6 +7,7 @@ #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" +#include "gt/intel_ring.h" #include "gt/selftest_engine_heartbeat.h" #include "selftests/igt_spinner.h" #include "selftests/i915_random.h" @@ -504,10 +505,234 @@ static int igt_priority_chains(void *arg) return igt_schedule_chains(arg, igt_priority); } +static struct i915_request * +__write_timestamp(struct intel_engine_cs *engine, + struct drm_i915_gem_object *obj, + int slot, + struct i915_request *prev) +{ + struct i915_request *rq = ERR_PTR(-EINVAL); + bool use_64b = INTEL_GEN(engine->i915) >= 8; + struct intel_context *ce; + struct i915_vma *vma; + int err = 0; + u32 *cs; + + ce = intel_context_create(engine); + if (IS_ERR(ce)) + return ERR_CAST(ce); + + vma = i915_vma_instance(obj, ce->vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto out_ce; + } + + err = i915_vma_pin(vma, 0, 0, PIN_USER); + if (err) + goto out_ce; + + rq = intel_context_create_request(ce); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out_unpin; + } + + i915_vma_lock(vma); + err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE); + i915_vma_unlock(vma); + if (err) + goto out_request; + + if (prev) { + err = i915_request_await_dma_fence(rq, &prev->fence); + if (err) + goto out_request; + } + + if (engine->emit_init_breadcrumb) { + err = engine->emit_init_breadcrumb(rq); + if (err) + goto out_request; + } + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) { + err = PTR_ERR(cs); + goto out_request; + } + + *cs++ = MI_STORE_REGISTER_MEM + use_64b; + *cs++ = 
i915_mmio_reg_offset(RING_TIMESTAMP(engine->mmio_base)); + *cs++ = lower_32_bits(vma->node.start) + sizeof(u32) * slot; + *cs++ = upper_32_bits(vma->node.start); + intel_ring_advance(rq, cs); + + i915_request_get(rq); +out_request: + i915_request_add(rq); +out_unpin: + i915_vma_unpin(vma); +out_ce: + intel_context_put(ce); + i915_request_put(prev); + return err ? ERR_PTR(err) : rq; +} + +static struct i915_request *create_spinner(struct drm_i915_private *i915, + struct igt_spinner *spin) +{ + struct intel_engine_cs *engine; + + for_each_uabi_engine(engine, i915) { + struct intel_context *ce; + struct i915_request *rq; + + if (igt_spinner_init(spin, engine->gt)) + return ERR_PTR(-ENOMEM); + + ce = intel_context_create(engine); + if (IS_ERR(ce)) + return ERR_CAST(ce); + + rq = igt_spinner_create_request(spin, ce, MI_NOOP); + intel_context_put(ce); + if (rq == ERR_PTR(-ENODEV)) + continue; + if (IS_ERR(rq)) + return rq; + + i915_request_get(rq); + i915_request_add(rq); + return rq; + } + + return ERR_PTR(-ENODEV); +} + +static bool has_timestamp(const struct drm_i915_private *i915) +{ + return INTEL_GEN(i915) >= 7; +} + +static int __igt_schedule_cycle(struct drm_i915_private *i915, + bool (*fn)(struct i915_request *rq, + unsigned long v, unsigned long e)) +{ + struct intel_engine_cs *engine; + struct drm_i915_gem_object *obj; + struct igt_spinner spin; + struct i915_request *rq; + unsigned long count, n; + u32 *time, last; + int err; + + /* + * Queue a bunch of ordered requests (each waiting on the previous) + * around the engines a couple of times. Each request will write + * the timestamp it executes at into the scratch, with the expectation + * that the timestamp will be in our desired execution order. 
+ */ + + if (!i915->caps.scheduler || !has_timestamp(i915)) + return 0; + + obj = i915_gem_object_create_internal(i915, SZ_64K); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + time = i915_gem_object_pin_map(obj, I915_MAP_WC); + if (IS_ERR(time)) { + err = PTR_ERR(time); + goto out_obj; + } + + rq = create_spinner(i915, &spin); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out_obj; + } + + err = 0; + count = 0; + for_each_uabi_engine(engine, i915) { + if (!intel_engine_has_scheduler(engine)) + continue; + + rq = __write_timestamp(engine, obj, count, rq); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + break; + } + + count++; + } + for_each_uabi_engine(engine, i915) { + if (!intel_engine_has_scheduler(engine)) + continue; + + rq = __write_timestamp(engine, obj, count, rq); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + break; + } + + count++; + } + GEM_BUG_ON(count * sizeof(u32) > obj->base.size); + if (err || !count) + goto out_spin; + + fn(rq, count + 1, count); + igt_spinner_end(&spin); + + if (i915_request_wait(rq, 0, HZ / 2) < 0) { + err = -ETIME; + goto out_request; + } + + last = time[0]; + for (n = 1; n < count; n++) { + if (i915_seqno_passed(last, time[n])) { + pr_err("Timestamp[%lu] %x before previous %x\n", + n, time[n], last); + err = -EINVAL; + break; + } + last = time[n]; + } + +out_request: + i915_request_put(rq); +out_spin: + igt_spinner_fini(&spin); +out_obj: + i915_gem_object_put(obj); + return err; +} + +static bool noop(struct i915_request *rq, unsigned long v, unsigned long e) +{ + return true; +} + +static int igt_schedule_cycle(void *arg) +{ + return __igt_schedule_cycle(arg, noop); +} + +static int igt_priority_cycle(void *arg) +{ + return __igt_schedule_cycle(arg, igt_priority); +} + int i915_scheduler_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { SUBTEST(igt_priority_chains), + + SUBTEST(igt_schedule_cycle), + SUBTEST(igt_priority_cycle), }; return i915_subtests(tests, i915); From patchwork Mon Feb 1 
08:56:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058499 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E5EAC4332B for ; Mon, 1 Feb 2021 08:58:19 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E325A64E33 for ; Mon, 1 Feb 2021 08:58:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E325A64E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2CA866E566; Mon, 1 Feb 2021 08:57:46 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0AFC46E4A1 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757726-1500050 for multiple; Mon, 01 Feb 2021 08:57:17 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:31 +0000 Message-Id: 
<20210201085715.27435-13-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 13/57] drm/i915/selftests: Force a rewind if at first we don't succeed X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" live_timeslice_rewind assumes a particular traversal and reordering after the first timeslice yield. However, the outcome can be either (A1, A2, B1) or (A1, B1, A2) depending on the path taken through the dependency graph. So if we do not get the outcome we need at first, give it a priority kick to force a rewind. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/selftest_execlists.c | 21 +++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 951e2bf867e1..68e1398704a4 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -1107,6 +1107,7 @@ static int live_timeslice_rewind(void *arg) struct i915_request *rq[3] = {}; struct intel_context *ce; unsigned long timeslice; + unsigned long timeout; int i, err = 0; u32 *slot; @@ -1173,11 +1174,29 @@ static int live_timeslice_rewind(void *arg) /* ELSP[] = { { A:rq1, A:rq2 }, { B:rq1 } } */ ENGINE_TRACE(engine, "forcing tasklet for rewind\n"); - while (i915_request_is_active(rq[A2])) { /* semaphore yield! */ + i = 0; + timeout = jiffies + HZ; + while (i915_request_is_active(rq[A2]) && + time_before(jiffies, timeout)) { /* semaphore yield! 
*/ /* Wait for the timeslice to kick in */ del_timer(&engine->execlists.timer); tasklet_hi_schedule(&engine->execlists.tasklet); intel_engine_flush_submission(engine); + + /* + * Unfortunately this assumes that during the + * search of the wait tree it sees the requests + * in a particular order. That order is not + * strictly determined and it may pick either + * A2 or B1 to immediately follow A1. + * + * Break the tie with a set-priority. This defeats + * the goal of trying to cause a rewind with a + * timeslice, but alas, a rewind is better than + * none. + */ + if (i++) + i915_request_set_priority(rq[B1], 1); } /* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */ GEM_BUG_ON(!i915_request_is_active(rq[A1])); From patchwork Mon Feb 1 08:56:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058455 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA516C4332B for ; Mon, 1 Feb 2021 08:58:05 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6DF1164E33 for ; Mon, 1 Feb 2021 08:58:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6DF1164E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org 
Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 95D6F6E4A2; Mon, 1 Feb 2021 08:57:41 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id A66446E4A2 for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757727-1500050 for multiple; Mon, 01 Feb 2021 08:57:18 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:32 +0000 Message-Id: <20210201085715.27435-14-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 14/57] drm/i915: Improve DFS for priority inheritance X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The core of the scheduling algorithm is that we compute the topological order of the fence DAG. Knowing that we have a DAG, we should be able to use a DFS to compute the topological sort in linear time. However, during the conversion of the recursive algorithm into an iterative one, the memoization of how far we had progressed down a branch was forgotten. The result was that instead of running in linear time, it was running in geometric time and could easily run for a few hundred milliseconds given a wide enough graph, not the microseconds as required. 
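To make the missing memoization concrete, here is a minimal standalone sketch of an iterative DFS that remembers how far it has progressed along each branch. It is an illustration under simplifying assumptions, not the driver's implementation: struct node, bump_priority() and demo_diamond() are hypothetical names, dependencies live in a small fixed array rather than a list, and the priority field doubles as the "already visited" marker.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical stand-in for a request and its scheduling node. */
struct node {
	int prio;			/* doubles as the visited marker */
	size_t ndeps;
	struct node *deps[MAX_DEPS];	/* signalers: must run before us */

	/* DFS bookkeeping: an intrusive stack instead of recursion */
	struct node *prev;		/* node we descended from */
	size_t pos;			/* dependency index to resume at */
};

/*
 * Raise n and everything it depends upon to prio. Nodes are written to
 * out[] in post-order, so dependencies appear before the nodes waiting
 * on them; out[] must have room for every reachable node. Returns the
 * number of edges examined, which stays linear because each node resumes
 * from its saved position rather than rescanning from the start.
 */
static size_t bump_priority(struct node *n, int prio,
			    struct node **out, size_t *nout)
{
	size_t edges = 0;
	size_t pos = 0;

	*nout = 0;
	n->prev = NULL;
	do {
		while (pos < n->ndeps) {
			struct node *s = n->deps[pos++];

			edges++;
			if (s->prio >= prio)
				continue; /* already bumped via another path */

			/* Remember our position along this branch, then descend */
			n->pos = pos;
			s->prev = n;
			n = s;
			pos = 0;
		}

		/* All dependencies handled: update this node */
		n->prio = prio;
		out[(*nout)++] = n;

		/* Pop back to the parent and resume where it left off */
		n = n->prev;
		if (n)
			pos = n->pos;
	} while (n);

	return edges;
}

/* Diamond: d waits on b and c, which both wait on a. */
static size_t demo_diamond(void)
{
	struct node a = { 0 }, b = { 0 }, c = { 0 }, d = { 0 };
	struct node *order[4];
	size_t norder, edges;

	b.deps[b.ndeps++] = &a;
	c.deps[c.ndeps++] = &a;
	d.deps[d.ndeps++] = &b;
	d.deps[d.ndeps++] = &c;

	edges = bump_priority(&d, 1, order, &norder);

	assert(norder == 4);
	assert(order[0] == &a && order[3] == &d);
	return edges;
}
```

In the diamond, each of the four edges is examined exactly once. Without the saved position, returning to a parent would restart its scan from the first dependency, which is the kind of repeated work that compounds on a wide graph.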
Signed-off-by: Chris Wilson Reviewed-by: Andi Shyti Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_scheduler.c | 58 ++++++++++++--------- drivers/gpu/drm/i915/i915_scheduler_types.h | 6 ++- 2 files changed, 39 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index a56a812cbf29..cea5129560a5 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -242,6 +242,26 @@ void __i915_priolist_free(struct i915_priolist *p) kmem_cache_free(global.slab_priorities, p); } +static struct i915_request * +stack_push(struct i915_request *rq, + struct i915_request *prev, + struct list_head *pos) +{ + prev->sched.dfs.pos = pos; + rq->sched.dfs.prev = prev; + return rq; +} + +static struct i915_request * +stack_pop(struct i915_request *rq, + struct list_head **pos) +{ + rq = rq->sched.dfs.prev; + if (rq) + *pos = rq->sched.dfs.pos; + return rq; +} + static inline bool need_preempt(int prio, int active) { /* @@ -306,11 +326,10 @@ static void ipi_priority(struct i915_request *rq, int prio) static void __i915_request_set_priority(struct i915_request *rq, int prio) { struct intel_engine_cs *engine = rq->engine; - struct i915_request *rn; + struct list_head *pos = &rq->sched.signalers_list; struct list_head *plist; - LIST_HEAD(dfs); - list_add(&rq->sched.dfs, &dfs); + plist = i915_sched_lookup_priolist(engine, prio); /* * Recursively bump all dependent priorities to match the new request. @@ -330,40 +349,31 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) * end result is a topological list of requests in reverse order, the * last element in the list is the request we must execute first. 
*/ - list_for_each_entry(rq, &dfs, sched.dfs) { - struct i915_dependency *p; - - /* Also release any children on this engine that are ready */ - GEM_BUG_ON(rq->engine != engine); - - for_each_signaler(p, rq) { + rq->sched.dfs.prev = NULL; + do { + list_for_each_continue(pos, &rq->sched.signalers_list) { + struct i915_dependency *p = + list_entry(pos, typeof(*p), signal_link); struct i915_request *s = container_of(p->signaler, typeof(*s), sched); - GEM_BUG_ON(s == rq); - if (rq_prio(s) >= prio) continue; if (__i915_request_is_complete(s)) continue; - if (s->engine != rq->engine) { + if (s->engine != engine) { ipi_priority(s, prio); continue; } - list_move_tail(&s->sched.dfs, &dfs); + /* Remember our position along this branch */ + rq = stack_push(s, rq, pos); + pos = &rq->sched.signalers_list; } - } - plist = i915_sched_lookup_priolist(engine, prio); - - /* Fifo and depth-first replacement ensure our deps execute first */ - list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) { - GEM_BUG_ON(rq->engine != engine); - - INIT_LIST_HEAD(&rq->sched.dfs); + RQ_TRACE(rq, "set-priority:%d\n", prio); WRITE_ONCE(rq->sched.attr.priority, prio); /* @@ -377,12 +387,13 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) if (!i915_request_is_ready(rq)) continue; + GEM_BUG_ON(rq->engine != engine); if (i915_request_in_priority_queue(rq)) list_move_tail(&rq->sched.link, plist); /* Defer (tasklet) submission until after all updates. 
*/ kick_submission(engine, rq, prio); - } + } while ((rq = stack_pop(rq, &pos))); } #define all_signalers_checked(p, rq) \ @@ -456,7 +467,6 @@ void i915_sched_node_init(struct i915_sched_node *node) INIT_LIST_HEAD(&node->signalers_list); INIT_LIST_HEAD(&node->waiters_list); INIT_LIST_HEAD(&node->link); - INIT_LIST_HEAD(&node->dfs); node->ipi_link = NULL; diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 2a5265d9aff1..28138c3fcc81 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -69,7 +69,11 @@ struct i915_sched_node { struct list_head signalers_list; /* those before us, we depend upon */ struct list_head waiters_list; /* those after us, they depend upon us */ struct list_head link; /* guarded by engine->active.lock */ - struct list_head dfs; /* guarded by engine->active.lock */ + struct i915_sched_stack { + /* Branch memoization used during depth-first search */ + struct i915_request *prev; + struct list_head *pos; + } dfs; /* guarded by engine->active.lock */ struct i915_sched_attr attr; unsigned long flags; #define I915_SCHED_HAS_EXTERNAL_CHAIN BIT(0) From patchwork Mon Feb 1 08:56:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058475 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D52FC433E0 for ; Mon, 1 Feb 2021 08:58:12 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org 
[131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8FB3364E33 for ; Mon, 1 Feb 2021 08:58:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8FB3364E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1DAB06E4EA; Mon, 1 Feb 2021 08:57:43 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 779DE6E507 for ; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757728-1500050 for multiple; Mon, 01 Feb 2021 08:57:18 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:33 +0000 Message-Id: <20210201085715.27435-15-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 15/57] drm/i915: Extract request submission from execlists X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" In the process of preparing to reuse the request submission logic for other backends, lift it out of the execlists backend. 
It already operates on the common structs, so just a matter of moving and renaming. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../drm/i915/gt/intel_execlists_submission.c | 55 +------------ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 30 +------ drivers/gpu/drm/i915/i915_scheduler.c | 82 +++++++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.h | 2 + 4 files changed, 86 insertions(+), 83 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index ea449fee8148..51044387a8a2 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2413,59 +2413,6 @@ static void execlists_preempt(struct timer_list *timer) execlists_kick(timer, preempt); } -static void queue_request(struct intel_engine_cs *engine, - struct i915_request *rq) -{ - GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, - i915_sched_lookup_priolist(engine, rq_prio(rq))); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); -} - -static bool submit_queue(struct intel_engine_cs *engine, - const struct i915_request *rq) -{ - struct intel_engine_execlists *execlists = &engine->execlists; - - if (rq_prio(rq) <= execlists->queue_priority_hint) - return false; - - execlists->queue_priority_hint = rq_prio(rq); - return true; -} - -static bool ancestor_on_hold(const struct intel_engine_cs *engine, - const struct i915_request *rq) -{ - GEM_BUG_ON(i915_request_on_hold(rq)); - return !list_empty(&engine->active.hold) && hold_request(rq); -} - -static void execlists_submit_request(struct i915_request *request) -{ - struct intel_engine_cs *engine = request->engine; - unsigned long flags; - - /* Will be called from irq-context when using foreign fences. 
*/ - spin_lock_irqsave(&engine->active.lock, flags); - - if (unlikely(ancestor_on_hold(engine, request))) { - RQ_TRACE(request, "ancestor on hold\n"); - list_add_tail(&request->sched.link, &engine->active.hold); - i915_request_set_hold(request); - } else { - queue_request(engine, request); - - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - GEM_BUG_ON(list_empty(&request->sched.link)); - - if (submit_queue(engine, request)) - __execlists_kick(&engine->execlists); - } - - spin_unlock_irqrestore(&engine->active.lock, flags); -} - static int execlists_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr) @@ -3085,7 +3032,7 @@ static bool can_preempt(struct intel_engine_cs *engine) static void execlists_set_default_submission(struct intel_engine_cs *engine) { - engine->submit_request = execlists_submit_request; + engine->submit_request = i915_request_enqueue; engine->execlists.tasklet.callback = execlists_submission_tasklet; } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 06fe95250ba2..dc33e5751776 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -511,34 +511,6 @@ static int guc_request_alloc(struct i915_request *request) return 0; } -static inline void queue_request(struct intel_engine_cs *engine, - struct i915_request *rq, - int prio) -{ - GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, - i915_sched_lookup_priolist(engine, prio)); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); -} - -static void guc_submit_request(struct i915_request *rq) -{ - struct intel_engine_cs *engine = rq->engine; - unsigned long flags; - - /* Will be called from irq-context when using foreign fences. 
*/ - spin_lock_irqsave(&engine->active.lock, flags); - - queue_request(engine, rq, rq_prio(rq)); - - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - GEM_BUG_ON(list_empty(&rq->sched.link)); - - tasklet_hi_schedule(&engine->execlists.tasklet); - - spin_unlock_irqrestore(&engine->active.lock, flags); -} - static void sanitize_hwsp(struct intel_engine_cs *engine) { struct intel_timeline *tl; @@ -607,7 +579,7 @@ static int guc_resume(struct intel_engine_cs *engine) static void guc_set_default_submission(struct intel_engine_cs *engine) { - engine->submit_request = guc_submit_request; + engine->submit_request = i915_request_enqueue; } static void guc_release(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index cea5129560a5..5e99f9779309 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -460,6 +460,88 @@ void i915_request_set_priority(struct i915_request *rq, int prio) spin_unlock_irqrestore(&engine->active.lock, flags); } +static void queue_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + GEM_BUG_ON(!list_empty(&rq->sched.link)); + list_add_tail(&rq->sched.link, + i915_sched_lookup_priolist(engine, rq_prio(rq))); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); +} + +static bool submit_queue(struct intel_engine_cs *engine, + const struct i915_request *rq) +{ + struct intel_engine_execlists *execlists = &engine->execlists; + + if (rq_prio(rq) <= execlists->queue_priority_hint) + return false; + + execlists->queue_priority_hint = rq_prio(rq); + return true; +} + +static bool hold_request(const struct i915_request *rq) +{ + struct i915_dependency *p; + bool result = false; + + /* + * If one of our ancestors is on hold, we must also be put on hold, + * otherwise we will bypass it and execute before it. 
+ */ + rcu_read_lock(); + for_each_signaler(p, rq) { + const struct i915_request *s = + container_of(p->signaler, typeof(*s), sched); + + if (s->engine != rq->engine) + continue; + + result = i915_request_on_hold(s); + if (result) + break; + } + rcu_read_unlock(); + + return result; +} + +static bool ancestor_on_hold(const struct intel_engine_cs *engine, + const struct i915_request *rq) +{ + GEM_BUG_ON(i915_request_on_hold(rq)); + return unlikely(!list_empty(&engine->active.hold)) && hold_request(rq); +} + +void i915_request_enqueue(struct i915_request *rq) +{ + struct intel_engine_cs *engine = rq->engine; + unsigned long flags; + bool kick = false; + + /* Will be called from irq-context when using foreign fences. */ + spin_lock_irqsave(&engine->active.lock, flags); + GEM_BUG_ON(test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); + + if (unlikely(ancestor_on_hold(engine, rq))) { + RQ_TRACE(rq, "ancestor on hold\n"); + list_add_tail(&rq->sched.link, &engine->active.hold); + i915_request_set_hold(rq); + } else { + queue_request(engine, rq); + + GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + + kick = submit_queue(engine, rq); + } + + GEM_BUG_ON(list_empty(&rq->sched.link)); + spin_unlock_irqrestore(&engine->active.lock, flags); + if (kick) + tasklet_hi_schedule(&engine->execlists.tasklet); +} + void i915_sched_node_init(struct i915_sched_node *node) { spin_lock_init(&node->lock); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 2870fa3e089e..89d998f226e0 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -40,6 +40,8 @@ void i915_sched_init_ipi(struct i915_sched_ipi *ipi); void i915_request_set_priority(struct i915_request *request, int prio); +void i915_request_enqueue(struct i915_request *request); + struct list_head * i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio); From patchwork Mon Feb 1 08:56:34 2021 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058461 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A57FFC43381 for ; Mon, 1 Feb 2021 08:58:07 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 40C1864E3F for ; Mon, 1 Feb 2021 08:58:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 40C1864E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 833FC6E49B; Mon, 1 Feb 2021 08:57:42 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2F2066E4C9 for ; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757729-1500050 for multiple; Mon, 01 Feb 2021 08:57:18 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:34 +0000 Message-Id: <20210201085715.27435-16-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 16/57] drm/i915: Extract request rewinding from execlists X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" In the process of preparing to reuse the request submission logic for other backends, lift it out of the execlists backend. While this operates on the common structs, we do have a bit of backend knowledge, which is harmless for !lrc but still unsightly. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 3 - .../drm/i915/gt/intel_execlists_submission.c | 58 ++----------------- drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 3 + drivers/gpu/drm/i915/gt/selftest_execlists.c | 2 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +- drivers/gpu/drm/i915/i915_scheduler.c | 44 ++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.h | 3 + 7 files changed, 56 insertions(+), 60 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 8d9184920c51..cc2df80eb449 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -137,9 +137,6 @@ execlists_active_unlock_bh(struct intel_engine_execlists *execlists) local_bh_enable(); /* restore softirq, and kick ksoftirqd! 
*/ } -struct i915_request * -execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists); - static inline u32 intel_read_status_page(const struct intel_engine_cs *engine, int reg) { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 51044387a8a2..b6dea80da533 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -359,56 +359,6 @@ assert_priority_queue(const struct i915_request *prev, return rq_prio(prev) >= rq_prio(next); } -static struct i915_request * -__unwind_incomplete_requests(struct intel_engine_cs *engine) -{ - struct i915_request *rq, *rn, *active = NULL; - struct list_head *pl; - int prio = I915_PRIORITY_INVALID; - - lockdep_assert_held(&engine->active.lock); - - list_for_each_entry_safe_reverse(rq, rn, - &engine->active.requests, - sched.link) { - if (__i915_request_is_complete(rq)) { - list_del_init(&rq->sched.link); - continue; - } - - __i915_request_unsubmit(rq); - - GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); - if (rq_prio(rq) != prio) { - prio = rq_prio(rq); - pl = i915_sched_lookup_priolist(engine, prio); - } - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - - list_move(&rq->sched.link, pl); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - - /* Check in case we rollback so far we wrap [size/2] */ - if (intel_ring_direction(rq->ring, - rq->tail, - rq->ring->tail + 8) > 0) - rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; - - active = rq; - } - - return active; -} - -struct i915_request * -execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists) -{ - struct intel_engine_cs *engine = - container_of(execlists, typeof(*engine), execlists); - - return __unwind_incomplete_requests(engine); -} - static void execlists_context_status_change(struct i915_request *rq, unsigned long status) { @@ -1080,7 +1030,7 @@ static void defer_active(struct 
intel_engine_cs *engine) { struct i915_request *rq; - rq = __unwind_incomplete_requests(engine); + rq = __i915_sched_rewind_requests(engine); if (!rq) return; @@ -1292,7 +1242,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * the preemption, some of the unwound requests may * complete! */ - __unwind_incomplete_requests(engine); + __i915_sched_rewind_requests(engine); last = NULL; } else if (timeslice_expired(engine, last)) { @@ -2287,7 +2237,7 @@ static void execlists_capture(struct intel_engine_cs *engine) * which we return it to the queue for signaling. * * By removing them from the execlists queue, we also remove the - * requests from being processed by __unwind_incomplete_requests() + * requests from being processed by __intel_engine_rewind_requests() * during the intel_engine_reset(), and so they will *not* be replayed * afterwards. * @@ -2878,7 +2828,7 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) /* Push back any incomplete requests for replay after the reset. 
*/ rcu_read_lock(); spin_lock_irqsave(&engine->active.lock, flags); - __unwind_incomplete_requests(engine); + __i915_sched_rewind_requests(engine); spin_unlock_irqrestore(&engine->active.lock, flags); rcu_read_unlock(); } diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h index 41e5350a7a05..364656bedec7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h @@ -92,4 +92,7 @@ /* in Gen12 ID 0x7FF is reserved to indicate idle */ #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1) +#define CTX_DESC_RELOAD_PD BIT_ULL(1) +#define CTX_DESC_FORCE_RESTORE BIT_ULL(2) + #endif /* _INTEL_LRC_REG_H_ */ diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 68e1398704a4..73340a96548f 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -4601,7 +4601,7 @@ static int reset_virtual_engine(struct intel_gt *gt, /* Fake a preemption event; failed of course */ spin_lock_irq(&engine->active.lock); - __unwind_incomplete_requests(engine); + __i915_sched_rewind_requests(engine); spin_unlock_irq(&engine->active.lock); GEM_BUG_ON(rq->engine != engine); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index dc33e5751776..aecede4f0943 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -306,14 +306,13 @@ static void guc_reset_state(struct intel_context *ce, static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) { - struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_request *rq; unsigned long flags; spin_lock_irqsave(&engine->active.lock, flags); /* Push back any incomplete requests for replay after the reset. 
*/ - rq = execlists_unwind_incomplete_requests(execlists); + rq = __i915_sched_rewind_requests(engine); if (!rq) goto out_unlock; diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 5e99f9779309..9fcfbf303ce0 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -6,6 +6,9 @@ #include +#include "gt/intel_ring.h" +#include "gt/intel_lrc_reg.h" + #include "i915_drv.h" #include "i915_globals.h" #include "i915_request.h" @@ -542,6 +545,47 @@ void i915_request_enqueue(struct i915_request *rq) tasklet_hi_schedule(&engine->execlists.tasklet); } +struct i915_request * +__i915_sched_rewind_requests(struct intel_engine_cs *engine) +{ + struct i915_request *rq, *rn, *active = NULL; + struct list_head *pl; + int prio = I915_PRIORITY_INVALID; + + lockdep_assert_held(&engine->active.lock); + + list_for_each_entry_safe_reverse(rq, rn, + &engine->active.requests, + sched.link) { + if (__i915_request_is_complete(rq)) { + list_del_init(&rq->sched.link); + continue; + } + + __i915_request_unsubmit(rq); + + GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); + if (rq_prio(rq) != prio) { + prio = rq_prio(rq); + pl = i915_sched_lookup_priolist(engine, prio); + } + GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + + list_move(&rq->sched.link, pl); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + + /* Check in case we rollback so far we wrap [size/2] */ + if (intel_ring_direction(rq->ring, + rq->tail, + rq->ring->tail + 8) > 0) + rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; + + active = rq; + } + + return active; +} + void i915_sched_node_init(struct i915_sched_node *node) { spin_lock_init(&node->lock); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 89d998f226e0..d3984f65b3a6 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -42,6 +42,9 @@ void i915_request_set_priority(struct 
i915_request *request, int prio); void i915_request_enqueue(struct i915_request *request); +struct i915_request * +__i915_sched_rewind_requests(struct intel_engine_cs *engine); + struct list_head * i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio); From patchwork Mon Feb 1 08:56:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058507 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40346C433DB for ; Mon, 1 Feb 2021 08:58:21 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CE73264E33 for ; Mon, 1 Feb 2021 08:58:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE73264E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A315E6E53E; Mon, 1 Feb 2021 08:57:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 44C8F6E506 for ; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com 
(unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757730-1500050 for multiple; Mon, 01 Feb 2021 08:57:18 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:35 +0000 Message-Id: <20210201085715.27435-17-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 17/57] drm/i915: Extract request suspension from the execlists X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Make the ability to suspend and resume a request and its dependents generic. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../drm/i915/gt/intel_execlists_submission.c | 167 +----------------- drivers/gpu/drm/i915/gt/selftest_execlists.c | 8 +- drivers/gpu/drm/i915/i915_scheduler.c | 153 ++++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.h | 10 ++ 4 files changed, 169 insertions(+), 169 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index b6dea80da533..853021314786 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -1921,169 +1921,6 @@ static void post_process_csb(struct i915_request **port, execlists_schedule_out(*port++); } -static void __execlists_hold(struct i915_request *rq) -{ - LIST_HEAD(list); - - do { - struct i915_dependency *p; - - if (i915_request_is_active(rq)) - __i915_request_unsubmit(rq); - - clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - list_move_tail(&rq->sched.link, 
&rq->engine->active.hold); - i915_request_set_hold(rq); - RQ_TRACE(rq, "on hold\n"); - - for_each_waiter(p, rq) { - struct i915_request *w = - container_of(p->waiter, typeof(*w), sched); - - if (p->flags & I915_DEPENDENCY_WEAK) - continue; - - /* Leave semaphores spinning on the other engines */ - if (w->engine != rq->engine) - continue; - - if (!i915_request_is_ready(w)) - continue; - - if (__i915_request_is_complete(w)) - continue; - - if (i915_request_on_hold(w)) - continue; - - list_move_tail(&w->sched.link, &list); - } - - rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); - } while (rq); -} - -static bool execlists_hold(struct intel_engine_cs *engine, - struct i915_request *rq) -{ - if (i915_request_on_hold(rq)) - return false; - - spin_lock_irq(&engine->active.lock); - - if (__i915_request_is_complete(rq)) { /* too late! */ - rq = NULL; - goto unlock; - } - - /* - * Transfer this request onto the hold queue to prevent it - * being resumbitted to HW (and potentially completed) before we have - * released it. Since we may have already submitted following - * requests, we need to remove those as well. - */ - GEM_BUG_ON(i915_request_on_hold(rq)); - GEM_BUG_ON(rq->engine != engine); - __execlists_hold(rq); - GEM_BUG_ON(list_empty(&engine->active.hold)); - -unlock: - spin_unlock_irq(&engine->active.lock); - return rq; -} - -static bool hold_request(const struct i915_request *rq) -{ - struct i915_dependency *p; - bool result = false; - - /* - * If one of our ancestors is on hold, we must also be on hold, - * otherwise we will bypass it and execute before it. 
- */ - rcu_read_lock(); - for_each_signaler(p, rq) { - const struct i915_request *s = - container_of(p->signaler, typeof(*s), sched); - - if (s->engine != rq->engine) - continue; - - result = i915_request_on_hold(s); - if (result) - break; - } - rcu_read_unlock(); - - return result; -} - -static void __execlists_unhold(struct i915_request *rq) -{ - LIST_HEAD(list); - - do { - struct i915_dependency *p; - - RQ_TRACE(rq, "hold release\n"); - - GEM_BUG_ON(!i915_request_on_hold(rq)); - GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); - - i915_request_clear_hold(rq); - list_move_tail(&rq->sched.link, - i915_sched_lookup_priolist(rq->engine, - rq_prio(rq))); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - - /* Also release any children on this engine that are ready */ - for_each_waiter(p, rq) { - struct i915_request *w = - container_of(p->waiter, typeof(*w), sched); - - if (p->flags & I915_DEPENDENCY_WEAK) - continue; - - /* Propagate any change in error status */ - if (rq->fence.error) - i915_request_set_error_once(w, rq->fence.error); - - if (w->engine != rq->engine) - continue; - - if (!i915_request_on_hold(w)) - continue; - - /* Check that no other parents are also on hold */ - if (hold_request(w)) - continue; - - list_move_tail(&w->sched.link, &list); - } - - rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); - } while (rq); -} - -static void execlists_unhold(struct intel_engine_cs *engine, - struct i915_request *rq) -{ - spin_lock_irq(&engine->active.lock); - - /* - * Move this request back to the priority queue, and all of its - * children and grandchildren that were suspended along with it. 
- */ - __execlists_unhold(rq); - - if (rq_prio(rq) > engine->execlists.queue_priority_hint) { - engine->execlists.queue_priority_hint = rq_prio(rq); - tasklet_hi_schedule(&engine->execlists.tasklet); - } - - spin_unlock_irq(&engine->active.lock); -} - struct execlists_capture { struct work_struct work; struct i915_request *rq; @@ -2116,7 +1953,7 @@ static void execlists_capture_work(struct work_struct *work) i915_gpu_coredump_put(cap->error); /* Return this request and all that depend upon it for signaling */ - execlists_unhold(engine, cap->rq); + i915_sched_resume_request(engine, cap->rq); i915_request_put(cap->rq); kfree(cap); @@ -2250,7 +2087,7 @@ static void execlists_capture(struct intel_engine_cs *engine) * simply hold that request accountable for being non-preemptible * long enough to force the reset. */ - if (!execlists_hold(engine, cap->rq)) + if (!i915_sched_suspend_request(engine, cap->rq)) goto err_rq; INIT_WORK(&cap->work, execlists_capture_work); diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 73340a96548f..64f6a49a5c22 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -608,7 +608,7 @@ static int live_hold_reset(void *arg) GEM_BUG_ON(execlists_active(&engine->execlists) != rq); i915_request_get(rq); - execlists_hold(engine, rq); + i915_sched_suspend_request(engine, rq); GEM_BUG_ON(!i915_request_on_hold(rq)); __intel_engine_reset_bh(engine, NULL); @@ -630,7 +630,7 @@ static int live_hold_reset(void *arg) GEM_BUG_ON(!i915_request_on_hold(rq)); /* But is resubmitted on release */ - execlists_unhold(engine, rq); + i915_sched_resume_request(engine, rq); if (i915_request_wait(rq, 0, HZ / 5) < 0) { pr_err("%s: held request did not complete!\n", engine->name); @@ -4606,7 +4606,7 @@ static int reset_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(rq->engine != engine); /* Reset the engine while keeping our active request on hold */ - 
execlists_hold(engine, rq); + i915_sched_suspend_request(engine, rq); GEM_BUG_ON(!i915_request_on_hold(rq)); __intel_engine_reset_bh(engine, NULL); @@ -4629,7 +4629,7 @@ static int reset_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(!i915_request_on_hold(rq)); /* But is resubmitted on release */ - execlists_unhold(engine, rq); + i915_sched_resume_request(engine, rq); if (i915_request_wait(rq, 0, HZ / 5) < 0) { pr_err("%s: held request did not complete!\n", engine->name); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 9fcfbf303ce0..351c0c0055b5 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -586,6 +586,159 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) return active; } +bool __i915_sched_suspend_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + LIST_HEAD(list); + + lockdep_assert_held(&engine->active.lock); + GEM_BUG_ON(rq->engine != engine); + + if (__i915_request_is_complete(rq)) /* too late! */ + return false; + + if (i915_request_on_hold(rq)) + return false; + + ENGINE_TRACE(engine, "suspending request %llx:%lld\n", + rq->fence.context, rq->fence.seqno); + + /* + * Transfer this request onto the hold queue to prevent it + * being resumbitted to HW (and potentially completed) before we have + * released it. Since we may have already submitted following + * requests, we need to remove those as well. 
+ */ + do { + struct i915_dependency *p; + + if (i915_request_is_active(rq)) + __i915_request_unsubmit(rq); + + list_move_tail(&rq->sched.link, &engine->active.hold); + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + i915_request_set_hold(rq); + RQ_TRACE(rq, "on hold\n"); + + for_each_waiter(p, rq) { + struct i915_request *w = + container_of(p->waiter, typeof(*w), sched); + + if (p->flags & I915_DEPENDENCY_WEAK) + continue; + + /* Leave semaphores spinning on the other engines */ + if (w->engine != engine) + continue; + + if (!i915_request_is_ready(w)) + continue; + + if (__i915_request_is_complete(w)) + continue; + + if (i915_request_on_hold(w)) /* acts as a visited bit */ + continue; + + list_move_tail(&w->sched.link, &list); + } + + rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); + } while (rq); + + GEM_BUG_ON(list_empty(&engine->active.hold)); + + return true; +} + +bool i915_sched_suspend_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + bool result; + + if (i915_request_on_hold(rq)) + return false; + + spin_lock_irq(&engine->active.lock); + result = __i915_sched_suspend_request(engine, rq); + spin_unlock_irq(&engine->active.lock); + + return result; +} + +void __i915_sched_resume_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + LIST_HEAD(list); + + lockdep_assert_held(&engine->active.lock); + + if (rq_prio(rq) > engine->execlists.queue_priority_hint) { + engine->execlists.queue_priority_hint = rq_prio(rq); + tasklet_hi_schedule(&engine->execlists.tasklet); + } + + if (!i915_request_on_hold(rq)) + return; + + ENGINE_TRACE(engine, "resuming request %llx:%lld\n", + rq->fence.context, rq->fence.seqno); + + /* + * Move this request back to the priority queue, and all of its + * children and grandchildren that were suspended along with it. 
+ */ + do { + struct i915_dependency *p; + + RQ_TRACE(rq, "hold release\n"); + + GEM_BUG_ON(!i915_request_on_hold(rq)); + GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); + + i915_request_clear_hold(rq); + list_del_init(&rq->sched.link); + + queue_request(engine, rq); + + /* Also release any children on this engine that are ready */ + for_each_waiter(p, rq) { + struct i915_request *w = + container_of(p->waiter, typeof(*w), sched); + + if (p->flags & I915_DEPENDENCY_WEAK) + continue; + + /* Propagate any change in error status */ + if (rq->fence.error) + i915_request_set_error_once(w, rq->fence.error); + + if (w->engine != engine) + continue; + + /* We also treat the on-hold status as a visited bit */ + if (!i915_request_on_hold(w)) + continue; + + /* Check that no other parents are also on hold [BFS] */ + if (hold_request(w)) + continue; + + list_move_tail(&w->sched.link, &list); + } + + rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); + } while (rq); +} + +void i915_sched_resume_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + spin_lock_irq(&engine->active.lock); + __i915_sched_resume_request(engine, rq); + spin_unlock_irq(&engine->active.lock); +} + void i915_sched_node_init(struct i915_sched_node *node) { spin_lock_init(&node->lock); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index d3984f65b3a6..9860459fedb1 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -45,6 +45,16 @@ void i915_request_enqueue(struct i915_request *request); struct i915_request * __i915_sched_rewind_requests(struct intel_engine_cs *engine); +bool __i915_sched_suspend_request(struct intel_engine_cs *engine, + struct i915_request *rq); +void __i915_sched_resume_request(struct intel_engine_cs *engine, + struct i915_request *request); + +bool i915_sched_suspend_request(struct intel_engine_cs *engine, + struct i915_request *request); +void 
i915_sched_resume_request(struct intel_engine_cs *engine, + struct i915_request *rq); + struct list_head * i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio); From patchwork Mon Feb 1 08:56:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058481 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95F38C433E9 for ; Mon, 1 Feb 2021 08:58:13 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4C47964E40 for ; Mon, 1 Feb 2021 08:58:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4C47964E40 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 39B6A6E506; Mon, 1 Feb 2021 08:57:44 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 06C3B6E48C for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 
23757731-1500050 for multiple; Mon, 01 Feb 2021 08:57:18 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:36 +0000 Message-Id: <20210201085715.27435-18-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 18/57] drm/i915: Extract the ability to defer and rerun a request later X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Lift the ability to defer a request until later from execlists into the common layer. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../drm/i915/gt/intel_execlists_submission.c | 57 +++-------------- drivers/gpu/drm/i915/i915_scheduler.c | 63 +++++++++++++++++-- drivers/gpu/drm/i915/i915_scheduler.h | 5 +- 3 files changed, 67 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 853021314786..b56e321ef003 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -978,54 +978,6 @@ static void virtual_xfer_context(struct virtual_engine *ve, } } -static void defer_request(struct i915_request *rq, struct list_head * const pl) -{ - LIST_HEAD(list); - - /* - * We want to move the interrupted request to the back of - * the round-robin list (i.e. its priority level), but - * in doing so, we must then move all requests that were in - * flight and were waiting for the interrupted request to - * be run after it again. 
- */ - do { - struct i915_dependency *p; - - GEM_BUG_ON(i915_request_is_active(rq)); - list_move_tail(&rq->sched.link, pl); - - for_each_waiter(p, rq) { - struct i915_request *w = - container_of(p->waiter, typeof(*w), sched); - - if (p->flags & I915_DEPENDENCY_WEAK) - continue; - - /* Leave semaphores spinning on the other engines */ - if (w->engine != rq->engine) - continue; - - /* No waiter should start before its signaler */ - GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) && - __i915_request_has_started(w) && - !__i915_request_is_complete(rq)); - - if (!i915_request_is_ready(w)) - continue; - - if (rq_prio(w) < rq_prio(rq)) - continue; - - GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); - GEM_BUG_ON(i915_request_is_active(w)); - list_move_tail(&w->sched.link, &list); - } - - rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); - } while (rq); -} - static void defer_active(struct intel_engine_cs *engine) { struct i915_request *rq; @@ -1034,7 +986,14 @@ static void defer_active(struct intel_engine_cs *engine) if (!rq) return; - defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq))); + /* + * We want to move the interrupted request to the back of + * the round-robin list (i.e. its priority level), but + * in doing so, we must then move all requests that were in + * flight and were waiting for the interrupted request to + * be run after it again. 
+ */ + __i915_sched_defer_request(engine, rq); } static bool diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 351c0c0055b5..bfd37ee801fd 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -179,8 +179,8 @@ static void assert_priolists(struct intel_engine_execlists * const execlists) } } -struct list_head * -i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio) +static struct list_head * +lookup_priolist(struct intel_engine_cs *engine, int prio) { struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_priolist *p; @@ -332,7 +332,7 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) struct list_head *pos = &rq->sched.signalers_list; struct list_head *plist; - plist = i915_sched_lookup_priolist(engine, prio); + plist = lookup_priolist(engine, prio); /* * Recursively bump all dependent priorities to match the new request. @@ -463,12 +463,63 @@ void i915_request_set_priority(struct i915_request *rq, int prio) spin_unlock_irqrestore(&engine->active.lock, flags); } +void __i915_sched_defer_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + struct list_head *pl; + LIST_HEAD(list); + + lockdep_assert_held(&engine->active.lock); + GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); + + /* + * When we defer a request, we must maintain its order with respect + * to those that are waiting upon it. So we traverse its chain of + * waiters and move any that are earlier than the request to after it. 
+ */ + pl = lookup_priolist(engine, rq_prio(rq)); + do { + struct i915_dependency *p; + + GEM_BUG_ON(i915_request_is_active(rq)); + list_move_tail(&rq->sched.link, pl); + + for_each_waiter(p, rq) { + struct i915_request *w = + container_of(p->waiter, typeof(*w), sched); + + if (p->flags & I915_DEPENDENCY_WEAK) + continue; + + /* Leave semaphores spinning on the other engines */ + if (w->engine != engine) + continue; + + /* No waiter should start before its signaler */ + GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) && + __i915_request_has_started(w) && + !__i915_request_is_complete(rq)); + + if (!i915_request_is_ready(w)) + continue; + + if (rq_prio(w) < rq_prio(rq)) + continue; + + GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); + GEM_BUG_ON(i915_request_is_active(w)); + list_move_tail(&w->sched.link, &list); + } + + rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); + } while (rq); +} + static void queue_request(struct intel_engine_cs *engine, struct i915_request *rq) { GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, - i915_sched_lookup_priolist(engine, rq_prio(rq))); + list_add_tail(&rq->sched.link, lookup_priolist(engine, rq_prio(rq))); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } @@ -567,7 +618,7 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); if (rq_prio(rq) != prio) { prio = rq_prio(rq); - pl = i915_sched_lookup_priolist(engine, prio); + pl = lookup_priolist(engine, prio); } GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 9860459fedb1..00ce0a9d519d 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -44,6 +44,8 @@ void i915_request_enqueue(struct i915_request *request); struct i915_request * __i915_sched_rewind_requests(struct intel_engine_cs *engine); +void __i915_sched_defer_request(struct 
intel_engine_cs *engine, + struct i915_request *request); bool __i915_sched_suspend_request(struct intel_engine_cs *engine, struct i915_request *rq); @@ -55,9 +57,6 @@ bool i915_sched_suspend_request(struct intel_engine_cs *engine, void i915_sched_resume_request(struct intel_engine_cs *engine, struct i915_request *rq); -struct list_head * -i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio); - void __i915_priolist_free(struct i915_priolist *p); static inline void i915_priolist_free(struct i915_priolist *p) { From patchwork Mon Feb 1 08:56:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058491 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 231F4C433E0 for ; Mon, 1 Feb 2021 08:58:17 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C2D9C64E40 for ; Mon, 1 Feb 2021 08:58:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C2D9C64E40 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BF1FF6E4EC; Mon, 1 Feb 2021 08:57:43 +0000 (UTC) Received: from 
fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 90D236E4D4 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757732-1500050 for multiple; Mon, 01 Feb 2021 08:57:18 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:37 +0000 Message-Id: <20210201085715.27435-19-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 19/57] drm/i915: Fix the iterative dfs for defering requests X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The current implementation of walking the children of a deferred request lacks the backtracking required to reduce the dfs to linear time. Having pulled it from execlists into the common layer, we can reuse the dfs code for priority inheritance.
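As an illustration of the backtracking described above, here is a minimal, self-contained sketch. It is not the driver code: struct node, dfs_postorder(), and the prev/pos/visited fields are hypothetical names chosen for this example, and a fixed-size array stands in for the waiters list. Each node remembers who descended into it (prev) and how far its own waiters have been scanned (pos), so after finishing a branch the walk resumes where it left off instead of rescanning. Together with the visited flag, this means every edge is examined once.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 4

/* Hypothetical node type for illustration; not the i915 request struct. */
struct node {
	int id;
	struct node *waiters[MAX_WAITERS]; /* nodes that depend on this one */
	size_t nr_waiters;
	int visited;       /* set on discovery, prevents duplicate processing */
	struct node *prev; /* backtracking link: the node we descended from */
	size_t pos;        /* cursor into waiters[], survives backtracking */
};

/*
 * Visit every node reachable from root exactly once and write the ids
 * to out[] in post-order (a node is emitted after all of its waiters).
 * Returns the number of ids written.
 */
static size_t dfs_postorder(struct node *root, int *out, size_t max)
{
	struct node *n = root;
	size_t count = 0;

	n->prev = NULL;
	n->pos = 0;
	n->visited = 1;

	while (n) {
		if (n->pos < n->nr_waiters) {
			/* Advance the cursor before descending. */
			struct node *w = n->waiters[n->pos++];

			if (w->visited)
				continue;

			/* "Push": remember where to return to. */
			w->visited = 1;
			w->prev = n;
			w->pos = 0;
			n = w;
			continue;
		}

		/* All waiters handled: emit this node, then "pop". */
		assert(count < max);
		out[count++] = n->id;
		n = n->prev;
	}

	return count;
}
```

Calling dfs_postorder() on a diamond-shaped graph (a root with two waiters that share one waiter) emits the shared node only once. Reading the output backwards gives an order in which every node comes before its waiters, which is the property a deferral needs to preserve.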
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_scheduler.c | 56 +++++++++++++++++++-------- 1 file changed, 40 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index bfd37ee801fd..694ca3a3b563 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -466,8 +466,10 @@ void i915_request_set_priority(struct i915_request *rq, int prio) void __i915_sched_defer_request(struct intel_engine_cs *engine, struct i915_request *rq) { - struct list_head *pl; - LIST_HEAD(list); + struct list_head *pos = &rq->sched.waiters_list; + const int prio = rq_prio(rq); + struct i915_request *rn; + LIST_HEAD(dfs); lockdep_assert_held(&engine->active.lock); GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); @@ -477,14 +479,11 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, * to those that are waiting upon it. So we traverse its chain of * waiters and move any that are earlier than the request to after it. */ - pl = lookup_priolist(engine, rq_prio(rq)); + rq->sched.dfs.prev = NULL; do { - struct i915_dependency *p; - - GEM_BUG_ON(i915_request_is_active(rq)); - list_move_tail(&rq->sched.link, pl); - - for_each_waiter(p, rq) { + list_for_each_continue(pos, &rq->sched.waiters_list) { + struct i915_dependency *p = + list_entry(pos, typeof(*p), wait_link); struct i915_request *w = container_of(p->waiter, typeof(*w), sched); @@ -500,19 +499,44 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, __i915_request_has_started(w) && !__i915_request_is_complete(rq)); - if (!i915_request_is_ready(w)) + if (!i915_request_in_priority_queue(w)) continue; - if (rq_prio(w) < rq_prio(rq)) + /* + * We also need to reorder within the same priority. 
+ * + * This is unlike priority-inheritance, where if the + * signaler already has a higher priority [earlier + * deadline] than us, we can ignore as it will be + * scheduled first. If a waiter already has the + * same priority, we still have to push it to the end + * of the list. This unfortunately means we cannot + * use the rq_deadline() itself as a 'visited' bit. + */ + if (rq_prio(w) < prio) continue; - GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); - GEM_BUG_ON(i915_request_is_active(w)); - list_move_tail(&w->sched.link, &list); + GEM_BUG_ON(rq_prio(w) != prio); + + /* Remember our position along this branch */ + rq = stack_push(w, rq, pos); + pos = &rq->sched.waiters_list; } - rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); - } while (rq); + /* Note list is reversed for waiters wrt signal hierarchy */ + GEM_BUG_ON(rq->engine != engine); + GEM_BUG_ON(!i915_request_in_priority_queue(rq)); + list_move(&rq->sched.link, &dfs); + + /* Track our visit, and prevent duplicate processing */ + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + } while ((rq = stack_pop(rq, &pos))); + + pos = lookup_priolist(engine, prio); + list_for_each_entry_safe(rq, rn, &dfs, sched.link) { + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + list_add_tail(&rq->sched.link, pos); + } } static void queue_request(struct intel_engine_cs *engine, From patchwork Mon Feb 1 08:56:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058489 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id 69B70C4332D for ; Mon, 1 Feb 2021 08:58:14 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1DF9D64E3F for ; Mon, 1 Feb 2021 08:58:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1DF9D64E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 968326E4BA; Mon, 1 Feb 2021 08:57:43 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8B4A86E4D2 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757733-1500050 for multiple; Mon, 01 Feb 2021 08:57:19 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:38 +0000 Message-Id: <20210201085715.27435-20-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 20/57] drm/i915: Wrap access to intel_engine.active X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" As we are about to shuffle the 
lists around to consolidate new control objects, reduce the code movement by wrapping access to the scheduler lists ahead of time. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 17 +++--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 11 +++- .../drm/i915/gt/intel_execlists_submission.c | 58 +++++++++++-------- .../gpu/drm/i915/gt/intel_ring_submission.c | 14 +++-- drivers/gpu/drm/i915/gt/mock_engine.c | 7 ++- drivers/gpu/drm/i915/gt/selftest_execlists.c | 6 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 20 ++++--- drivers/gpu/drm/i915/i915_gpu_error.c | 5 +- drivers/gpu/drm/i915/i915_request.c | 23 +++----- drivers/gpu/drm/i915/i915_request.h | 8 ++- drivers/gpu/drm/i915/i915_scheduler.c | 47 ++++++++------- drivers/gpu/drm/i915/i915_scheduler_types.h | 4 +- .../gpu/drm/i915/selftests/i915_scheduler.c | 19 +++--- 13 files changed, 141 insertions(+), 98 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e55e57b6edf6..a2916c7fcc48 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -725,6 +725,7 @@ struct measure_breadcrumb { static int measure_breadcrumb_dw(struct intel_context *ce) { struct intel_engine_cs *engine = ce->engine; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct measure_breadcrumb *frame; int dw; @@ -747,11 +748,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce) frame->rq.ring = &frame->ring; mutex_lock(&ce->timeline->mutex); - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&se->lock); dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs; - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&se->lock); mutex_unlock(&ce->timeline->mutex); GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */ @@ -1627,6 +1628,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, const char *header, ...) 
{ struct i915_gpu_error * const error = &engine->i915->gpu_error; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq; intel_wakeref_t wakeref; unsigned long flags; @@ -1668,7 +1670,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n"); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); rq = intel_engine_find_active_request(engine); if (rq) { struct intel_timeline *tl = get_timeline(rq); @@ -1699,8 +1701,8 @@ void intel_engine_dump(struct intel_engine_cs *engine, hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); } } - drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold)); - spin_unlock_irqrestore(&engine->active.lock, flags); + drm_printf(m, "\tOn hold?: %lu\n", list_count(&se->hold)); + spin_unlock_irqrestore(&se->lock, flags); drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base); wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm); @@ -1759,6 +1761,7 @@ static bool match_ring(struct i915_request *rq) struct i915_request * intel_engine_find_active_request(struct intel_engine_cs *engine) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *request, *active = NULL; /* @@ -1772,7 +1775,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine) * At all other times, we must assume the GPU is still running, but * we only care about the snapshot of this moment. 
*/ - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); rcu_read_lock(); request = execlists_active(&engine->execlists); @@ -1790,7 +1793,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine) if (active) return active; - list_for_each_entry(request, &engine->active.requests, sched.link) { + list_for_each_entry(request, &se->requests, sched.link) { if (__i915_request_is_complete(request)) continue; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 71ceaa5dcf40..e5637e831d28 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -329,7 +329,7 @@ struct intel_engine_cs { struct intel_sseu sseu; - struct { + struct i915_sched { spinlock_t lock; struct list_head requests; struct list_head hold; /* ready requests, but on hold */ @@ -621,5 +621,12 @@ intel_engine_has_relative_mmio(const struct intel_engine_cs * const engine) (slice_) += ((subslice_) == 0)) \ for_each_if((instdone_has_slice(dev_priv_, sseu_, slice_)) && \ (instdone_has_subslice(dev_priv_, sseu_, slice_, \ - subslice_))) + subslice_))) + +static inline struct i915_sched * +intel_engine_get_scheduler(struct intel_engine_cs *engine) +{ + return &engine->active; +} + #endif /* __INTEL_ENGINE_TYPES_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index b56e321ef003..280d84c4e4b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -293,6 +293,7 @@ static int virtual_prio(const struct intel_engine_execlists *el) static bool need_preempt(const struct intel_engine_cs *engine, const struct i915_request *rq) { + const struct i915_sched *se = &engine->active; int last_prio; if (!intel_engine_has_semaphores(engine)) @@ -324,7 +325,7 @@ static bool need_preempt(const struct intel_engine_cs *engine, * Check 
against the first request in ELSP[1], it will, thanks to the * power of PI, be the highest priority of that context. */ - if (!list_is_last(&rq->sched.link, &engine->active.requests) && + if (!list_is_last(&rq->sched.link, &se->requests) && rq_prio(list_next_entry(rq, sched.link)) > last_prio) return true; @@ -476,15 +477,15 @@ static void execlists_schedule_in(struct i915_request *rq, int idx) static void resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) { - struct intel_engine_cs *engine = rq->engine; + struct i915_sched *se = i915_request_get_scheduler(rq); - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&se->lock); clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); WRITE_ONCE(rq->engine, &ve->base); ve->base.submit_request(rq); - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&se->lock); } static void kick_siblings(struct i915_request *rq, struct intel_context *ce) @@ -1018,6 +1019,8 @@ timeslice_yield(const struct intel_engine_execlists *el, static bool needs_timeslice(const struct intel_engine_cs *engine, const struct i915_request *rq) { + const struct i915_sched *se = &engine->active; + if (!intel_engine_has_timeslices(engine)) return false; @@ -1030,7 +1033,7 @@ static bool needs_timeslice(const struct intel_engine_cs *engine, return false; /* If ELSP[1] is occupied, always check to see if worth slicing */ - if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) { + if (!list_is_last_rcu(&rq->sched.link, &se->requests)) { ENGINE_TRACE(engine, "timeslice required for second inflight context\n"); return true; } @@ -1133,6 +1136,7 @@ static bool completed(const struct i915_request *rq) static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; struct 
i915_request *last, * const *active; @@ -1162,7 +1166,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * and context switches) submission. */ - spin_lock(&engine->active.lock); + spin_lock(&se->lock); /* * If the queue is higher priority than the last @@ -1262,7 +1266,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * Even if ELSP[1] is occupied and not worthy * of timeslices, our queue might be. */ - spin_unlock(&engine->active.lock); + spin_unlock(&se->lock); return; } } @@ -1288,7 +1292,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (last && !can_merge_rq(last, rq)) { spin_unlock(&ve->base.active.lock); - spin_unlock(&engine->active.lock); + spin_unlock(&se->lock); return; /* leave this for another sibling */ } @@ -1449,7 +1453,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * interrupt for secondary ports). */ execlists->queue_priority_hint = queue_prio(execlists); - spin_unlock(&engine->active.lock); + spin_unlock(&se->lock); /* * We can skip poking the HW if we ended up with exactly the same set @@ -2614,6 +2618,7 @@ static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled) static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) { + struct i915_sched *se = intel_engine_get_scheduler(engine); unsigned long flags; ENGINE_TRACE(engine, "\n"); @@ -2623,9 +2628,9 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) /* Push back any incomplete requests for replay after the reset. 
*/ rcu_read_lock(); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); __i915_sched_rewind_requests(engine); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); rcu_read_unlock(); } @@ -2641,6 +2646,7 @@ static void nop_submission_tasklet(struct tasklet_struct *t) static void execlists_reset_cancel(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn; struct rb_node *rb; unsigned long flags; @@ -2664,10 +2670,10 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) execlists_reset_csb(engine, true); rcu_read_lock(); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); /* Mark all executing requests as skipped. */ - list_for_each_entry(rq, &engine->active.requests, sched.link) + list_for_each_entry(rq, &se->requests, sched.link) i915_request_put(i915_request_mark_eio(rq)); intel_engine_signal_breadcrumbs(engine); @@ -2687,7 +2693,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) } /* On-hold requests will be flushed to timeline upon their release */ - list_for_each_entry(rq, &engine->active.hold, sched.link) + list_for_each_entry(rq, &se->hold, sched.link) i915_request_put(i915_request_mark_eio(rq)); /* Cancel all attached virtual engines */ @@ -2721,7 +2727,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); execlists->tasklet.callback = nop_submission_tasklet; - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); rcu_read_unlock(); } @@ -2958,6 +2964,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) { struct virtual_engine *ve = container_of(wrk, typeof(*ve), rcu.work); + struct i915_sched *se = 
intel_engine_get_scheduler(&ve->base); unsigned int n; GEM_BUG_ON(ve->context.inflight); @@ -2966,7 +2973,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) if (unlikely(ve->request)) { struct i915_request *old; - spin_lock_irq(&ve->base.active.lock); + spin_lock_irq(&se->lock); old = fetch_and_zero(&ve->request); if (old) { @@ -2975,7 +2982,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) i915_request_put(old); } - spin_unlock_irq(&ve->base.active.lock); + spin_unlock_irq(&se->lock); } /* @@ -3161,6 +3168,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); + struct i915_sched *se = intel_engine_get_scheduler(sibling); struct ve_node * const node = &ve->nodes[sibling->id]; struct rb_node **parent, *rb; bool first; @@ -3168,7 +3176,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) if (!READ_ONCE(ve->request)) break; /* already handled by a sibling's tasklet */ - spin_lock_irq(&sibling->active.lock); + spin_lock_irq(&se->lock); if (unlikely(!(mask & sibling->mask))) { if (!RB_EMPTY_NODE(&node->rb)) { @@ -3221,7 +3229,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) tasklet_hi_schedule(&sibling->execlists.tasklet); unlock_engine: - spin_unlock_irq(&sibling->active.lock); + spin_unlock_irq(&se->lock); if (intel_context_inflight(&ve->context)) break; @@ -3231,6 +3239,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) static void virtual_submit_request(struct i915_request *rq) { struct virtual_engine *ve = to_virtual_engine(rq->engine); + struct i915_sched *se = intel_engine_get_scheduler(&ve->base); unsigned long flags; ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", @@ -3239,7 +3248,7 @@ static void virtual_submit_request(struct i915_request *rq) GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); - spin_lock_irqsave(&ve->base.active.lock, flags); 
+ spin_lock_irqsave(&se->lock, flags); /* By the time we resubmit a request, it may be completed */ if (__i915_request_is_complete(rq)) { @@ -3262,7 +3271,7 @@ static void virtual_submit_request(struct i915_request *rq) tasklet_hi_schedule(&ve->base.execlists.tasklet); unlock: - spin_unlock_irqrestore(&ve->base.active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static struct ve_bond * @@ -3513,16 +3522,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, unsigned int max) { const struct intel_engine_execlists *execlists = &engine->execlists; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *last; unsigned long flags; unsigned int count; struct rb_node *rb; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); last = NULL; count = 0; - list_for_each_entry(rq, &engine->active.requests, sched.link) { + list_for_each_entry(rq, &se->requests, sched.link) { if (count++ < max - 1) show_request(m, rq, "\t\t", 0); else @@ -3585,7 +3595,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, show_request(m, last, "\t\t", 0); } - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 3cb2ce503544..8af1bc77e15e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -322,14 +322,15 @@ static void reset_prepare(struct intel_engine_cs *engine) static void reset_rewind(struct intel_engine_cs *engine, bool stalled) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *pos, *rq; unsigned long flags; u32 head; rq = NULL; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); rcu_read_lock(); - list_for_each_entry(pos, 
&engine->active.requests, sched.link) { + list_for_each_entry(pos, &se->requests, sched.link) { if (!__i915_request_is_complete(pos)) { rq = pos; break; @@ -384,7 +385,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled) } engine->legacy.ring->head = intel_ring_wrap(engine->legacy.ring, head); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static void reset_finish(struct intel_engine_cs *engine) @@ -393,19 +394,20 @@ static void reset_finish(struct intel_engine_cs *engine) static void reset_cancel(struct intel_engine_cs *engine) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *request; unsigned long flags; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); /* Mark all submitted requests as skipped. */ - list_for_each_entry(request, &engine->active.requests, sched.link) + list_for_each_entry(request, &se->requests, sched.link) i915_request_put(i915_request_mark_eio(request)); intel_engine_signal_breadcrumbs(engine); /* Remaining _unready_ requests will be nop'ed when submitted */ - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static void i9xx_submit_request(struct i915_request *request) diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index cf1269e74998..b4d26d3bf39f 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -230,15 +230,16 @@ static void mock_reset_cancel(struct intel_engine_cs *engine) { struct mock_engine *mock = container_of(engine, typeof(*mock), base); + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq; unsigned long flags; del_timer_sync(&mock->hw_delay); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); /* Mark all submitted requests as skipped. 
*/ - list_for_each_entry(rq, &engine->active.requests, sched.link) + list_for_each_entry(rq, &se->requests, sched.link) i915_request_put(i915_request_mark_eio(rq)); intel_engine_signal_breadcrumbs(engine); @@ -251,7 +252,7 @@ static void mock_reset_cancel(struct intel_engine_cs *engine) } INIT_LIST_HEAD(&mock->hw_queue); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static void mock_reset_finish(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 64f6a49a5c22..0395f9053a43 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -4548,6 +4548,7 @@ static int reset_virtual_engine(struct intel_gt *gt, struct intel_context *ve; struct igt_spinner spin; struct i915_request *rq; + struct i915_sched *se; unsigned int n; int err = 0; @@ -4584,6 +4585,7 @@ static int reset_virtual_engine(struct intel_gt *gt, engine = rq->engine; GEM_BUG_ON(engine == ve->engine); + se = intel_engine_get_scheduler(engine); /* Take ownership of the reset and tasklet */ local_bh_disable(); @@ -4600,9 +4602,9 @@ static int reset_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(execlists_active(&engine->execlists) != rq); /* Fake a preemption event; failed of course */ - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&se->lock); __i915_sched_rewind_requests(engine); - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&se->lock); GEM_BUG_ON(rq->engine != engine); /* Reset the engine while keeping our active request on hold */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index aecede4f0943..45f6d38341ef 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -181,6 +181,7 @@ static void schedule_out(struct i915_request *rq) static void __guc_dequeue(struct 
intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request **first = execlists->inflight; struct i915_request ** const last_port = first + execlists->port_mask; struct i915_request *last = first[0]; @@ -188,7 +189,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine) bool submit = false; struct rb_node *rb; - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); if (last) { if (*++first) @@ -241,11 +242,12 @@ static void guc_submission_tasklet(struct tasklet_struct *t) { struct intel_engine_cs * const engine = from_tasklet(engine, t, execlists.tasklet); + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_request **port, *rq; unsigned long flags; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); for (port = execlists->inflight; (rq = *port); port++) { if (!i915_request_completed(rq)) @@ -261,7 +263,7 @@ static void guc_submission_tasklet(struct tasklet_struct *t) __guc_dequeue(engine); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static void guc_reset_prepare(struct intel_engine_cs *engine) @@ -306,10 +308,11 @@ static void guc_reset_state(struct intel_context *ce, static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq; unsigned long flags; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); /* Push back any incomplete requests for replay after the reset. 
*/ rq = __i915_sched_rewind_requests(engine); @@ -323,12 +326,13 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) guc_reset_state(rq->context, engine, rq->head, stalled); out_unlock: - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static void guc_reset_cancel(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn; struct rb_node *rb; unsigned long flags; @@ -349,10 +353,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) * submission's irq state, we also wish to remind ourselves that * it is irq state.) */ - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); /* Mark all executing requests as skipped. */ - list_for_each_entry(rq, &engine->active.requests, sched.link) { + list_for_each_entry(rq, &se->requests, sched.link) { i915_request_set_error_once(rq, -EIO); i915_request_mark_complete(rq); } @@ -377,7 +381,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) execlists->queue_priority_hint = INT_MIN; execlists->queue = RB_ROOT_CACHED; - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static void guc_reset_finish(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0cb3686ed91d..f2e4f0232b87 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1433,6 +1433,7 @@ static struct intel_engine_coredump * capture_engine(struct intel_engine_cs *engine, struct i915_vma_compress *compress) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_engine_capture_vma *capture = NULL; struct intel_engine_coredump *ee; struct i915_request *rq; @@ -1442,12 +1443,12 @@ capture_engine(struct intel_engine_cs 
*engine, if (!ee) return NULL; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); rq = intel_engine_find_active_request(engine); if (rq) capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); if (!capture) { kfree(ee); return NULL; diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 916e74fbab6c..947e4fad7cf0 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -533,12 +533,12 @@ struct i915_request *i915_request_mark_eio(struct i915_request *rq) bool __i915_request_submit(struct i915_request *request) { struct intel_engine_cs *engine = request->engine; + struct i915_sched *se = intel_engine_get_scheduler(engine); bool result = false; RQ_TRACE(request, "\n"); - GEM_BUG_ON(!irqs_disabled()); - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); /* * With the advent of preempt-to-busy, we frequently encounter @@ -595,7 +595,7 @@ bool __i915_request_submit(struct i915_request *request) result = true; GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags)); - list_move_tail(&request->sched.link, &engine->active.requests); + list_move_tail(&request->sched.link, &se->requests); active: clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags); set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags); @@ -621,30 +621,25 @@ bool __i915_request_submit(struct i915_request *request) void i915_request_submit(struct i915_request *request) { - struct intel_engine_cs *engine = request->engine; + struct i915_sched *se = i915_request_get_scheduler(request); unsigned long flags; /* Will be called from irq-context when using foreign fences. 
*/ - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); __i915_request_submit(request); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } void __i915_request_unsubmit(struct i915_request *request) { - struct intel_engine_cs *engine = request->engine; - /* * Only unwind in reverse order, required so that the per-context list * is kept in seqno/ring order. */ RQ_TRACE(request, "\n"); - GEM_BUG_ON(!irqs_disabled()); - lockdep_assert_held(&engine->active.lock); - /* * Before we remove this breadcrumb from the signal list, we have * to ensure that a concurrent dma_fence_enable_signaling() does not @@ -672,15 +667,15 @@ void __i915_request_unsubmit(struct i915_request *request) void i915_request_unsubmit(struct i915_request *request) { - struct intel_engine_cs *engine = request->engine; + struct i915_sched *se = i915_request_get_scheduler(request); unsigned long flags; /* Will be called from irq-context when using foreign fences. */ - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); __i915_request_unsubmit(request); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); } static int __i915_sw_fence_call diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 9ce074ffc1dd..e320edd718f3 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -589,6 +589,12 @@ static inline void i915_request_clear_hold(struct i915_request *rq) clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags); } +static inline struct i915_sched * +i915_request_get_scheduler(const struct i915_request *rq) +{ + return intel_engine_get_scheduler(rq->engine); +} + static inline struct intel_timeline * i915_request_timeline(const struct i915_request *rq) { @@ -613,7 +619,7 @@ i915_request_active_timeline(const struct i915_request *rq) * this submission. 
*/ return rcu_dereference_protected(rq->timeline, - lockdep_is_held(&rq->engine->active.lock)); + lockdep_is_held(&i915_request_get_scheduler(rq)->lock)); } static inline bool i915_request_use_scheduler(const struct i915_request *rq) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 694ca3a3b563..663db3c36762 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -183,11 +183,12 @@ static struct list_head * lookup_priolist(struct intel_engine_cs *engine, int prio) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_priolist *p; struct rb_node **parent, *rb; bool first = true; - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); assert_priolists(execlists); if (unlikely(execlists->no_priolist)) @@ -467,11 +468,12 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, struct i915_request *rq) { struct list_head *pos = &rq->sched.waiters_list; + struct i915_sched *se = intel_engine_get_scheduler(engine); const int prio = rq_prio(rq); struct i915_request *rn; LIST_HEAD(dfs); - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); /* @@ -585,26 +587,27 @@ static bool hold_request(const struct i915_request *rq) return result; } -static bool ancestor_on_hold(const struct intel_engine_cs *engine, +static bool ancestor_on_hold(const struct i915_sched *se, const struct i915_request *rq) { GEM_BUG_ON(i915_request_on_hold(rq)); - return unlikely(!list_empty(&engine->active.hold)) && hold_request(rq); + return unlikely(!list_empty(&se->hold)) && hold_request(rq); } void i915_request_enqueue(struct i915_request *rq) { struct intel_engine_cs *engine = rq->engine; + struct i915_sched *se = intel_engine_get_scheduler(engine); unsigned long flags; bool kick = false; /* Will be 
called from irq-context when using foreign fences. */ - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&se->lock, flags); GEM_BUG_ON(test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); - if (unlikely(ancestor_on_hold(engine, rq))) { + if (unlikely(ancestor_on_hold(se, rq))) { RQ_TRACE(rq, "ancestor on hold\n"); - list_add_tail(&rq->sched.link, &engine->active.hold); + list_add_tail(&rq->sched.link, &se->hold); i915_request_set_hold(rq); } else { queue_request(engine, rq); @@ -615,7 +618,7 @@ void i915_request_enqueue(struct i915_request *rq) } GEM_BUG_ON(list_empty(&rq->sched.link)); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&se->lock, flags); if (kick) tasklet_hi_schedule(&engine->execlists.tasklet); } @@ -623,15 +626,14 @@ void i915_request_enqueue(struct i915_request *rq) struct i915_request * __i915_sched_rewind_requests(struct intel_engine_cs *engine) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn, *active = NULL; struct list_head *pl; int prio = I915_PRIORITY_INVALID; - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); - list_for_each_entry_safe_reverse(rq, rn, - &engine->active.requests, - sched.link) { + list_for_each_entry_safe_reverse(rq, rn, &se->requests, sched.link) { if (__i915_request_is_complete(rq)) { list_del_init(&rq->sched.link); continue; @@ -664,9 +666,10 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) bool __i915_sched_suspend_request(struct intel_engine_cs *engine, struct i915_request *rq) { + struct i915_sched *se = intel_engine_get_scheduler(engine); LIST_HEAD(list); - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); GEM_BUG_ON(rq->engine != engine); if (__i915_request_is_complete(rq)) /* too late! 
*/ @@ -690,7 +693,7 @@ bool __i915_sched_suspend_request(struct intel_engine_cs *engine, if (i915_request_is_active(rq)) __i915_request_unsubmit(rq); - list_move_tail(&rq->sched.link, &engine->active.hold); + list_move_tail(&rq->sched.link, &se->hold); clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); i915_request_set_hold(rq); RQ_TRACE(rq, "on hold\n"); @@ -721,7 +724,7 @@ bool __i915_sched_suspend_request(struct intel_engine_cs *engine, rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); } while (rq); - GEM_BUG_ON(list_empty(&engine->active.hold)); + GEM_BUG_ON(list_empty(&se->hold)); return true; } @@ -729,14 +732,15 @@ bool __i915_sched_suspend_request(struct intel_engine_cs *engine, bool i915_sched_suspend_request(struct intel_engine_cs *engine, struct i915_request *rq) { + struct i915_sched *se = intel_engine_get_scheduler(engine); bool result; if (i915_request_on_hold(rq)) return false; - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&se->lock); result = __i915_sched_suspend_request(engine, rq); - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&se->lock); return result; } @@ -744,9 +748,10 @@ bool i915_sched_suspend_request(struct intel_engine_cs *engine, void __i915_sched_resume_request(struct intel_engine_cs *engine, struct i915_request *rq) { + struct i915_sched *se = intel_engine_get_scheduler(engine); LIST_HEAD(list); - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&se->lock); if (rq_prio(rq) > engine->execlists.queue_priority_hint) { engine->execlists.queue_priority_hint = rq_prio(rq); @@ -809,9 +814,11 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, void i915_sched_resume_request(struct intel_engine_cs *engine, struct i915_request *rq) { - spin_lock_irq(&engine->active.lock); + struct i915_sched *se = intel_engine_get_scheduler(engine); + + spin_lock_irq(&se->lock); __i915_sched_resume_request(engine, rq); - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&se->lock); } void 
i915_sched_node_init(struct i915_sched_node *node) diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 28138c3fcc81..f2b0ac3a05a5 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -68,12 +68,12 @@ struct i915_sched_node { struct list_head signalers_list; /* those before us, we depend upon */ struct list_head waiters_list; /* those after us, they depend upon us */ - struct list_head link; /* guarded by engine->active.lock */ + struct list_head link; /* guarded by i915_sched.lock */ struct i915_sched_stack { /* Branch memoization used during depth-first search */ struct i915_request *prev; struct list_head *pos; - } dfs; /* guarded by engine->active.lock */ + } dfs; /* guarded by i915_sched.lock */ struct i915_sched_attr attr; unsigned long flags; #define I915_SCHED_HAS_EXTERNAL_CHAIN BIT(0) diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index acc666f755d7..35a479184fee 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -77,7 +77,8 @@ static int all_engines(struct drm_i915_private *i915, return 0; } -static bool check_context_order(struct intel_engine_cs *engine) +static bool check_context_order(struct i915_sched *se, + struct intel_engine_cs *engine) { u64 last_seqno, last_context; unsigned long count; @@ -86,7 +87,7 @@ static bool check_context_order(struct intel_engine_cs *engine) int last_prio; /* We expect the execution order to follow ascending fence-context */ - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&se->lock); count = 0; last_context = 0; @@ -119,7 +120,7 @@ static bool check_context_order(struct intel_engine_cs *engine) } result = true; out_unlock: - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&se->lock); return result; } @@ -128,6 +129,7 @@ static int __single_chain(struct 
intel_engine_cs *engine, unsigned long length, bool (*fn)(struct i915_request *rq, unsigned long v, unsigned long e)) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_context *ce; struct igt_spinner spin; struct i915_request *rq; @@ -173,7 +175,7 @@ static int __single_chain(struct intel_engine_cs *engine, unsigned long length, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq, count, count - 1) && !check_context_order(engine)) + if (fn(rq, count, count - 1) && !check_context_order(se, engine)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists); @@ -191,6 +193,7 @@ static int __wide_chain(struct intel_engine_cs *engine, unsigned long width, bool (*fn)(struct i915_request *rq, unsigned long v, unsigned long e)) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_context **ce; struct i915_request **rq; struct igt_spinner spin; @@ -257,7 +260,7 @@ static int __wide_chain(struct intel_engine_cs *engine, unsigned long width, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq[i - 1], i, count) && !check_context_order(engine)) + if (fn(rq[i - 1], i, count) && !check_context_order(se, engine)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists); @@ -279,6 +282,7 @@ static int __inv_chain(struct intel_engine_cs *engine, unsigned long width, bool (*fn)(struct i915_request *rq, unsigned long v, unsigned long e)) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_context **ce; struct i915_request **rq; struct igt_spinner spin; @@ -345,7 +349,7 @@ static int __inv_chain(struct intel_engine_cs *engine, unsigned long width, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq[i - 1], i, count) && !check_context_order(engine)) + if (fn(rq[i - 1], i, count) && !check_context_order(se, engine)) err = -EINVAL; 
execlists_active_unlock_bh(&engine->execlists); @@ -367,6 +371,7 @@ static int __sparse_chain(struct intel_engine_cs *engine, unsigned long width, bool (*fn)(struct i915_request *rq, unsigned long v, unsigned long e)) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_context **ce; struct i915_request **rq; struct igt_spinner spin; @@ -450,7 +455,7 @@ static int __sparse_chain(struct intel_engine_cs *engine, unsigned long width, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq[i - 1], i, count) && !check_context_order(engine)) + if (fn(rq[i - 1], i, count) && !check_context_order(se, engine)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists); From patchwork Mon Feb 1 08:56:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058515 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B24EC43381 for ; Mon, 1 Feb 2021 08:58:23 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A014964E3F for ; Mon, 1 Feb 2021 08:58:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A014964E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none 
smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 186786E554; Mon, 1 Feb 2021 08:57:46 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 812386E4CF for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757734-1500050 for multiple; Mon, 01 Feb 2021 08:57:19 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:39 +0000 Message-Id: <20210201085715.27435-21-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 21/57] drm/i915: Move common active lists from engine to i915_scheduler X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Extract the scheduler lists into a related structure, stop sprawling over struct intel_engine_cs. Also transfer the responsibility of tracing the scheduler events from ENGINE_TRACE() to SCHED_TRACE(). 
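As a rough illustration of the tracing change, the sketch below mimics in plain user-space C how RQ_FMT/RQ_ARG and a SCHED_TRACE-style prefix compose into one message, including the NULL-request case that arises when rewinding finds no active request. The RQ_FMT and RQ_ARG definitions are copied from this patch; everything else (struct fence, struct request, struct sched, SCHED_TRACE_BUF, trace_rewind, and formatting into a buffer with snprintf instead of GEM_TRACE) is a hypothetical stand-in for illustration only and is not driver code. It relies on the GNU ##__VA_ARGS__ extension, as the kernel macros do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct fence { unsigned long long context; long long seqno; };
struct request { struct fence fence; };
struct sched { struct { const char *dev; const char *name; } dbg; };

/* Copied from the patch: one format/argument pair per request. */
#define RQ_FMT "%llx:%lld"
#define RQ_ARG(rq) (rq) ? (rq)->fence.context : 0, (rq) ? (rq)->fence.seqno : 0

/*
 * Assumption: format into a caller-supplied buffer rather than the
 * kernel trace ring, and take the device name as a plain string
 * instead of calling dev_name().
 */
#define SCHED_TRACE_BUF(buf, len, se, fmt, ...) \
	snprintf((buf), (len), "%s sched:%s: " fmt, \
		 (se)->dbg.dev, (se)->dbg.name, ##__VA_ARGS__)

/* Emit the "rewind" message; 'active' may be NULL, as in the patch. */
static int trace_rewind(char *buf, size_t len,
			const struct sched *se,
			const struct request *active)
{
	assert(buf && len);
	return SCHED_TRACE_BUF(buf, len, se,
			       "rewind requests, active request " RQ_FMT "\n",
			       RQ_ARG(active));
}
```

Because RQ_ARG() tests its argument before dereferencing it, a NULL request is printed as 0:0 rather than faulting, which is what lets the rewind path trace unconditionally.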
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 33 ++------ drivers/gpu/drm/i915/gt/intel_engine_types.h | 10 +-- .../drm/i915/gt/intel_execlists_submission.c | 27 ++++--- drivers/gpu/drm/i915/gt/mock_engine.c | 7 +- drivers/gpu/drm/i915/i915_request.c | 8 +- drivers/gpu/drm/i915/i915_request.h | 8 +- drivers/gpu/drm/i915/i915_scheduler.c | 78 ++++++++++++++----- drivers/gpu/drm/i915/i915_scheduler.h | 13 +++- drivers/gpu/drm/i915/i915_scheduler_types.h | 31 +++++++- .../gpu/drm/i915/selftests/i915_scheduler.c | 1 + 11 files changed, 143 insertions(+), 81 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index ecacfae8412d..ca37d93ef5e7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -422,11 +422,11 @@ __active_engine(struct i915_request *rq, struct intel_engine_cs **active) * check that we have acquired the lock on the final engine. 
*/ locked = READ_ONCE(rq->engine); - spin_lock_irq(&locked->active.lock); + spin_lock_irq(&locked->sched.lock); while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { - spin_unlock(&locked->active.lock); + spin_unlock(&locked->sched.lock); locked = engine; - spin_lock(&locked->active.lock); + spin_lock(&locked->sched.lock); } if (i915_request_is_active(rq)) { @@ -435,7 +435,7 @@ __active_engine(struct i915_request *rq, struct intel_engine_cs **active) ret = true; } - spin_unlock_irq(&locked->active.lock); + spin_unlock_irq(&locked->sched.lock); return ret; } diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index a2916c7fcc48..d7ff84d92936 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -575,8 +575,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine) execlists->queue_priority_hint = INT_MIN; execlists->queue = RB_ROOT_CACHED; - - i915_sched_init_ipi(&execlists->ipi); } static void cleanup_status_page(struct intel_engine_cs *engine) @@ -692,7 +690,12 @@ static int engine_setup_common(struct intel_engine_cs *engine) goto err_status; } - intel_engine_init_active(engine, ENGINE_PHYSICAL); + i915_sched_init(&engine->sched, + engine->i915->drm.dev, + engine->name, + engine->mask, + ENGINE_PHYSICAL); + intel_engine_init_execlists(engine); intel_engine_init_cmd_parser(engine); intel_engine_init__pm(engine); @@ -761,28 +764,6 @@ static int measure_breadcrumb_dw(struct intel_context *ce) return dw; } -void -intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass) -{ - INIT_LIST_HEAD(&engine->active.requests); - INIT_LIST_HEAD(&engine->active.hold); - - spin_lock_init(&engine->active.lock); - lockdep_set_subclass(&engine->active.lock, subclass); - - /* - * Due to an interesting quirk in lockdep's internal debug tracking, - * after setting a subclass we must ensure the lock is used. 
Otherwise, - * nr_unused_locks is incremented once too often. - */ -#ifdef CONFIG_DEBUG_LOCK_ALLOC - local_irq_disable(); - lock_map_acquire(&engine->active.lock.dep_map); - lock_map_release(&engine->active.lock.dep_map); - local_irq_enable(); -#endif -} - static struct intel_context * create_pinned_context(struct intel_engine_cs *engine, unsigned int hwsp, @@ -930,7 +911,7 @@ int intel_engines_init(struct intel_gt *gt) */ void intel_engine_cleanup_common(struct intel_engine_cs *engine) { - GEM_BUG_ON(!list_empty(&engine->active.requests)); + GEM_BUG_ON(!list_empty(&engine->sched.requests)); tasklet_kill(&engine->execlists.tasklet); /* flush the callback */ intel_breadcrumbs_free(engine->breadcrumbs); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e5637e831d28..0936b0699cbb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -258,8 +258,6 @@ struct intel_engine_execlists { struct rb_root_cached queue; struct rb_root_cached virtual; - struct i915_sched_ipi ipi; - /** * @csb_write: control register for Context Switch buffer * @@ -329,11 +327,7 @@ struct intel_engine_cs { struct intel_sseu sseu; - struct i915_sched { - spinlock_t lock; - struct list_head requests; - struct list_head hold; /* ready requests, but on hold */ - } active; + struct i915_sched sched; /* keep a request in reserve for a [pm] barrier under oom */ struct i915_request *request_pool; @@ -626,7 +620,7 @@ intel_engine_has_relative_mmio(const struct intel_engine_cs * const engine) static inline struct i915_sched * intel_engine_get_scheduler(struct intel_engine_cs *engine) { - return &engine->active; + return &engine->sched; } #endif /* __INTEL_ENGINE_TYPES_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 280d84c4e4b7..dd1429a476d5 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -293,7 +293,7 @@ static int virtual_prio(const struct intel_engine_execlists *el) static bool need_preempt(const struct intel_engine_cs *engine, const struct i915_request *rq) { - const struct i915_sched *se = &engine->active; + const struct i915_sched *se = &engine->sched; int last_prio; if (!intel_engine_has_semaphores(engine)) @@ -1019,7 +1019,7 @@ timeslice_yield(const struct intel_engine_execlists *el, static bool needs_timeslice(const struct intel_engine_cs *engine, const struct i915_request *rq) { - const struct i915_sched *se = &engine->active; + const struct i915_sched *se = &engine->sched; if (!intel_engine_has_timeslices(engine)) return false; @@ -1276,7 +1276,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) while ((ve = first_virtual_engine(engine))) { struct i915_request *rq; - spin_lock(&ve->base.active.lock); + spin_lock(&ve->base.sched.lock); rq = ve->request; if (unlikely(!virtual_matches(ve, rq, engine))) @@ -1286,12 +1286,12 @@ static void execlists_dequeue(struct intel_engine_cs *engine) GEM_BUG_ON(rq->context != &ve->context); if (unlikely(rq_prio(rq) < queue_prio(execlists))) { - spin_unlock(&ve->base.active.lock); + spin_unlock(&ve->base.sched.lock); break; } if (last && !can_merge_rq(last, rq)) { - spin_unlock(&ve->base.active.lock); + spin_unlock(&ve->base.sched.lock); spin_unlock(&se->lock); return; /* leave this for another sibling */ } @@ -1338,7 +1338,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) i915_request_put(rq); unlock: - spin_unlock(&ve->base.active.lock); + spin_unlock(&ve->base.sched.lock); /* * Hmm, we have a bunch of virtual engine requests, @@ -2704,7 +2704,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) rb_erase_cached(rb, &execlists->virtual); RB_CLEAR_NODE(rb); - spin_lock(&ve->base.active.lock); + spin_lock(&ve->base.sched.lock); rq = fetch_and_zero(&ve->request); if (rq) { if 
(i915_request_mark_eio(rq)) { @@ -2716,7 +2716,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) ve->base.execlists.queue_priority_hint = INT_MIN; } - spin_unlock(&ve->base.active.lock); + spin_unlock(&ve->base.sched.lock); } /* Remaining _unready_ requests will be nop'ed when submitted */ @@ -3002,13 +3002,13 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) if (RB_EMPTY_NODE(node)) continue; - spin_lock_irq(&sibling->active.lock); + spin_lock_irq(&sibling->sched.lock); /* Detachment is lazily performed in the execlists tasklet */ if (!RB_EMPTY_NODE(node)) rb_erase_cached(node, &sibling->execlists.virtual); - spin_unlock_irq(&sibling->active.lock); + spin_unlock_irq(&sibling->sched.lock); } GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); GEM_BUG_ON(!list_empty(virtual_queue(ve))); @@ -3355,7 +3355,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); - intel_engine_init_active(&ve->base, ENGINE_VIRTUAL); intel_engine_init_execlists(&ve->base); ve->base.cops = &virtual_context_ops; @@ -3441,6 +3440,12 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.flags |= I915_ENGINE_IS_VIRTUAL; + i915_sched_init(&ve->base.sched, + ve->base.i915->drm.dev, + ve->base.name, + ve->base.mask, + ENGINE_VIRTUAL); + virtual_engine_initial_hint(ve); return &ve->context; diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index b4d26d3bf39f..8b1c2727d25c 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -328,7 +328,12 @@ int mock_engine_init(struct intel_engine_cs *engine) { struct intel_context *ce; - intel_engine_init_active(engine, ENGINE_MOCK); + i915_sched_init(&engine->sched, + engine->i915->drm.dev, + engine->name, + engine->mask, + ENGINE_MOCK); + intel_engine_init_execlists(engine); intel_engine_init__pm(engine); 
intel_engine_init_retire(engine); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 947e4fad7cf0..d736c1aae6e5 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -255,10 +255,10 @@ static void remove_from_engine(struct i915_request *rq) * check that the rq still belongs to the newly locked engine. */ locked = READ_ONCE(rq->engine); - spin_lock_irq(&locked->active.lock); + spin_lock_irq(&locked->sched.lock); while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { - spin_unlock(&locked->active.lock); - spin_lock(&engine->active.lock); + spin_unlock(&locked->sched.lock); + spin_lock(&engine->sched.lock); locked = engine; } list_del_init(&rq->sched.link); @@ -269,7 +269,7 @@ static void remove_from_engine(struct i915_request *rq) /* Prevent further __await_execution() registering a cb, then flush */ set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); - spin_unlock_irq(&locked->active.lock); + spin_unlock_irq(&locked->sched.lock); __notify_execute_cb_imm(rq); } diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index e320edd718f3..3a5d6bdcd8dd 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -51,11 +51,13 @@ struct i915_capture_list { struct i915_vma *vma; }; +#define RQ_FMT "%llx:%lld" +#define RQ_ARG(rq) (rq) ? (rq)->fence.context : 0, (rq) ? (rq)->fence.seqno : 0 + #define RQ_TRACE(rq, fmt, ...) 
do { \ const struct i915_request *rq__ = (rq); \ - ENGINE_TRACE(rq__->engine, "fence %llx:%lld, current %d " fmt, \ - rq__->fence.context, rq__->fence.seqno, \ - hwsp_seqno(rq__), ##__VA_ARGS__); \ + ENGINE_TRACE(rq__->engine, "fence " RQ_FMT ", current %d " fmt, \ + RQ_ARG(rq__), hwsp_seqno(rq__), ##__VA_ARGS__); \ } while (0) enum { diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 663db3c36762..5eea8c6b85a8 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -85,16 +85,48 @@ static void ipi_schedule(struct work_struct *wrk) } while (rq); } -void i915_sched_init_ipi(struct i915_sched_ipi *ipi) +static void i915_sched_init_ipi(struct i915_sched_ipi *ipi) { INIT_WORK(&ipi->work, ipi_schedule); ipi->list = NULL; } +void i915_sched_init(struct i915_sched *se, + struct device *dev, + const char *name, + unsigned long mask, + unsigned int subclass) +{ + spin_lock_init(&se->lock); + lockdep_set_subclass(&se->lock, subclass); + + se->dbg.dev = dev; + se->dbg.name = name; + + se->mask = mask; + + INIT_LIST_HEAD(&se->requests); + INIT_LIST_HEAD(&se->hold); + + i915_sched_init_ipi(&se->ipi); + + /* + * Due to an interesting quirk in lockdep's internal debug tracking, + * after setting a subclass we must ensure the lock is used. Otherwise, + * nr_unused_locks is incremented once too often. 
+ */ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + local_irq_disable(); + lock_map_acquire(&se->lock.dep_map); + lock_map_release(&se->lock.dep_map); + local_irq_enable(); +#endif +} + static void __ipi_add(struct i915_request *rq) { #define STUB ((struct i915_request *)1) - struct intel_engine_cs *engine = READ_ONCE(rq->engine); + struct i915_sched *se = i915_request_get_scheduler(rq); struct i915_request *first; if (!i915_request_get_rcu(rq)) @@ -114,13 +146,13 @@ static void __ipi_add(struct i915_request *rq) } /* Carefully insert ourselves into the head of the llist */ - first = READ_ONCE(engine->execlists.ipi.list); + first = READ_ONCE(se->ipi.list); do { rq->sched.ipi_link = ptr_pack_bits(first, 1, 1); - } while (!try_cmpxchg(&engine->execlists.ipi.list, &first, rq)); + } while (!try_cmpxchg(&se->ipi.list, &first, rq)); if (!first) - queue_work(system_unbound_wq, &engine->execlists.ipi.work); + queue_work(system_unbound_wq, &se->ipi.work); } /* @@ -133,11 +165,11 @@ static void __ipi_add(struct i915_request *rq) struct i915_request * const rq__ = (rq); \ struct intel_engine_cs *engine__ = READ_ONCE(rq__->engine); \ \ - spin_lock_irqsave(&engine__->active.lock, (flags)); \ + spin_lock_irqsave(&engine__->sched.lock, (flags)); \ while (engine__ != READ_ONCE((rq__)->engine)) { \ - spin_unlock(&engine__->active.lock); \ + spin_unlock(&engine__->sched.lock); \ engine__ = READ_ONCE(rq__->engine); \ - spin_lock(&engine__->active.lock); \ + spin_lock(&engine__->sched.lock); \ } \ \ engine__; \ @@ -303,12 +335,11 @@ static void kick_submission(struct intel_engine_cs *engine, if (inflight->context == rq->context) return; - ENGINE_TRACE(engine, - "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n", - prio, - rq->fence.context, rq->fence.seqno, - inflight->fence.context, inflight->fence.seqno, - inflight->sched.attr.priority); + SCHED_TRACE(&engine->sched, + "bumping queue-priority-hint:%d for rq:" RQ_FMT ", inflight:" RQ_FMT " prio %d\n", + prio, + 
RQ_ARG(rq), RQ_ARG(inflight), + inflight->sched.attr.priority); engine->execlists.queue_priority_hint = prio; if (need_preempt(prio, rq_prio(inflight))) @@ -333,6 +364,9 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) struct list_head *pos = &rq->sched.signalers_list; struct list_head *plist; + SCHED_TRACE(&engine->sched, "PI for " RQ_FMT ", prio:%d\n", + RQ_ARG(rq), prio); + plist = lookup_priolist(engine, prio); /* @@ -461,7 +495,7 @@ void i915_request_set_priority(struct i915_request *rq, int prio) GEM_BUG_ON(rq_prio(rq) != prio); unlock: - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&engine->sched.lock, flags); } void __i915_sched_defer_request(struct intel_engine_cs *engine, @@ -473,6 +507,8 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, struct i915_request *rn; LIST_HEAD(dfs); + SCHED_TRACE(se, "defer request " RQ_FMT "\n", RQ_ARG(rq)); + lockdep_assert_held(&se->lock); GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); @@ -601,6 +637,8 @@ void i915_request_enqueue(struct i915_request *rq) unsigned long flags; bool kick = false; + SCHED_TRACE(se, "queue request " RQ_FMT "\n", RQ_ARG(rq)); + /* Will be called from irq-context when using foreign fences. 
*/ spin_lock_irqsave(&se->lock, flags); GEM_BUG_ON(test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); @@ -660,6 +698,10 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) active = rq; } + SCHED_TRACE(se, + "rewind requests, active request " RQ_FMT "\n", + RQ_ARG(active)); + return active; } @@ -678,8 +720,7 @@ bool __i915_sched_suspend_request(struct intel_engine_cs *engine, if (i915_request_on_hold(rq)) return false; - ENGINE_TRACE(engine, "suspending request %llx:%lld\n", - rq->fence.context, rq->fence.seqno); + SCHED_TRACE(se, "suspending request " RQ_FMT "\n", RQ_ARG(rq)); /* * Transfer this request onto the hold queue to prevent it @@ -761,8 +802,7 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, if (!i915_request_on_hold(rq)) return; - ENGINE_TRACE(engine, "resuming request %llx:%lld\n", - rq->fence.context, rq->fence.seqno); + SCHED_TRACE(se, "resuming request " RQ_FMT "\n", RQ_ARG(rq)); /* * Move this request back to the priority queue, and all of its diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 00ce0a9d519d..ebd93ae303b4 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -16,6 +16,13 @@ struct drm_printer; struct intel_engine_cs; +#define SCHED_TRACE(se, fmt, ...) 
do { \ + const struct i915_sched *se__ __maybe_unused = (se); \ + GEM_TRACE("%s sched:%s: " fmt, \ + dev_name(se__->dbg.dev), se__->dbg.name, \ + ##__VA_ARGS__); \ +} while (0) + #define priolist_for_each_request(it, plist) \ list_for_each_entry(it, &(plist)->requests, sched.link) @@ -36,7 +43,11 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node, void i915_sched_node_retire(struct i915_sched_node *node); -void i915_sched_init_ipi(struct i915_sched_ipi *ipi); +void i915_sched_init(struct i915_sched *se, + struct device *dev, + const char *name, + unsigned long mask, + unsigned int subclass); void i915_request_set_priority(struct i915_request *request, int prio); diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index f2b0ac3a05a5..b7ee122d4f28 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -14,10 +14,33 @@ struct i915_request; -/* Inter-engine scheduling delegation */ -struct i915_sched_ipi { - struct i915_request *list; - struct work_struct work; +/** + * struct i915_sched - funnels requests towards hardware + * + * The struct i915_sched captures all the requests as they become ready + * to execute (on waking the i915_request.submit fence) puts them into + * a queue where they may be reordered according to priority and then + * wakes the backend tasklet to feed the queue to HW. 
+ */ +struct i915_sched { + spinlock_t lock; /* protects the scheduling lists and queue */ + + unsigned long mask; /* available scheduling channels */ + + struct list_head requests; /* active request, on HW */ + struct list_head hold; /* ready requests, but on hold */ + + /* Inter-engine scheduling delegate */ + struct i915_sched_ipi { + struct i915_request *list; + struct work_struct work; + } ipi; + + /* Pretty device names for debug messages */ + struct { + struct device *dev; + const char *name; + } dbg; }; struct i915_sched_attr { diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index 35a479184fee..b1a0a711e01f 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -887,6 +887,7 @@ int i915_scheduler_perf_selftests(struct drm_i915_private *i915) } types[] = { #define T(t) { #t, sizeof(struct t) } T(i915_priolist), + T(i915_sched), T(i915_sched_attr), T(i915_sched_node), T(i915_dependency), From patchwork Mon Feb 1 08:56:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058501 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8AE48C43381 for ; Mon, 1 Feb 2021 08:58:18 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 
3721564E3F for ; Mon, 1 Feb 2021 08:58:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3721564E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C050C6E51D; Mon, 1 Feb 2021 08:57:44 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 742186E4CF for ; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757735-1500050 for multiple; Mon, 01 Feb 2021 08:57:19 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:40 +0000 Message-Id: <20210201085715.27435-22-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 22/57] drm/i915: Move scheduler queue X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Extract the scheduling queue from "execlists" into the per-engine scheduling structs, for reuse by other backends. 
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../gpu/drm/i915/gem/i915_gem_context_types.h | 2 +- drivers/gpu/drm/i915/gem/i915_gem_wait.c | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 7 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 3 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 14 ----- .../drm/i915/gt/intel_execlists_submission.c | 29 +++++----- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_request.h | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 57 ++++++++++++------- drivers/gpu/drm/i915/i915_scheduler.h | 15 +++++ drivers/gpu/drm/i915/i915_scheduler_types.h | 14 +++++ .../gpu/drm/i915/selftests/i915_scheduler.c | 13 ++--- 13 files changed, 100 insertions(+), 69 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 085f6a3735e8..d5bc75508048 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -19,7 +19,7 @@ #include "gt/intel_context_types.h" -#include "i915_scheduler.h" +#include "i915_scheduler_types.h" #include "i915_sw_fence.h" struct pid; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c index d79bf16083bd..4d1897c347b9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -13,6 +13,7 @@ #include "dma_resv_utils.h" #include "i915_gem_ioctls.h" #include "i915_gem_object.h" +#include "i915_scheduler.h" static long i915_gem_object_wait_fence(struct dma_fence *fence, diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index d7ff84d92936..4c07c6f61924 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -574,7 +574,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine) memset(execlists->inflight, 0, 
sizeof(execlists->inflight)); execlists->queue_priority_hint = INT_MIN; - execlists->queue = RB_ROOT_CACHED; } static void cleanup_status_page(struct intel_engine_cs *engine) @@ -911,7 +910,7 @@ int intel_engines_init(struct intel_gt *gt) */ void intel_engine_cleanup_common(struct intel_engine_cs *engine) { - GEM_BUG_ON(!list_empty(&engine->sched.requests)); + i915_sched_fini(intel_engine_get_scheduler(engine)); tasklet_kill(&engine->execlists.tasklet); /* flush the callback */ intel_breadcrumbs_free(engine->breadcrumbs); @@ -1225,6 +1224,8 @@ void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync) */ bool intel_engine_is_idle(struct intel_engine_cs *engine) { + struct i915_sched *se = intel_engine_get_scheduler(engine); + /* More white lies, if wedged, hw state is inconsistent */ if (intel_gt_is_wedged(engine->gt)) return true; @@ -1237,7 +1238,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) intel_engine_flush_submission(engine); /* ELSP is empty, but there are ready requests? E.g. after reset */ - if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) + if (!i915_sched_is_idle(se)) return false; /* Ring stopped? 
*/ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 6372d7826bc9..3510c9236334 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -4,6 +4,7 @@ */ #include "i915_drv.h" +#include "i915_scheduler.h" #include "intel_breadcrumbs.h" #include "intel_context.h" @@ -276,7 +277,7 @@ static int __engine_park(struct intel_wakeref *wf) if (engine->park) engine->park(engine); - engine->execlists.no_priolist = false; + i915_sched_park(intel_engine_get_scheduler(engine)); /* While gt calls i915_vma_parked(), we have to break the lock cycle */ intel_gt_pm_put_async(engine->gt); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 0936b0699cbb..c36bdd957f8f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -153,11 +153,6 @@ struct intel_engine_execlists { */ struct timer_list preempt; - /** - * @default_priolist: priority list for I915_PRIORITY_NORMAL - */ - struct i915_priolist default_priolist; - /** * @ccid: identifier for contexts submitted to this engine */ @@ -192,11 +187,6 @@ struct intel_engine_execlists { */ u32 reset_ccid; - /** - * @no_priolist: priority lists disabled - */ - bool no_priolist; - /** * @submit_reg: gen-specific execlist submission register * set to the ExecList Submission Port (elsp) register pre-Gen11 and to @@ -252,10 +242,6 @@ struct intel_engine_execlists { */ int queue_priority_hint; - /** - * @queue: queue of requests, in priority lists - */ - struct rb_root_cached queue; struct rb_root_cached virtual; /** diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index dd1429a476d5..95208d45ffb1 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -272,11 +272,11 @@ static int 
effective_prio(const struct i915_request *rq) return prio; } -static int queue_prio(const struct intel_engine_execlists *execlists) +static int queue_prio(const struct i915_sched *se) { struct rb_node *rb; - rb = rb_first_cached(&execlists->queue); + rb = rb_first_cached(&se->queue); if (!rb) return INT_MIN; @@ -340,7 +340,7 @@ static bool need_preempt(const struct intel_engine_cs *engine, * context, it's priority would not exceed ELSP[0] aka last_prio. */ return max(virtual_prio(&engine->execlists), - queue_prio(&engine->execlists)) > last_prio; + queue_prio(se)) > last_prio; } __maybe_unused static bool @@ -1033,13 +1033,13 @@ static bool needs_timeslice(const struct intel_engine_cs *engine, return false; /* If ELSP[1] is occupied, always check to see if worth slicing */ - if (!list_is_last_rcu(&rq->sched.link, &se->requests)) { + if (!i915_sched_is_last_request(se, rq)) { ENGINE_TRACE(engine, "timeslice required for second inflight context\n"); return true; } /* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */ - if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) { + if (!i915_sched_is_idle(se)) { ENGINE_TRACE(engine, "timeslice required for queue\n"); return true; } @@ -1285,7 +1285,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) GEM_BUG_ON(rq->engine != &ve->base); GEM_BUG_ON(rq->context != &ve->context); - if (unlikely(rq_prio(rq) < queue_prio(execlists))) { + if (unlikely(rq_prio(rq) < queue_prio(se))) { spin_unlock(&ve->base.sched.lock); break; } @@ -1351,7 +1351,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) break; } - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&se->queue))) { struct i915_priolist *p = to_priolist(rb); struct i915_request *rq, *rn; @@ -1430,7 +1430,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &se->queue); i915_priolist_free(p); } done: @@ 
-1452,7 +1452,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * request triggering preemption on the next dequeue (or subsequent * interrupt for secondary ports). */ - execlists->queue_priority_hint = queue_prio(execlists); + execlists->queue_priority_hint = queue_prio(se); spin_unlock(&se->lock); /* @@ -2678,7 +2678,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) intel_engine_signal_breadcrumbs(engine); /* Flush the queued requests to the timeline list (for retiring). */ - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&se->queue))) { struct i915_priolist *p = to_priolist(rb); priolist_for_each_request_consume(rq, rn, p) { @@ -2688,9 +2688,10 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) } } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &se->queue); i915_priolist_free(p); } + GEM_BUG_ON(!i915_sched_is_idle(se)); /* On-hold requests will be flushed to timeline upon their release */ list_for_each_entry(rq, &se->hold, sched.link) @@ -2722,7 +2723,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) /* Remaining _unready_ requests will be nop'ed when submitted */ execlists->queue_priority_hint = INT_MIN; - execlists->queue = RB_ROOT_CACHED; + se->queue = RB_ROOT_CACHED; GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); execlists->tasklet.callback = nop_submission_tasklet; @@ -2957,7 +2958,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) static struct list_head *virtual_queue(struct virtual_engine *ve) { - return &ve->base.execlists.default_priolist.requests; + return &ve->base.sched.default_priolist.requests; } static void rcu_virtual_context_destroy(struct work_struct *wrk) @@ -3558,7 +3559,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, last = NULL; count = 0; - for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) { + for (rb = 
rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { struct i915_priolist *p = rb_entry(rb, typeof(*p), node); priolist_for_each_request(rq, p) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 45f6d38341ef..6f07f1124a13 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -204,7 +204,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine) * event. */ port = first; - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&se->queue))) { struct i915_priolist *p = to_priolist(rb); struct i915_request *rq, *rn; @@ -224,7 +224,7 @@ static void __guc_dequeue(struct intel_engine_cs *engine) last = rq; } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &se->queue); i915_priolist_free(p); } done: @@ -362,7 +362,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) } /* Flush the queued requests to the timeline list (for retiring). 
*/ - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&se->queue))) { struct i915_priolist *p = to_priolist(rb); priolist_for_each_request_consume(rq, rn, p) { @@ -372,14 +372,15 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) i915_request_mark_complete(rq); } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &se->queue); i915_priolist_free(p); } + GEM_BUG_ON(!i915_sched_is_idle(se)); /* Remaining _unready_ requests will be nop'ed when submitted */ execlists->queue_priority_hint = INT_MIN; - execlists->queue = RB_ROOT_CACHED; + se->queue = RB_ROOT_CACHED; spin_unlock_irqrestore(&se->lock, flags); } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index f684147290cb..0e4d7998be53 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -99,7 +99,6 @@ #include "i915_gpu_error.h" #include "i915_perf_types.h" #include "i915_request.h" -#include "i915_scheduler.h" #include "gt/intel_timeline.h" #include "i915_vma.h" #include "i915_irq.h" diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 3a5d6bdcd8dd..c41582b96b46 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -35,7 +35,7 @@ #include "gt/intel_timeline_types.h" #include "i915_gem.h" -#include "i915_scheduler.h" +#include "i915_scheduler_types.h" #include "i915_selftest.h" #include "i915_sw_fence.h" diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 5eea8c6b85a8..aef14e4463c3 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -107,6 +107,7 @@ void i915_sched_init(struct i915_sched *se, INIT_LIST_HEAD(&se->requests); INIT_LIST_HEAD(&se->hold); + se->queue = RB_ROOT_CACHED; i915_sched_init_ipi(&se->ipi); @@ -123,6 +124,19 @@ void i915_sched_init(struct i915_sched *se, #endif } +void i915_sched_park(struct 
i915_sched *se) +{ + GEM_BUG_ON(!i915_sched_is_idle(se)); + se->no_priolist = false; +} + +void i915_sched_fini(struct i915_sched *se) +{ + GEM_BUG_ON(!list_empty(&se->requests)); + + i915_sched_park(se); +} + static void __ipi_add(struct i915_request *rq) { #define STUB ((struct i915_request *)1) @@ -191,7 +205,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb) return rb_entry(rb, struct i915_priolist, node); } -static void assert_priolists(struct intel_engine_execlists * const execlists) +static void assert_priolists(struct i915_sched * const se) { struct rb_node *rb; long last_prio; @@ -199,11 +213,11 @@ static void assert_priolists(struct intel_engine_execlists * const execlists) if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) return; - GEM_BUG_ON(rb_first_cached(&execlists->queue) != - rb_first(&execlists->queue.rb_root)); + GEM_BUG_ON(rb_first_cached(&se->queue) != + rb_first(&se->queue.rb_root)); last_prio = INT_MAX; - for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) { + for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { const struct i915_priolist *p = to_priolist(rb); GEM_BUG_ON(p->priority > last_prio); @@ -212,24 +226,22 @@ static void assert_priolists(struct intel_engine_execlists * const execlists) } static struct list_head * -lookup_priolist(struct intel_engine_cs *engine, int prio) +lookup_priolist(struct i915_sched *se, int prio) { - struct intel_engine_execlists * const execlists = &engine->execlists; - struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_priolist *p; struct rb_node **parent, *rb; bool first = true; lockdep_assert_held(&se->lock); - assert_priolists(execlists); + assert_priolists(se); - if (unlikely(execlists->no_priolist)) + if (unlikely(se->no_priolist)) prio = I915_PRIORITY_NORMAL; find_priolist: /* most positive priority is scheduled first, equal priorities fifo */ rb = NULL; - parent = &execlists->queue.rb_root.rb_node; + parent = &se->queue.rb_root.rb_node; 
while (*parent) { rb = *parent; p = to_priolist(rb); @@ -244,7 +256,7 @@ lookup_priolist(struct intel_engine_cs *engine, int prio) } if (prio == I915_PRIORITY_NORMAL) { - p = &execlists->default_priolist; + p = &se->default_priolist; } else { p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC); /* Convert an allocation failure to a priority bump */ @@ -259,7 +271,7 @@ lookup_priolist(struct intel_engine_cs *engine, int prio) * requests, so if userspace lied about their * dependencies that reordering may be visible. */ - execlists->no_priolist = true; + se->no_priolist = true; goto find_priolist; } } @@ -268,7 +280,7 @@ lookup_priolist(struct intel_engine_cs *engine, int prio) INIT_LIST_HEAD(&p->requests); rb_link_node(&p->node, rb, parent); - rb_insert_color_cached(&p->node, &execlists->queue, first); + rb_insert_color_cached(&p->node, &se->queue, first); return &p->requests; } @@ -361,13 +373,14 @@ static void ipi_priority(struct i915_request *rq, int prio) static void __i915_request_set_priority(struct i915_request *rq, int prio) { struct intel_engine_cs *engine = rq->engine; + struct i915_sched *se = intel_engine_get_scheduler(engine); struct list_head *pos = &rq->sched.signalers_list; struct list_head *plist; SCHED_TRACE(&engine->sched, "PI for " RQ_FMT ", prio:%d\n", RQ_ARG(rq), prio); - plist = lookup_priolist(engine, prio); + plist = lookup_priolist(se, prio); /* * Recursively bump all dependent priorities to match the new request. 
@@ -570,18 +583,18 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } while ((rq = stack_pop(rq, &pos))); - pos = lookup_priolist(engine, prio); + pos = lookup_priolist(se, prio); list_for_each_entry_safe(rq, rn, &dfs, sched.link) { set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); list_add_tail(&rq->sched.link, pos); } } -static void queue_request(struct intel_engine_cs *engine, +static void queue_request(struct i915_sched *se, struct i915_request *rq) { GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, lookup_priolist(engine, rq_prio(rq))); + list_add_tail(&rq->sched.link, lookup_priolist(se, rq_prio(rq))); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } @@ -648,9 +661,9 @@ void i915_request_enqueue(struct i915_request *rq) list_add_tail(&rq->sched.link, &se->hold); i915_request_set_hold(rq); } else { - queue_request(engine, rq); + queue_request(se, rq); - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + GEM_BUG_ON(i915_sched_is_idle(se)); kick = submit_queue(engine, rq); } @@ -682,9 +695,9 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); if (rq_prio(rq) != prio) { prio = rq_prio(rq); - pl = lookup_priolist(engine, prio); + pl = lookup_priolist(se, prio); } - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + GEM_BUG_ON(i915_sched_is_idle(se)); list_move(&rq->sched.link, pl); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); @@ -819,7 +832,7 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, i915_request_clear_hold(rq); list_del_init(&rq->sched.link); - queue_request(engine, rq); + queue_request(se, rq); /* Also release any children on this engine that are ready */ for_each_waiter(p, rq) { diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index ebd93ae303b4..71bef75859b4 100644 --- 
a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -12,6 +12,7 @@ #include #include "i915_scheduler_types.h" +#include "i915_request.h" struct drm_printer; struct intel_engine_cs; @@ -48,6 +49,8 @@ void i915_sched_init(struct i915_sched *se, const char *name, unsigned long mask, unsigned int subclass); +void i915_sched_park(struct i915_sched *se); +void i915_sched_fini(struct i915_sched *se); void i915_request_set_priority(struct i915_request *request, int prio); @@ -75,6 +78,18 @@ static inline void i915_priolist_free(struct i915_priolist *p) __i915_priolist_free(p); } +static inline bool i915_sched_is_idle(const struct i915_sched *se) +{ + return RB_EMPTY_ROOT(&se->queue.rb_root); +} + +static inline bool +i915_sched_is_last_request(const struct i915_sched *se, + const struct i915_request *rq) +{ + return list_is_last_rcu(&rq->sched.link, &se->requests); +} + void i915_request_show_with_schedule(struct drm_printer *m, const struct i915_request *rq, const char *prefix, diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index b7ee122d4f28..44dae932e5af 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -29,6 +29,10 @@ struct i915_sched { struct list_head requests; /* active request, on HW */ struct list_head hold; /* ready requests, but on hold */ + /** + * @queue: queue of requests, in priority lists + */ + struct rb_root_cached queue; /* Inter-engine scheduling delegate */ struct i915_sched_ipi { @@ -36,6 +40,16 @@ struct i915_sched { struct work_struct work; } ipi; + /** + * @default_priolist: priority list for I915_PRIORITY_NORMAL + */ + struct i915_priolist default_priolist; + + /** + * @no_priolist: priority lists disabled + */ + bool no_priolist; + /* Pretty device names for debug messages */ struct { struct device *dev; diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c 
b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index b1a0a711e01f..56d785581535 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -77,8 +77,7 @@ static int all_engines(struct drm_i915_private *i915, return 0; } -static bool check_context_order(struct i915_sched *se, - struct intel_engine_cs *engine) +static bool check_context_order(struct i915_sched *se) { u64 last_seqno, last_context; unsigned long count; @@ -93,7 +92,7 @@ static bool check_context_order(struct i915_sched *se, last_context = 0; last_seqno = 0; last_prio = 0; - for (rb = rb_first_cached(&engine->execlists.queue); rb; rb = rb_next(rb)) { + for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { struct i915_priolist *p = rb_entry(rb, typeof(*p), node); struct i915_request *rq; @@ -175,7 +174,7 @@ static int __single_chain(struct intel_engine_cs *engine, unsigned long length, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq, count, count - 1) && !check_context_order(se, engine)) + if (fn(rq, count, count - 1) && !check_context_order(se)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists); @@ -260,7 +259,7 @@ static int __wide_chain(struct intel_engine_cs *engine, unsigned long width, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq[i - 1], i, count) && !check_context_order(se, engine)) + if (fn(rq[i - 1], i, count) && !check_context_order(se)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists); @@ -349,7 +348,7 @@ static int __inv_chain(struct intel_engine_cs *engine, unsigned long width, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq[i - 1], i, count) && !check_context_order(se, engine)) + if (fn(rq[i - 1], i, count) && !check_context_order(se)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists); @@ -455,7 +454,7 @@ static int __sparse_chain(struct 
intel_engine_cs *engine, unsigned long width, intel_engine_flush_submission(engine); execlists_active_lock_bh(&engine->execlists); - if (fn(rq[i - 1], i, count) && !check_context_order(se, engine)) + if (fn(rq[i - 1], i, count) && !check_context_order(se)) err = -EINVAL; execlists_active_unlock_bh(&engine->execlists);
From patchwork Mon Feb 1 08:56:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058469 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:41 +0000 Message-Id: <20210201085715.27435-23-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 23/57] drm/i915: Move tasklet from execlists to sched Cc: Chris Wilson
Move the scheduling tasklet out of the execlists backend into the per-engine scheduling bookkeeping.
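For readers following along, a rough userspace sketch of what owning the tasklet in the scheduler buys: callers kick or flush "the scheduler" and never touch a backend-private tasklet. The bodies of i915_sched_kick(), i915_sched_flush() and i915_sched_is_disabled() are not visible in this part of the diff, so the demo_* helpers below are guesses at the general shape, with a plain callback and flags standing in for tasklet_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_sched;

/* Hypothetical stand-in for tasklet_struct: a callback plus state flags. */
struct demo_tasklet {
	void (*callback)(struct demo_sched *se);
	bool scheduled;    /* set by a kick, cleared when the callback runs */
	int disable_count; /* > 0 while e.g. a reset holds the tasklet off */
};

/* Hypothetical stand-in for i915_sched, now owning the tasklet. */
struct demo_sched {
	struct demo_tasklet tasklet;
	int submitted; /* counts how often the backend callback ran */
};

static void demo_sched_init(struct demo_sched *se,
			    void (*callback)(struct demo_sched *se))
{
	assert(se != NULL);
	se->tasklet.callback = callback;
	se->tasklet.scheduled = false;
	se->tasklet.disable_count = 0;
	se->submitted = 0;
}

/* Ask for the backend to run later; cheap and safe to repeat. */
static void demo_sched_kick(struct demo_sched *se)
{
	se->tasklet.scheduled = true;
}

/* True while something (such as a reset) has disabled the tasklet. */
static bool demo_sched_is_disabled(const struct demo_sched *se)
{
	return se->tasklet.disable_count > 0;
}

/* Run the backend now if allowed. Returns true if the callback ran. */
static bool demo_sched_flush(struct demo_sched *se)
{
	struct demo_tasklet *t = &se->tasklet;

	if (!t->callback || demo_sched_is_disabled(se))
		return false;

	t->scheduled = false;
	t->callback(se);
	return true;
}

/* Example backend: it only ever sees the scheduler struct. */
static void demo_backend_submit(struct demo_sched *se)
{
	se->submitted++;
}
```

A caller would pair demo_sched_init(&se, demo_backend_submit) with demo_sched_kick() and demo_sched_flush(); while disable_count is raised, a flush is refused and the kick stays pending until the tasklet is enabled again.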
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 33 +++----- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 33 ++------ .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 5 -- .../drm/i915/gt/intel_execlists_submission.c | 82 +++++++------------ drivers/gpu/drm/i915/gt/intel_gt_irq.c | 2 +- drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/selftest_execlists.c | 49 +++++------ drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 3 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 13 +-- drivers/gpu/drm/i915/gt/selftest_reset.c | 3 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 23 ++---- drivers/gpu/drm/i915/i915_request.c | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 45 +++++++++- drivers/gpu/drm/i915/i915_scheduler.h | 34 ++++++++ drivers/gpu/drm/i915/i915_scheduler_types.h | 9 ++ drivers/gpu/drm/i915/selftests/i915_request.c | 10 +-- .../gpu/drm/i915/selftests/i915_scheduler.c | 24 +++--- drivers/gpu/drm/i915/selftests/igt_spinner.c | 2 +- 21 files changed, 200 insertions(+), 180 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index cc2df80eb449..52bba16c62e8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -12,6 +12,7 @@ #include "i915_pmu.h" #include "i915_reg.h" #include "i915_request.h" +#include "i915_scheduler.h" #include "i915_selftest.h" #include "intel_engine_types.h" #include "intel_gt_types.h" @@ -123,20 +124,6 @@ execlists_active(const struct intel_engine_execlists *execlists) return active; } -static inline void -execlists_active_lock_bh(struct intel_engine_execlists *execlists) -{ - local_bh_disable(); /* prevent local softirq and lock recursion */ - tasklet_lock(&execlists->tasklet); -} - -static inline void 
-execlists_active_unlock_bh(struct intel_engine_execlists *execlists) -{ - tasklet_unlock(&execlists->tasklet); - local_bh_enable(); /* restore softirq, and kick ksoftirqd! */ -} - static inline u32 intel_read_status_page(const struct intel_engine_cs *engine, int reg) { @@ -231,12 +218,6 @@ static inline void __intel_engine_reset(struct intel_engine_cs *engine, bool intel_engines_are_idle(struct intel_gt *gt); bool intel_engine_is_idle(struct intel_engine_cs *engine); -void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync); -static inline void intel_engine_flush_submission(struct intel_engine_cs *engine) -{ - __intel_engine_flush_submission(engine, true); -} - void intel_engines_reset_default_submission(struct intel_gt *gt); bool intel_engine_can_store_dword(struct intel_engine_cs *engine); @@ -283,4 +264,16 @@ intel_engine_has_heartbeat(const struct intel_engine_cs *engine) return READ_ONCE(engine->props.heartbeat_interval_ms); } +static inline void +intel_engine_kick_scheduler(struct intel_engine_cs *engine) +{ + i915_sched_kick(intel_engine_get_scheduler(engine)); +} + +static inline void +intel_engine_flush_scheduler(struct intel_engine_cs *engine) +{ + i915_sched_flush(intel_engine_get_scheduler(engine)); +} + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 4c07c6f61924..b5b957283f2c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -911,7 +911,6 @@ int intel_engines_init(struct intel_gt *gt) void intel_engine_cleanup_common(struct intel_engine_cs *engine) { i915_sched_fini(intel_engine_get_scheduler(engine)); - tasklet_kill(&engine->execlists.tasklet); /* flush the callback */ intel_breadcrumbs_free(engine->breadcrumbs); @@ -1194,27 +1193,6 @@ static bool ring_is_idle(struct intel_engine_cs *engine) return idle; } -void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool 
sync) -{ - struct tasklet_struct *t = &engine->execlists.tasklet; - - if (!t->callback) - return; - - local_bh_disable(); - if (tasklet_trylock(t)) { - /* Must wait for any GPU reset in progress. */ - if (__tasklet_is_enabled(t)) - t->callback(t); - tasklet_unlock(t); - } - local_bh_enable(); - - /* Synchronise and wait for the tasklet on another CPU */ - if (sync) - tasklet_unlock_wait(t); -} - /** * intel_engine_is_idle() - Report if the engine has finished process all work * @engine: the intel_engine_cs @@ -1235,7 +1213,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) /* Waiting to drain ELSP? */ synchronize_hardirq(engine->i915->drm.pdev->irq); - intel_engine_flush_submission(engine); + i915_sched_flush(se); /* ELSP is empty, but there are ready requests? E.g. after reset */ if (!i915_sched_is_idle(se)) @@ -1450,6 +1428,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, if (intel_engine_uses_guc(engine)) { /* nothing to print yet */ } else if (HAS_EXECLISTS(dev_priv)) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request * const *port, *rq; const u32 *hws = &engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; @@ -1459,8 +1438,8 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? 
%s\n", yesno(test_bit(TASKLET_STATE_SCHED, - &engine->execlists.tasklet.state)), - enableddisabled(!atomic_read(&engine->execlists.tasklet.count)), + &se->tasklet.state)), + enableddisabled(!atomic_read(&se->tasklet.count)), repr_timer(&engine->execlists.preempt), repr_timer(&engine->execlists.timer)); @@ -1484,7 +1463,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, idx, hws[idx * 2], hws[idx * 2 + 1]); } - execlists_active_lock_bh(execlists); + i915_sched_lock_bh(se); rcu_read_lock(); for (port = execlists->active; (rq = *port); port++) { char hdr[160]; @@ -1515,7 +1494,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, i915_request_show(m, rq, hdr, 0); } rcu_read_unlock(); - execlists_active_unlock_bh(execlists); + i915_sched_unlock_bh(se); } else if (INTEL_GEN(dev_priv) > 6) { drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n", ENGINE_READ(engine, RING_PP_DIR_BASE)); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 48a91c0dbad6..fce86bd4b47f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -88,7 +88,7 @@ static void heartbeat(struct work_struct *wrk) unsigned long serial; /* Just in case everything has gone horribly wrong, give it a kick */ - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); rq = engine->heartbeat.systole; if (rq && i915_request_completed(rq)) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 3510c9236334..27d9d17b35cb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -53,7 +53,7 @@ static int __engine_unpark(struct intel_wakeref *wf) /* Flush all pending HW writes before we touch the context */ while (unlikely(intel_context_inflight(ce))) - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); /* 
First poison the image to verify we never fully trust it */ dbg_poison_ce(ce); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index c36bdd957f8f..97fe5e395a85 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -138,11 +138,6 @@ struct st_preempt_hang { * driver and the hardware state for execlist mode of submission. */ struct intel_engine_execlists { - /** - * @tasklet: softirq tasklet for bottom handler - */ - struct tasklet_struct tasklet; - /** * @timer: kick the current context if its timeslice expires */ diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 95208d45ffb1..be79a352e512 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -514,7 +514,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) resubmit_virtual_request(rq, ve); if (READ_ONCE(ve->request)) - tasklet_hi_schedule(&ve->base.execlists.tasklet); + intel_engine_kick_scheduler(&ve->base); } static void __execlists_schedule_out(struct i915_request * const rq, @@ -680,12 +680,6 @@ trace_ports(const struct intel_engine_execlists *execlists, dump_port(p1, sizeof(p1), ", ", ports[1])); } -static bool -reset_in_progress(const struct intel_engine_execlists *execlists) -{ - return unlikely(!__tasklet_is_enabled(&execlists->tasklet)); -} - static __maybe_unused noinline bool assert_pending_valid(const struct intel_engine_execlists *execlists, const char *msg) @@ -700,7 +694,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists, trace_ports(execlists, msg, execlists->pending); /* We may be messing around with the lists during reset, lalala */ - if (reset_in_progress(execlists)) + if (i915_sched_is_disabled(intel_engine_get_scheduler(engine))) return true; if (!execlists->pending[0]) { @@ -1087,7 
+1081,7 @@ static void start_timeslice(struct intel_engine_cs *engine) * its timeslice, so recheck. */ if (!timer_pending(&el->timer)) - tasklet_hi_schedule(&el->tasklet); + intel_engine_kick_scheduler(engine); return; } @@ -1663,14 +1657,6 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) struct i915_request **prev; u8 head, tail; - /* - * As we modify our execlists state tracking we require exclusive - * access. Either we are inside the tasklet, or the tasklet is disabled - * and we assume that is only inside the reset paths and so serialised. - */ - GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) && - !reset_in_progress(execlists)); - /* * Note that csb_write, csb_status may be either in HWSP or mmio. * When reading from the csb_write mmio register, we have to be @@ -2066,6 +2052,7 @@ static void execlists_capture(struct intel_engine_cs *engine) static noinline void execlists_reset(struct intel_engine_cs *engine) { + struct i915_sched *se = intel_engine_get_scheduler(engine); const unsigned int bit = I915_RESET_ENGINE + engine->id; unsigned long *lock = &engine->gt->reset.flags; unsigned long eir = fetch_and_zero(&engine->execlists.error_interrupt); @@ -2089,13 +2076,13 @@ static noinline void execlists_reset(struct intel_engine_cs *engine) ENGINE_TRACE(engine, "reset for %s\n", msg); /* Mark this tasklet as disabled to avoid waiting for it to complete */ - tasklet_disable_nosync(&engine->execlists.tasklet); + tasklet_disable_nosync(&se->tasklet); ring_set_paused(engine, 1); /* Freeze the current request in place */ execlists_capture(engine); intel_engine_reset(engine, msg); - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&se->tasklet); clear_and_wake_up_bit(bit, lock); } @@ -2119,10 +2106,16 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) static void execlists_submission_tasklet(struct tasklet_struct *t) { struct intel_engine_cs * const engine = - from_tasklet(engine, t, 
execlists.tasklet); + from_tasklet(engine, t, sched.tasklet); struct i915_request *post[2 * EXECLIST_MAX_PORTS]; struct i915_request **inactive; + /* + * As we modify our execlists state tracking we require exclusive + * access. Either we are inside the tasklet, or the tasklet is disabled + * and we assume that is only inside the reset paths and so serialised. + */ + rcu_read_lock(); inactive = process_csb(engine, post); GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); @@ -2146,8 +2139,10 @@ static void execlists_submission_tasklet(struct tasklet_struct *t) static void __execlists_kick(struct intel_engine_execlists *execlists) { - /* Kick the tasklet for some interrupt coalescing and reset handling */ - tasklet_hi_schedule(&execlists->tasklet); + struct intel_engine_cs *engine = + container_of(execlists, typeof(*engine), execlists); + + intel_engine_kick_scheduler(engine); } #define execlists_kick(t, member) \ @@ -2470,11 +2465,6 @@ static int execlists_resume(struct intel_engine_cs *engine) static void execlists_reset_prepare(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - - ENGINE_TRACE(engine, "depth<-%d\n", - atomic_read(&execlists->tasklet.count)); - /* * Prevent request submission to the hardware until we have * completed the reset in i915_gem_reset_finish(). If a request @@ -2484,8 +2474,7 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) * Turning off the execlists->tasklet until the reset is over * prevents the race. 
*/ - __tasklet_disable_sync_once(&execlists->tasklet); - GEM_BUG_ON(!reset_in_progress(execlists)); + i915_sched_disable(intel_engine_get_scheduler(engine)); /* * We stop engines, otherwise we might get failed reset and a @@ -2637,7 +2626,7 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) static void nop_submission_tasklet(struct tasklet_struct *t) { struct intel_engine_cs * const engine = - from_tasklet(engine, t, execlists.tasklet); + from_tasklet(engine, t, sched.tasklet); /* The driver is wedged; don't process any more events. */ WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN); @@ -2725,8 +2714,8 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) execlists->queue_priority_hint = INT_MIN; se->queue = RB_ROOT_CACHED; - GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); - execlists->tasklet.callback = nop_submission_tasklet; + GEM_BUG_ON(__tasklet_is_enabled(&se->tasklet)); + se->tasklet.callback = nop_submission_tasklet; spin_unlock_irqrestore(&se->lock, flags); rcu_read_unlock(); @@ -2734,8 +2723,6 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) static void execlists_reset_finish(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - /* * After a GPU reset, we may have requests to replay. Do so now while * we still have the forcewake to be sure that the GPU is not allowed @@ -2746,14 +2733,8 @@ static void execlists_reset_finish(struct intel_engine_cs *engine) * reset as the next level of recovery, and as a final resort we * will declare the device wedged. */ - GEM_BUG_ON(!reset_in_progress(execlists)); - /* And kick in case we missed a new request submission. 
*/ - if (__tasklet_enable(&execlists->tasklet)) - __execlists_kick(execlists); - - ENGINE_TRACE(engine, "depth->%d\n", - atomic_read(&execlists->tasklet.count)); + i915_sched_enable(intel_engine_get_scheduler(engine)); } static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine) @@ -2786,7 +2767,7 @@ static bool can_preempt(struct intel_engine_cs *engine) static void execlists_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = i915_request_enqueue; - engine->execlists.tasklet.callback = execlists_submission_tasklet; + engine->sched.tasklet.callback = execlists_submission_tasklet; } static void execlists_shutdown(struct intel_engine_cs *engine) @@ -2794,7 +2775,6 @@ static void execlists_shutdown(struct intel_engine_cs *engine) /* Synchronise with residual timers and any softirq they raise */ del_timer_sync(&engine->execlists.timer); del_timer_sync(&engine->execlists.preempt); - tasklet_kill(&engine->execlists.tasklet); } static void execlists_release(struct intel_engine_cs *engine) @@ -2910,7 +2890,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) struct intel_uncore *uncore = engine->uncore; u32 base = engine->mmio_base; - tasklet_setup(&engine->execlists.tasklet, execlists_submission_tasklet); + tasklet_setup(&engine->sched.tasklet, execlists_submission_tasklet); timer_setup(&engine->execlists.timer, execlists_timeslice, 0); timer_setup(&engine->execlists.preempt, execlists_preempt, 0); @@ -2993,7 +2973,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) * rbtrees as in the case it is running in parallel, it may reinsert * the rb_node into a sibling. */ - tasklet_kill(&ve->base.execlists.tasklet); + i915_sched_fini(se); /* Decouple ourselves from the siblings, no more access allowed. 
*/ for (n = 0; n < ve->num_siblings; n++) { @@ -3011,7 +2991,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) spin_unlock_irq(&sibling->sched.lock); } - GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); + GEM_BUG_ON(__tasklet_is_scheduled(&se->tasklet)); GEM_BUG_ON(!list_empty(virtual_queue(ve))); lrc_fini(&ve->context); @@ -3156,7 +3136,7 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) static void virtual_submission_tasklet(struct tasklet_struct *t) { struct virtual_engine * const ve = - from_tasklet(ve, t, base.execlists.tasklet); + from_tasklet(ve, t, base.sched.tasklet); const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint); intel_engine_mask_t mask; unsigned int n; @@ -3227,7 +3207,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) GEM_BUG_ON(RB_EMPTY_NODE(&node->rb)); node->prio = prio; if (first && prio > sibling->execlists.queue_priority_hint) - tasklet_hi_schedule(&sibling->execlists.tasklet); + i915_sched_kick(se); unlock_engine: spin_unlock_irq(&se->lock); @@ -3269,7 +3249,7 @@ static void virtual_submit_request(struct i915_request *rq) GEM_BUG_ON(!list_empty(virtual_queue(ve))); list_move_tail(&rq->sched.link, virtual_queue(ve)); - tasklet_hi_schedule(&ve->base.execlists.tasklet); + intel_engine_kick_scheduler(&ve->base); unlock: spin_unlock_irqrestore(&se->lock, flags); @@ -3366,7 +3346,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, INIT_LIST_HEAD(virtual_queue(ve)); ve->base.execlists.queue_priority_hint = INT_MIN; - tasklet_setup(&ve->base.execlists.tasklet, virtual_submission_tasklet); + tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet); intel_context_init(&ve->context, &ve->base); @@ -3394,7 +3374,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, * layering if we handle cloning of the requests and * submitting a copy into each backend. 
*/ - if (sibling->execlists.tasklet.callback != + if (sibling->sched.tasklet.callback != execlists_submission_tasklet) { err = -ENODEV; goto err_put; diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c index 9fc6c912a4e5..6ce5bd28a23d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c @@ -59,7 +59,7 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir) } if (tasklet) - tasklet_hi_schedule(&engine->execlists.tasklet); + intel_engine_kick_scheduler(engine); } static u32 diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 14c7b18090f3..36ec97f79174 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -43,7 +43,7 @@ static bool flush_submission(struct intel_gt *gt, long timeout) return false; for_each_engine(engine, gt, id) { - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); /* Flush the background retirement and idle barriers */ flush_work(&engine->retire_work); diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c index 41dc1a542cd6..3ce8cb3329f3 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c @@ -103,7 +103,7 @@ static int __measure_timestamps(struct intel_context *ce, intel_ring_advance(rq, cs); i915_request_get(rq); i915_request_add(rq); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); /* Wait for the request to start executing, that then waits for us */ while (READ_ONCE(sema[2]) == 0) diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 0395f9053a43..cfc0f4b9fbc5 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -43,7 +43,7 @@ static int wait_for_submit(struct 
intel_engine_cs *engine, unsigned long timeout) { /* Ignore our own attempts to suppress excess tasklets */ - tasklet_hi_schedule(&engine->execlists.tasklet); + intel_engine_kick_scheduler(engine); timeout += jiffies; do { @@ -53,7 +53,7 @@ static int wait_for_submit(struct intel_engine_cs *engine, return 0; /* Wait until the HW has acknowleged the submission (or err) */ - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); if (!READ_ONCE(engine->execlists.pending[0]) && is_active(rq)) return 0; @@ -72,7 +72,7 @@ static int wait_for_reset(struct intel_engine_cs *engine, do { cond_resched(); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); if (READ_ONCE(engine->execlists.pending[0])) continue; @@ -288,7 +288,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio) i915_request_put(rq[0]); err_ce: - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); igt_spinner_end(&spin); for (n = 0; n < ARRAY_SIZE(ce); n++) { if (IS_ERR_OR_NULL(ce[n])) @@ -409,10 +409,10 @@ static int live_unlite_ring(void *arg) } i915_request_add(tmp); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); n++; } - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); pr_debug("%s: Filled ring with %d nop tails {size:%x, tail:%x, emit:%x, rq.tail:%x}\n", engine->name, n, ce[0]->ring->size, @@ -449,7 +449,7 @@ static int live_unlite_ring(void *arg) ce[1]->ring->tail, ce[1]->ring->emit); err_ce: - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); igt_spinner_end(&spin); for (n = 0; n < ARRAY_SIZE(ce); n++) { if (IS_ERR_OR_NULL(ce[n])) @@ -568,6 +568,7 @@ static int live_hold_reset(void *arg) return -ENOMEM; for_each_engine(engine, gt, id) { + struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_context *ce; struct i915_request *rq; @@ -602,9 +603,9 @@ static int live_hold_reset(void *arg) err = -EBUSY; 
goto out; } - tasklet_disable(&engine->execlists.tasklet); + tasklet_disable(&se->tasklet); - engine->execlists.tasklet.callback(&engine->execlists.tasklet); + se->tasklet.callback(&se->tasklet); GEM_BUG_ON(execlists_active(&engine->execlists) != rq); i915_request_get(rq); @@ -614,7 +615,7 @@ static int live_hold_reset(void *arg) __intel_engine_reset_bh(engine, NULL); GEM_BUG_ON(rq->fence.error != -EIO); - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&se->tasklet); clear_and_wake_up_bit(I915_RESET_ENGINE + id, &gt->reset.flags); local_bh_enable(); @@ -762,7 +763,7 @@ static int live_error_interrupt(void *arg) } /* Kick the tasklet to process the error */ - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); if (client[i]->fence.error != p->error[i]) { pr_err("%s: %s request (%s) with wrong error code: %d\n", engine->name, @@ -1180,8 +1181,8 @@ static int live_timeslice_rewind(void *arg) time_before(jiffies, timeout)) { /* semaphore yield! */ /* Wait for the timeslice to kick in */ del_timer(&engine->execlists.timer); - tasklet_hi_schedule(&engine->execlists.tasklet); - intel_engine_flush_submission(engine); + intel_engine_kick_scheduler(engine); + intel_engine_flush_scheduler(engine); /* * Unfortunately this assumes that during the @@ -1369,7 +1370,7 @@ static int live_timeslice_queue(void *arg) /* Wait until we ack the release_queue and start timeslicing */ do { cond_resched(); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); } while (READ_ONCE(engine->execlists.pending[0])); /* Timeslice every jiffy, so within 2 we should signal */ @@ -2339,9 +2340,9 @@ static int __cancel_fail(struct live_preempt_cancel *arg) /* force preempt reset [failure] */ while (!engine->execlists.pending[0]) - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); del_timer_sync(&engine->execlists.preempt); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); 
cancel_reset_timeout(engine); @@ -2845,10 +2846,10 @@ static int __live_preempt_ring(struct intel_engine_cs *engine, } i915_request_add(tmp); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); n++; } - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); pr_debug("%s: Filled %d with %d nop tails {size:%x, tail:%x, emit:%x, rq.tail:%x}\n", engine->name, queue_sz, n, ce[0]->ring->size, @@ -2882,7 +2883,7 @@ static int __live_preempt_ring(struct intel_engine_cs *engine, ce[1]->ring->tail, ce[1]->ring->emit); err_ce: - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); igt_spinner_end(spin); for (n = 0; n < ARRAY_SIZE(ce); n++) { if (IS_ERR_OR_NULL(ce[n])) @@ -3417,7 +3418,7 @@ static int live_preempt_timeout(void *arg) i915_request_get(rq); i915_request_add(rq); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); engine->props.preempt_timeout_ms = saved_timeout; if (i915_request_wait(rq, 0, HZ / 10) < 0) { @@ -4457,7 +4458,7 @@ static int bond_virtual_engine(struct intel_gt *gt, } } onstack_fence_fini(&fence); - intel_engine_flush_submission(master); + intel_engine_flush_scheduler(master); igt_spinner_end(&spin); if (i915_request_wait(rq[0], 0, HZ / 10) < 0) { @@ -4596,9 +4597,9 @@ static int reset_virtual_engine(struct intel_gt *gt, err = -EBUSY; goto out_heartbeat; } - tasklet_disable(&engine->execlists.tasklet); + tasklet_disable(&se->tasklet); - engine->execlists.tasklet.callback(&engine->execlists.tasklet); + se->tasklet.callback(&se->tasklet); GEM_BUG_ON(execlists_active(&engine->execlists) != rq); /* Fake a preemption event; failed of course */ @@ -4615,7 +4616,7 @@ static int reset_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(rq->fence.error != -EIO); /* Release our grasp on the engine, letting CS flow again */ - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&se->tasklet); clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, 
&gt->reset.flags); local_bh_enable(); diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 8cad102922e7..cdb0ceff3be1 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1701,7 +1701,8 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine, const struct igt_atomic_section *p, const char *mode) { - struct tasklet_struct * const t = &engine->execlists.tasklet; + struct tasklet_struct * const t = + &intel_engine_get_scheduler(engine)->tasklet; int err; GEM_TRACE("i915_reset_engine(%s:%s) under %s\n", diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index e97adf1b7729..279091e41b41 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -49,7 +49,7 @@ static int wait_for_submit(struct intel_engine_cs *engine, unsigned long timeout) { /* Ignore our own attempts to suppress excess tasklets */ - tasklet_hi_schedule(&engine->execlists.tasklet); + intel_engine_kick_scheduler(engine); timeout += jiffies; do { @@ -59,7 +59,7 @@ static int wait_for_submit(struct intel_engine_cs *engine, return 0; /* Wait until the HW has acknowleged the submission (or err) */ - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); if (!READ_ONCE(engine->execlists.pending[0]) && is_active(rq)) return 0; @@ -417,7 +417,7 @@ static int __live_lrc_state(struct intel_engine_cs *engine, if (err) goto err_rq; - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); expected[RING_TAIL_IDX] = ce->ring->tail; if (i915_request_wait(rq, 0, HZ / 5) < 0) { @@ -1852,17 +1852,18 @@ static int live_lrc_indirect_ctx_bb(void *arg) static void garbage_reset(struct intel_engine_cs *engine, struct i915_request *rq) { + struct i915_sched *se = intel_engine_get_scheduler(engine); const unsigned int bit = I915_RESET_ENGINE + engine->id; unsigned long 
*lock = &engine->gt->reset.flags; local_bh_disable(); if (!test_and_set_bit(bit, lock)) { - tasklet_disable(&engine->execlists.tasklet); + tasklet_disable(&se->tasklet); if (!rq->fence.error) __intel_engine_reset_bh(engine, NULL); - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&se->tasklet); clear_and_wake_up_bit(bit, lock); } local_bh_enable(); @@ -1923,7 +1924,7 @@ static int __lrc_garbage(struct intel_engine_cs *engine, struct rnd_state *prng) intel_context_set_banned(ce); garbage_reset(engine, hang); - intel_engine_flush_submission(engine); + intel_engine_flush_scheduler(engine); if (!hang->fence.error) { i915_request_put(hang); pr_err("%s: corrupted context was not reset\n", diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index 8784257ec808..08594309a96d 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -321,7 +321,8 @@ static int igt_atomic_engine_reset(void *arg) goto out_unlock; for_each_engine(engine, gt, id) { - struct tasklet_struct *t = &engine->execlists.tasklet; + struct tasklet_struct *t = + &intel_engine_get_scheduler(engine)->tasklet; if (t->func) tasklet_disable(t); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 6f07f1124a13..15e4ec5ae73a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -240,9 +240,9 @@ static void __guc_dequeue(struct intel_engine_cs *engine) static void guc_submission_tasklet(struct tasklet_struct *t) { + struct i915_sched *se = from_tasklet(se, t, tasklet); struct intel_engine_cs * const engine = - from_tasklet(engine, t, execlists.tasklet); - struct i915_sched *se = intel_engine_get_scheduler(engine); + container_of(se, typeof(*engine), sched); struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_request **port, *rq; unsigned long flags; 
@@ -268,10 +268,6 @@ static void guc_submission_tasklet(struct tasklet_struct *t) static void guc_reset_prepare(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - - ENGINE_TRACE(engine, "\n"); - /* * Prevent request submission to the hardware until we have * completed the reset in i915_gem_reset_finish(). If a request @@ -281,7 +277,7 @@ static void guc_reset_prepare(struct intel_engine_cs *engine) * Turning off the execlists->tasklet until the reset is over * prevents the race. */ - __tasklet_disable_sync_once(&execlists->tasklet); + i915_sched_disable(intel_engine_get_scheduler(engine)); } static void guc_reset_state(struct intel_context *ce, @@ -387,14 +383,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) static void guc_reset_finish(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - - if (__tasklet_enable(&execlists->tasklet)) - /* And kick in case we missed a new request submission. 
*/ - tasklet_hi_schedule(&execlists->tasklet); - - ENGINE_TRACE(engine, "depth->%d\n", - atomic_read(&execlists->tasklet.count)); + i915_sched_enable(intel_engine_get_scheduler(engine)); } /* @@ -590,8 +579,6 @@ static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ - tasklet_kill(&engine->execlists.tasklet); - intel_engine_cleanup_common(engine); lrc_fini_wa_ctx(engine); } @@ -668,7 +655,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) */ GEM_BUG_ON(INTEL_GEN(i915) < 11); - tasklet_setup(&engine->execlists.tasklet, guc_submission_tasklet); + tasklet_setup(&engine->sched.tasklet, guc_submission_tasklet); guc_default_vfuncs(engine); guc_default_irqs(engine); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d736c1aae6e5..1b52dcaa023d 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1847,7 +1847,7 @@ long i915_request_wait(struct i915_request *rq, * for unhappy HW. 
*/ if (i915_request_is_ready(rq)) - __intel_engine_flush_submission(rq->engine, false); + __i915_sched_flush(i915_request_get_scheduler(rq), false); for (;;) { set_current_state(state); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index aef14e4463c3..697127981249 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -134,6 +134,7 @@ void i915_sched_fini(struct i915_sched *se) { GEM_BUG_ON(!list_empty(&se->requests)); + tasklet_kill(&se->tasklet); /* flush the callback */ i915_sched_park(se); } @@ -355,7 +356,7 @@ static void kick_submission(struct intel_engine_cs *engine, engine->execlists.queue_priority_hint = prio; if (need_preempt(prio, rq_prio(inflight))) - tasklet_hi_schedule(&engine->execlists.tasklet); + intel_engine_kick_scheduler(engine); } static void ipi_priority(struct i915_request *rq, int prio) @@ -671,7 +672,7 @@ void i915_request_enqueue(struct i915_request *rq) GEM_BUG_ON(list_empty(&rq->sched.link)); spin_unlock_irqrestore(&se->lock, flags); if (kick) - tasklet_hi_schedule(&engine->execlists.tasklet); + i915_sched_kick(se); } struct i915_request * @@ -809,7 +810,7 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, if (rq_prio(rq) > engine->execlists.queue_priority_hint) { engine->execlists.queue_priority_hint = rq_prio(rq); - tasklet_hi_schedule(&engine->execlists.tasklet); + i915_sched_kick(se); } if (!i915_request_on_hold(rq)) @@ -1005,6 +1006,44 @@ void i915_sched_node_retire(struct i915_sched_node *node) } } +void i915_sched_disable(struct i915_sched *se) +{ + __tasklet_disable_sync_once(&se->tasklet); + GEM_BUG_ON(!i915_sched_is_disabled(se)); + SCHED_TRACE(se, "disable:%d\n", atomic_read(&se->tasklet.count)); +} + +void i915_sched_enable(struct i915_sched *se) +{ + SCHED_TRACE(se, "enable:%d\n", atomic_read(&se->tasklet.count)); + GEM_BUG_ON(!i915_sched_is_disabled(se)); + + /* And kick in case we missed a new request submission. 
*/ + if (__tasklet_enable(&se->tasklet)) + i915_sched_kick(se); +} + +void __i915_sched_flush(struct i915_sched *se, bool sync) +{ + struct tasklet_struct *t = &se->tasklet; + + if (!t->callback) + return; + + local_bh_disable(); + if (tasklet_trylock(t)) { + /* Must wait for any GPU reset in progress. */ + if (__tasklet_is_enabled(t)) + t->callback(t); + tasklet_unlock(t); + } + local_bh_enable(); + + /* Synchronise and wait for the tasklet on another CPU */ + if (sync) + tasklet_unlock_wait(t); +} + void i915_request_show_with_schedule(struct drm_printer *m, const struct i915_request *rq, const char *prefix, diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 71bef75859b4..e2e8b90adb66 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -90,6 +90,40 @@ i915_sched_is_last_request(const struct i915_sched *se, return list_is_last_rcu(&rq->sched.link, &se->requests); } +static inline void +i915_sched_lock_bh(struct i915_sched *se) +{ + local_bh_disable(); /* prevent local softirq and lock recursion */ + tasklet_lock(&se->tasklet); +} + +static inline void +i915_sched_unlock_bh(struct i915_sched *se) +{ + tasklet_unlock(&se->tasklet); + local_bh_enable(); /* restore softirq, and kick ksoftirqd! 
*/ +} + +static inline void i915_sched_kick(struct i915_sched *se) +{ + /* Kick the tasklet for some interrupt coalescing and reset handling */ + tasklet_hi_schedule(&se->tasklet); +} + +static inline bool i915_sched_is_disabled(const struct i915_sched *se) +{ + return unlikely(!__tasklet_is_enabled(&se->tasklet)); +} + +void i915_sched_disable(struct i915_sched *se); +void i915_sched_enable(struct i915_sched *se); + +void __i915_sched_flush(struct i915_sched *se, bool sync); +static inline void i915_sched_flush(struct i915_sched *se) +{ + __i915_sched_flush(se, true); +} + void i915_request_show_with_schedule(struct drm_printer *m, const struct i915_request *rq, const char *prefix, diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 44dae932e5af..9b09749358ad 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -7,6 +7,7 @@ #ifndef _I915_SCHEDULER_TYPES_H_ #define _I915_SCHEDULER_TYPES_H_ +#include #include #include @@ -34,6 +35,14 @@ struct i915_sched { */ struct rb_root_cached queue; + /** + * @tasklet: softirq tasklet for bottom half + * + * The tasklet is responsible for transferring the priority queue + * to HW, and for handling responses from HW. 
+ */ + struct tasklet_struct tasklet; + /* Inter-engine scheduling delegate */ struct i915_sched_ipi { struct i915_request *list; diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index d2a678a2497e..39c619bccb74 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -1517,7 +1517,7 @@ static int switch_to_kernel_sync(struct intel_context *ce, int err) i915_request_put(rq); while (!err && !intel_engine_is_idle(ce->engine)) - intel_engine_flush_submission(ce->engine); + intel_engine_flush_scheduler(ce->engine); return err; } @@ -1902,7 +1902,7 @@ static int measure_inter_request(struct intel_context *ce) return -ENOMEM; } - intel_engine_flush_submission(ce->engine); + intel_engine_flush_scheduler(ce->engine); for (i = 1; i <= ARRAY_SIZE(elapsed); i++) { struct i915_request *rq; u32 *cs; @@ -1934,7 +1934,7 @@ static int measure_inter_request(struct intel_context *ce) i915_request_add(rq); } i915_sw_fence_commit(submit); - intel_engine_flush_submission(ce->engine); + intel_engine_flush_scheduler(ce->engine); heap_fence_put(submit); semaphore_set(sema, 1); @@ -2030,7 +2030,7 @@ static int measure_context_switch(struct intel_context *ce) } } i915_request_put(fence); - intel_engine_flush_submission(ce->engine); + intel_engine_flush_scheduler(ce->engine); semaphore_set(sema, 1); err = intel_gt_wait_for_idle(ce->engine->gt, HZ / 2); @@ -2221,7 +2221,7 @@ static int measure_completion(struct intel_context *ce) dma_fence_add_callback(&rq->fence, &cb.base, signal_cb); i915_request_add(rq); - intel_engine_flush_submission(ce->engine); + intel_engine_flush_scheduler(ce->engine); if (wait_for(READ_ONCE(sema[i]) == -1, 50)) { err = -EIO; goto err; diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index 56d785581535..dbbefd0da2f2 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ 
b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -171,12 +171,12 @@ static int __single_chain(struct intel_engine_cs *engine, unsigned long length, i915_request_add(rq); count++; } - intel_engine_flush_submission(engine); + i915_sched_flush(se); - execlists_active_lock_bh(&engine->execlists); + i915_sched_lock_bh(se); if (fn(rq, count, count - 1) && !check_context_order(se)) err = -EINVAL; - execlists_active_unlock_bh(&engine->execlists); + i915_sched_unlock_bh(se); igt_spinner_end(&spin); err_context: @@ -256,12 +256,12 @@ static int __wide_chain(struct intel_engine_cs *engine, unsigned long width, } i915_request_add(rq[i]); } - intel_engine_flush_submission(engine); + i915_sched_flush(se); - execlists_active_lock_bh(&engine->execlists); + i915_sched_lock_bh(se); if (fn(rq[i - 1], i, count) && !check_context_order(se)) err = -EINVAL; - execlists_active_unlock_bh(&engine->execlists); + i915_sched_unlock_bh(se); igt_spinner_end(&spin); err_free: @@ -345,12 +345,12 @@ static int __inv_chain(struct intel_engine_cs *engine, unsigned long width, } i915_request_add(rq[i]); } - intel_engine_flush_submission(engine); + i915_sched_flush(se); - execlists_active_lock_bh(&engine->execlists); + i915_sched_lock_bh(se); if (fn(rq[i - 1], i, count) && !check_context_order(se)) err = -EINVAL; - execlists_active_unlock_bh(&engine->execlists); + i915_sched_unlock_bh(se); igt_spinner_end(&spin); err_free: @@ -451,12 +451,12 @@ static int __sparse_chain(struct intel_engine_cs *engine, unsigned long width, if (err) break; } - intel_engine_flush_submission(engine); + i915_sched_flush(se); - execlists_active_lock_bh(&engine->execlists); + i915_sched_lock_bh(se); if (fn(rq[i - 1], i, count) && !check_context_order(se)) err = -EINVAL; - execlists_active_unlock_bh(&engine->execlists); + i915_sched_unlock_bh(se); igt_spinner_end(&spin); err_free: diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index 83f6e5f31fb3..0e6c1ea0082a 100644 
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -221,7 +221,7 @@ void igt_spinner_fini(struct igt_spinner *spin) bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq) { if (i915_request_is_ready(rq)) - intel_engine_flush_submission(rq->engine); + __i915_sched_flush(i915_request_get_scheduler(rq), false); return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq), rq->fence.seqno),
From patchwork Mon Feb 1 08:56:42 2021
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058471
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:42 +0000
Message-Id: <20210201085715.27435-24-chris@chris-wilson.co.uk>
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 24/57] drm/i915/gt: Only kick the scheduler on timeslice/preemption change
Cc: Chris Wilson

Kick the scheduler to allow it to see the timeslice duration change, don't peek into execlists.
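
For readers following the series, the intent can be sketched outside the driver as follows. This is a minimal standalone illustration, not i915 code: `toy_engine`, `toy_kick_scheduler` and `toy_timeslice_store` are hypothetical stand-ins for the engine, `intel_engine_kick_scheduler()` and `timeslice_store()`. The store path only records the new value and, if the engine is awake, asks the scheduler to re-evaluate; it does not touch any backend timer itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the engine; not the i915 structure. */
struct toy_engine {
	unsigned long timeslice_duration_ms; /* tunable written from sysfs */
	bool awake;                          /* engine currently holds a wakeref */
	unsigned int kicks;                  /* times the scheduler was asked to re-evaluate */
};

/* Stand-in for kicking the scheduler: it rereads the tunables itself. */
void toy_kick_scheduler(struct toy_engine *engine)
{
	engine->kicks++;
}

/* Store the new duration; only poke the scheduler if the engine is awake. */
void toy_timeslice_store(struct toy_engine *engine, unsigned long duration)
{
	assert(engine);

	engine->timeslice_duration_ms = duration;
	if (engine->awake)
		toy_kick_scheduler(engine);
}
```

An idle engine needs no kick in this sketch because the scheduler would read the stored value the next time it runs.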
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/sysfs_engines.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c index 57ef5383dd4e..526f8402cfb7 100644 --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c @@ -9,6 +9,7 @@ #include "i915_drv.h" #include "intel_engine.h" #include "intel_engine_heartbeat.h" +#include "intel_engine_pm.h" #include "sysfs_engines.h" struct kobj_engine { @@ -222,9 +223,8 @@ timeslice_store(struct kobject *kobj, struct kobj_attribute *attr, return -EINVAL; WRITE_ONCE(engine->props.timeslice_duration_ms, duration); - - if (execlists_active(&engine->execlists)) - set_timer_ms(&engine->execlists.timer, duration); + if (intel_engine_pm_is_awake(engine)) + intel_engine_kick_scheduler(engine); return count; } @@ -326,9 +326,8 @@ preempt_timeout_store(struct kobject *kobj, struct kobj_attribute *attr, return -EINVAL; WRITE_ONCE(engine->props.preempt_timeout_ms, timeout); - - if (READ_ONCE(engine->execlists.pending[0])) - set_timer_ms(&engine->execlists.preempt, timeout); + if (intel_engine_pm_is_awake(engine)) + intel_engine_kick_scheduler(engine); return count; }
From patchwork Mon Feb 1 08:56:43 2021
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058479
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:43 +0000
Message-Id: <20210201085715.27435-25-chris@chris-wilson.co.uk>
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 25/57] drm/i915: Move submit_request to i915_sched_engine
Cc: Chris Wilson

Claim the submit_request vfunc as the entry point into the scheduler backend for ready requests.
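
As a rough standalone sketch of the indirection described here (not i915 code; every `toy_*` name is hypothetical), the submission hook lives on the scheduler object, the scheduler's init installs a default backend, and callers dispatch a ready request through the scheduler rather than through the engine. Swapping the hook, as the wedging path does with a nop handler, then redirects all later submissions.

```c
#include <assert.h>

struct toy_request;

/* Hypothetical scheduler object that owns the submission entry point. */
struct toy_sched {
	void (*submit_request)(struct toy_request *rq);
	unsigned int queued;  /* requests accepted by the default backend */
	unsigned int dropped; /* requests swallowed by the nop backend */
};

struct toy_request {
	struct toy_sched *sched; /* scheduler this request is bound to */
};

/* Default backend: queue the request for later execution. */
void toy_enqueue(struct toy_request *rq)
{
	rq->sched->queued++;
}

/* Replacement backend, e.g. once the device is declared wedged. */
void toy_nop_submit(struct toy_request *rq)
{
	rq->sched->dropped++;
}

/* Scheduler init installs the default entry point. */
void toy_sched_init(struct toy_sched *se)
{
	se->queued = 0;
	se->dropped = 0;
	se->submit_request = toy_enqueue;
}

/* Caller side: a ready request is handed to its scheduler's hook. */
void toy_submit_notify(struct toy_request *rq)
{
	assert(rq && rq->sched && rq->sched->submit_request);
	rq->sched->submit_request(rq);
}
```

Keeping the hook on the scheduler means the caller needs no knowledge of which backend (execlists, ring, GuC, mock) sits behind it.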
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_types.h | 8 -------- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 11 ++++++----- drivers/gpu/drm/i915/gt/intel_reset.c | 2 +- drivers/gpu/drm/i915/gt/intel_ring_submission.c | 4 ++-- drivers/gpu/drm/i915/gt/mock_engine.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_request.c | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 2 ++ drivers/gpu/drm/i915/i915_scheduler_types.h | 9 +++++++++ drivers/gpu/drm/i915/selftests/i915_request.c | 3 +-- 10 files changed, 26 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 97fe5e395a85..6b0bde292916 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -416,14 +416,6 @@ struct intel_engine_cs { u32 *cs); unsigned int emit_fini_breadcrumb_dw; - /* Pass the request to the hardware queue (e.g. directly into - * the legacy ringbuffer or to the end of an execlist). - * - * This is called from an atomic context with irqs disabled; must - * be irq safe. - */ - void (*submit_request)(struct i915_request *rq); - /* * Called on signaling of a SUBMIT_FENCE, passing along the signaling * request down to the bonded pairs. 
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index be79a352e512..33c1a833df20 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -483,7 +483,7 @@ resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); WRITE_ONCE(rq->engine, &ve->base); - ve->base.submit_request(rq); + ve->base.sched.submit_request(rq); spin_unlock_irq(&se->lock); } @@ -2766,7 +2766,7 @@ static bool can_preempt(struct intel_engine_cs *engine) static void execlists_set_default_submission(struct intel_engine_cs *engine) { - engine->submit_request = i915_request_enqueue; + engine->sched.submit_request = i915_request_enqueue; engine->sched.tasklet.callback = execlists_submission_tasklet; } @@ -3227,7 +3227,7 @@ static void virtual_submit_request(struct i915_request *rq) rq->fence.context, rq->fence.seqno); - GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); + GEM_BUG_ON(ve->base.sched.submit_request != virtual_submit_request); spin_lock_irqsave(&se->lock, flags); @@ -3341,12 +3341,10 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.cops = &virtual_context_ops; ve->base.request_alloc = execlists_request_alloc; - ve->base.submit_request = virtual_submit_request; ve->base.bond_execute = virtual_bond_execute; INIT_LIST_HEAD(virtual_queue(ve)); ve->base.execlists.queue_priority_hint = INT_MIN; - tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet); intel_context_init(&ve->context, &ve->base); @@ -3427,6 +3425,9 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.mask, ENGINE_VIRTUAL); + ve->base.sched.submit_request = virtual_submit_request; + tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet); + virtual_engine_initial_hint(ve); return &ve->context; diff --git 
a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 4a8f982a1a4f..e5cb92c7d0f8 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -820,7 +820,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt) __intel_gt_reset(gt, ALL_ENGINES); for_each_engine(engine, gt, id) - engine->submit_request = nop_submit_request; + engine->sched.submit_request = nop_submit_request; /* * Make sure no request can slip through without getting completed by diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 8af1bc77e15e..a7d49ea71900 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -970,12 +970,12 @@ static void gen6_bsd_submit_request(struct i915_request *request) static void i9xx_set_default_submission(struct intel_engine_cs *engine) { - engine->submit_request = i9xx_submit_request; + engine->sched.submit_request = i9xx_submit_request; } static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine) { - engine->submit_request = gen6_bsd_submit_request; + engine->sched.submit_request = gen6_bsd_submit_request; } static void ring_release(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 8b1c2727d25c..cae736e34bda 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -302,7 +302,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, engine->base.request_alloc = mock_request_alloc; engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; - engine->base.submit_request = mock_submit_request; + + engine->base.sched.submit_request = mock_submit_request; engine->base.reset.prepare = mock_reset_prepare; engine->base.reset.rewind = mock_reset_rewind; @@ -333,6 +334,7 @@ int 
mock_engine_init(struct intel_engine_cs *engine) engine->name, engine->mask, ENGINE_MOCK); + engine->sched.submit_request = mock_submit_request; intel_engine_init_execlists(engine); intel_engine_init__pm(engine); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 15e4ec5ae73a..db6ac5a12834 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -572,7 +572,7 @@ static int guc_resume(struct intel_engine_cs *engine) static void guc_set_default_submission(struct intel_engine_cs *engine) { - engine->submit_request = i915_request_enqueue; + engine->sched.submit_request = i915_request_enqueue; } static void guc_release(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 1b52dcaa023d..c03d3cedf497 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -700,7 +700,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) * proceeding. */ rcu_read_lock(); - request->engine->submit_request(request); + i915_request_get_scheduler(request)->submit_request(request); rcu_read_unlock(); break; diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 697127981249..620db6430a10 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -111,6 +111,8 @@ void i915_sched_init(struct i915_sched *se, i915_sched_init_ipi(&se->ipi); + se->submit_request = i915_request_enqueue; + /* * Due to an interesting quirk in lockdep's internal debug tracking, * after setting a subclass we must ensure the lock is used. 
Otherwise, diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 9b09749358ad..effd035dcb78 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -28,6 +28,15 @@ struct i915_sched { unsigned long mask; /* available scheduling channels */ + /* + * Pass the request to the hardware queue (e.g. directly into + * the legacy ringbuffer or to the end of an execlist). + * + * This is called from an atomic context with irqs disabled; must + * be irq safe. + */ + void (*submit_request)(struct i915_request *rq); + struct list_head requests; /* active request, on HW */ struct list_head hold; /* ready requests, but on hold */ /** diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index 39c619bccb74..8035ea7565ed 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -242,10 +242,9 @@ static int igt_request_rewind(void *arg) i915_request_get(vip); i915_request_add(vip); rcu_read_lock(); - request->engine->submit_request(request); + i915_request_get_scheduler(request)->submit_request(request); rcu_read_unlock(); - if (i915_request_wait(vip, 0, HZ) == -ETIME) { pr_err("timed out waiting for high priority request\n"); goto err;
From patchwork Mon Feb 1 08:56:44 2021
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058487
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:56:44 +0000
Message-Id: <20210201085715.27435-26-chris@chris-wilson.co.uk>
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 26/57] drm/i915: Move finding the current active request to the scheduler
Cc: Chris Wilson

Since
finding the currently active request starts by walking the scheduler lists under the scheduler lock, move the routine to the scheduler. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine.h | 3 - drivers/gpu/drm/i915/gt/intel_engine_cs.c | 71 ++-------------- .../drm/i915/gt/intel_execlists_submission.c | 83 ++++++++++++++++++- drivers/gpu/drm/i915/i915_gpu_error.c | 18 ++-- drivers/gpu/drm/i915/i915_gpu_error.h | 4 +- drivers/gpu/drm/i915/i915_request.c | 71 +--------------- drivers/gpu/drm/i915/i915_request.h | 8 ++ drivers/gpu/drm/i915/i915_scheduler.c | 50 +++++++++++ drivers/gpu/drm/i915/i915_scheduler_types.h | 4 + 9 files changed, 162 insertions(+), 150 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 52bba16c62e8..c530839627bb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -230,9 +230,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now); -struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine); - u32 intel_engine_context_size(struct intel_gt *gt, u8 class); void intel_engine_init_active(struct intel_engine_cs *engine, diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index b5b957283f2c..5751a529b2df 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1277,7 +1277,7 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine) } } -static struct intel_timeline *get_timeline(struct i915_request *rq) +static struct intel_timeline *get_timeline(const struct i915_request *rq) { struct intel_timeline *tl; @@ -1505,7 +1505,8 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, } } -static void print_request_ring(struct drm_printer *m, struct i915_request *rq) +static void 
+print_request_ring(struct drm_printer *m, const struct i915_request *rq) { void *ring; int size; @@ -1590,7 +1591,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, { struct i915_gpu_error * const error = &engine->i915->gpu_error; struct i915_sched *se = intel_engine_get_scheduler(engine); - struct i915_request *rq; + const struct i915_request *rq; intel_wakeref_t wakeref; unsigned long flags; ktime_t dummy; @@ -1631,8 +1632,9 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n"); + rcu_read_lock(); spin_lock_irqsave(&se->lock, flags); - rq = intel_engine_find_active_request(engine); + rq = se->active_request(se); if (rq) { struct intel_timeline *tl = get_timeline(rq); @@ -1664,6 +1666,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, } drm_printf(m, "\tOn hold?: %lu\n", list_count(&se->hold)); spin_unlock_irqrestore(&se->lock, flags); + rcu_read_unlock(); drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base); wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm); @@ -1712,66 +1715,6 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now) return ktime_add(total, start); } -static bool match_ring(struct i915_request *rq) -{ - u32 ring = ENGINE_READ(rq->engine, RING_START); - - return ring == i915_ggtt_offset(rq->ring->vma); -} - -struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine) -{ - struct i915_sched *se = intel_engine_get_scheduler(engine); - struct i915_request *request, *active = NULL; - - /* - * We are called by the error capture, reset and to dump engine - * state at random points in time. In particular, note that neither is - * crucially ordered with an interrupt. After a hang, the GPU is dead - * and we assume that no more writes can happen (we waited long enough - * for all writes that were in transaction to be flushed) - adding an - * extra delay for a recent interrupt is pointless. 
Hence, we do - * not need an engine->irq_seqno_barrier() before the seqno reads. - * At all other times, we must assume the GPU is still running, but - * we only care about the snapshot of this moment. - */ - lockdep_assert_held(&se->lock); - - rcu_read_lock(); - request = execlists_active(&engine->execlists); - if (request) { - struct intel_timeline *tl = request->context->timeline; - - list_for_each_entry_from_reverse(request, &tl->requests, link) { - if (__i915_request_is_complete(request)) - break; - - active = request; - } - } - rcu_read_unlock(); - if (active) - return active; - - list_for_each_entry(request, &se->requests, sched.link) { - if (__i915_request_is_complete(request)) - continue; - - if (!__i915_request_has_started(request)) - continue; - - /* More than one preemptible request may match! */ - if (!match_ring(request)) - continue; - - active = request; - break; - } - - return active; -} - #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "mock_engine.c" #include "selftest_engine.c" diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 33c1a833df20..8b848adb65b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2336,7 +2336,7 @@ static void sanitize_hwsp(struct intel_engine_cs *engine) static void execlists_sanitize(struct intel_engine_cs *engine) { - GEM_BUG_ON(execlists_active(&engine->execlists)); + GEM_BUG_ON(*engine->execlists.active); /* * Poison residual state on resume, in case the suspend didn't! 
@@ -2755,6 +2755,85 @@ static void execlists_park(struct intel_engine_cs *engine) cancel_timer(&engine->execlists.preempt); } +static const struct i915_request * +execlists_active_request(struct i915_sched *se) +{ + struct intel_engine_cs *engine = + container_of(se, typeof(*engine), sched); + struct i915_request *rq; + + rq = execlists_active(&engine->execlists); + if (rq) + rq = active_request(rq->context->timeline, rq); + + return rq; +} + +static bool execlists_is_executing(const struct i915_request *rq) +{ + struct i915_sched *se = i915_request_get_scheduler(rq); + struct intel_engine_execlists *el = + &container_of(se, struct intel_engine_cs, sched)->execlists; + struct i915_request * const *port, *p; + bool inflight = false; + + if (!i915_request_is_ready(rq)) + return false; + + /* + * Even if we have unwound the request, it may still be on + * the GPU (preempt-to-busy). If that request is inside an + * unpreemptible critical section, it will not be removed. Some + * GPU functions may even be stuck waiting for the paired request + * (__await_execution) to be submitted and cannot be preempted + * until the bond is executing. + * + * As we know that there are always preemption points between + * requests, we know that only the currently executing request + * may be still active even though we have cleared the flag. + * However, we can't rely on our tracking of ELSP[0] to know + * which request is currently active and so maybe stuck, as + * the tracking maybe an event behind. Instead assume that + * if the context is still inflight, then it is still active + * even if the active flag has been cleared. + * + * To further complicate matters, if there a pending promotion, the HW + * may either perform a context switch to the second inflight execlists, + * or it may switch to the pending set of execlists. In the case of the + * latter, it may send the ACK and we process the event copying the + * pending[] over top of inflight[], _overwriting_ our *active. 
Since + * this implies the HW is arbitrating and not struck in *active, we do + * not worry about complete accuracy, but we do require no read/write + * tearing of the pointer [the read of the pointer must be valid, even + * as the array is being overwritten, for which we require the writes + * to avoid tearing.] + * + * Note that the read of *execlists->active may race with the promotion + * of execlists->pending[] to execlists->inflight[], overwriting + * the value at *execlists->active. This is fine. The promotion implies + * that we received an ACK from the HW, and so the context is not + * stuck -- if we do not see ourselves in *active, the inflight status + * is valid. If instead we see ourselves being copied into *active, + * we are inflight and may signal the callback. + */ + if (!intel_context_inflight(rq->context)) + return false; + + rcu_read_lock(); + for (port = READ_ONCE(el->active); + (p = READ_ONCE(*port)); /* may race with promotion of pending[] */ + port++) { + if (p->context == rq->context) { + inflight = i915_seqno_passed(p->fence.seqno, + rq->fence.seqno); + break; + } + } + rcu_read_unlock(); + + return inflight; +} + static bool can_preempt(struct intel_engine_cs *engine) { if (INTEL_GEN(engine->i915) > 8) @@ -2890,6 +2969,8 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) struct intel_uncore *uncore = engine->uncore; u32 base = engine->mmio_base; + engine->sched.active_request = execlists_active_request; + engine->sched.is_executing = execlists_is_executing; tasklet_setup(&engine->sched.tasklet, execlists_submission_tasklet); timer_setup(&engine->execlists.timer, execlists_timeslice, 0); timer_setup(&engine->execlists.preempt, execlists_preempt, 0); diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f2e4f0232b87..0c5adca7994f 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1262,15 +1262,11 @@ static bool 
record_context(struct i915_gem_context_coredump *e, struct i915_gem_context *ctx; bool simulated; - rcu_read_lock(); - ctx = rcu_dereference(rq->context->gem_context); if (ctx && !kref_get_unless_zero(&ctx->ref)) ctx = NULL; - if (!ctx) { - rcu_read_unlock(); + if (!ctx) return true; - } if (I915_SELFTEST_ONLY(!ctx->client)) { strcpy(e->comm, "[kernel]"); @@ -1279,8 +1275,6 @@ static bool record_context(struct i915_gem_context_coredump *e, e->pid = pid_nr(i915_drm_client_pid(ctx->client)); } - rcu_read_unlock(); - e->sched_attr = ctx->sched; e->guilty = atomic_read(&ctx->guilty_count); e->active = atomic_read(&ctx->active_count); @@ -1368,12 +1362,14 @@ intel_engine_coredump_alloc(struct intel_engine_cs *engine, gfp_t gfp) struct intel_engine_capture_vma * intel_engine_coredump_add_request(struct intel_engine_coredump *ee, - struct i915_request *rq, + const struct i915_request *rq, gfp_t gfp) { struct intel_engine_capture_vma *vma = NULL; + rcu_read_lock(); ee->simulated |= record_context(&ee->context, rq); + rcu_read_unlock(); if (ee->simulated) return NULL; @@ -1436,19 +1432,21 @@ capture_engine(struct intel_engine_cs *engine, struct i915_sched *se = intel_engine_get_scheduler(engine); struct intel_engine_capture_vma *capture = NULL; struct intel_engine_coredump *ee; - struct i915_request *rq; + const struct i915_request *rq; unsigned long flags; ee = intel_engine_coredump_alloc(engine, GFP_KERNEL); if (!ee) return NULL; + rcu_read_lock(); spin_lock_irqsave(&se->lock, flags); - rq = intel_engine_find_active_request(engine); + rq = se->active_request(se); if (rq) capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL); spin_unlock_irqrestore(&se->lock, flags); + rcu_read_unlock(); if (!capture) { kfree(ee); return NULL; diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h index 1764fd254df3..2d8debabfe28 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.h +++ b/drivers/gpu/drm/i915/i915_gpu_error.h @@ -235,7 +235,7 
@@ intel_engine_coredump_alloc(struct intel_engine_cs *engine, gfp_t gfp); struct intel_engine_capture_vma * intel_engine_coredump_add_request(struct intel_engine_coredump *ee, - struct i915_request *rq, + const struct i915_request *rq, gfp_t gfp); void intel_engine_coredump_add_vma(struct intel_engine_coredump *ee, @@ -299,7 +299,7 @@ intel_engine_coredump_alloc(struct intel_engine_cs *engine, gfp_t gfp) static inline struct intel_engine_capture_vma * intel_engine_coredump_add_request(struct intel_engine_coredump *ee, - struct i915_request *rq, + const struct i915_request *rq, gfp_t gfp) { return NULL; diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c03d3cedf497..792dd0bbea3b 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -349,74 +349,6 @@ void i915_request_retire_upto(struct i915_request *rq) } while (i915_request_retire(tmp) && tmp != rq); } -static struct i915_request * const * -__engine_active(struct intel_engine_cs *engine) -{ - return READ_ONCE(engine->execlists.active); -} - -static bool __request_in_flight(const struct i915_request *signal) -{ - struct i915_request * const *port, *rq; - bool inflight = false; - - if (!i915_request_is_ready(signal)) - return false; - - /* - * Even if we have unwound the request, it may still be on - * the GPU (preempt-to-busy). If that request is inside an - * unpreemptible critical section, it will not be removed. Some - * GPU functions may even be stuck waiting for the paired request - * (__await_execution) to be submitted and cannot be preempted - * until the bond is executing. - * - * As we know that there are always preemption points between - * requests, we know that only the currently executing request - * may be still active even though we have cleared the flag. - * However, we can't rely on our tracking of ELSP[0] to know - * which request is currently active and so maybe stuck, as - * the tracking maybe an event behind. 
Instead assume that - * if the context is still inflight, then it is still active - * even if the active flag has been cleared. - * - * To further complicate matters, if there a pending promotion, the HW - * may either perform a context switch to the second inflight execlists, - * or it may switch to the pending set of execlists. In the case of the - * latter, it may send the ACK and we process the event copying the - * pending[] over top of inflight[], _overwriting_ our *active. Since - * this implies the HW is arbitrating and not struck in *active, we do - * not worry about complete accuracy, but we do require no read/write - * tearing of the pointer [the read of the pointer must be valid, even - * as the array is being overwritten, for which we require the writes - * to avoid tearing.] - * - * Note that the read of *execlists->active may race with the promotion - * of execlists->pending[] to execlists->inflight[], overwritting - * the value at *execlists->active. This is fine. The promotion implies - * that we received an ACK from the HW, and so the context is not - * stuck -- if we do not see ourselves in *active, the inflight status - * is valid. If instead we see ourselves being copied into *active, - * we are inflight and may signal the callback. - */ - if (!intel_context_inflight(signal->context)) - return false; - - rcu_read_lock(); - for (port = __engine_active(signal->engine); - (rq = READ_ONCE(*port)); /* may race with promotion of pending[] */ - port++) { - if (rq->context == signal->context) { - inflight = i915_seqno_passed(rq->fence.seqno, - signal->fence.seqno); - break; - } - } - rcu_read_unlock(); - - return inflight; -} - static int __await_execution(struct i915_request *rq, struct i915_request *signal, @@ -460,8 +392,7 @@ __await_execution(struct i915_request *rq, * the completed/retired request. 
*/ if (llist_add(&cb->work.node.llist, &signal->execute_cb)) { - if (i915_request_is_active(signal) || - __request_in_flight(signal)) + if (i915_request_is_executing(signal)) __notify_execute_cb_imm(signal); } diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index c41582b96b46..8322f308b906 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -629,4 +629,12 @@ static inline bool i915_request_use_scheduler(const struct i915_request *rq) return intel_engine_has_scheduler(rq->engine); } +static inline bool i915_request_is_executing(const struct i915_request *rq) +{ + if (i915_request_is_active(rq)) + return true; + + return i915_request_get_scheduler(rq)->is_executing(rq); +} + #endif /* I915_REQUEST_H */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 620db6430a10..cb27bcb7a1f6 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -91,6 +91,54 @@ static void i915_sched_init_ipi(struct i915_sched_ipi *ipi) ipi->list = NULL; } +static bool match_ring(struct i915_request *rq) +{ + const struct intel_engine_cs *engine = rq->engine; + const struct intel_ring *ring = rq->ring; + + return ENGINE_READ(engine, RING_START) == i915_ggtt_offset(ring->vma); +} + +static const struct i915_request *active_request(struct i915_sched *se) +{ + struct i915_request *request, *active = NULL; + + /* + * We are called by the error capture, reset and to dump engine + * state at random points in time. In particular, note that neither is + * crucially ordered with an interrupt. After a hang, the GPU is dead + * and we assume that no more writes can happen (we waited long enough + * for all writes that were in transaction to be flushed) - adding an + * extra delay for a recent interrupt is pointless. Hence, we do + * not need an engine->irq_seqno_barrier() before the seqno reads. 
+ * At all other times, we must assume the GPU is still running, but + * we only care about the snapshot of this moment. + */ + lockdep_assert_held(&se->lock); + + list_for_each_entry(request, &se->requests, sched.link) { + if (__i915_request_is_complete(request)) + continue; + + if (!__i915_request_has_started(request)) + continue; + + /* More than one preemptible request may match! */ + if (!match_ring(request)) + continue; + + active = request; + break; + } + + return active; +} + +static bool not_executing(const struct i915_request *rq) +{ + return false; +} + void i915_sched_init(struct i915_sched *se, struct device *dev, const char *name, @@ -112,6 +160,8 @@ void i915_sched_init(struct i915_sched *se, i915_sched_init_ipi(&se->ipi); se->submit_request = i915_request_enqueue; + se->active_request = active_request; + se->is_executing = not_executing; /* * Due to an interesting quirk in lockdep's internal debug tracking, diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index effd035dcb78..9a9b8e0d78ae 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -37,6 +37,10 @@ struct i915_sched { */ void (*submit_request)(struct i915_request *rq); + const struct i915_request *(*active_request)(struct i915_sched *se); + + bool (*is_executing)(const struct i915_request *rq); + struct list_head requests; /* active request, on HW */ struct list_head hold; /* ready requests, but on hold */ /** From patchwork Mon Feb 1 08:56:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058473 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, 
MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61B47C433DB for ; Mon, 1 Feb 2021 08:58:11 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 067B164E3F for ; Mon, 1 Feb 2021 08:58:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 067B164E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F179A6E4D2; Mon, 1 Feb 2021 08:57:42 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2E91B6E4B6 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757740-1500050 for multiple; Mon, 01 Feb 2021 08:57:20 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:45 +0000 Message-Id: <20210201085715.27435-27-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 27/57] drm/i915: Show execlists queues when dumping state X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Move the scheduler pretty printer out of the execlists register state and push it to the scheduler. v2: It's not common to all, so shove it out of intel_engine_cs and split it between scheduler front/back ends Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 233 +----------------- .../drm/i915/gt/intel_execlists_submission.c | 174 ++++++++----- drivers/gpu/drm/i915/i915_request.c | 6 + drivers/gpu/drm/i915/i915_scheduler.c | 180 ++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.h | 8 + drivers/gpu/drm/i915/i915_scheduler_types.h | 9 + 6 files changed, 331 insertions(+), 279 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 5751a529b2df..9ff597ef5aca 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1277,49 +1277,6 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine) } } -static struct intel_timeline *get_timeline(const struct i915_request *rq) -{ - struct intel_timeline *tl; - - /* - * Even though we are holding the engine->active.lock here, there - * is no control over the submission queue per-se and we are - * inspecting the active state at a random point in time, with an - * unknown queue. Play safe and make sure the timeline remains valid. - * (Only being used for pretty printing, one extra kref shouldn't - * cause a camel stampede!) 
- */ - rcu_read_lock(); - tl = rcu_dereference(rq->timeline); - if (!kref_get_unless_zero(&tl->kref)) - tl = NULL; - rcu_read_unlock(); - - return tl; -} - -static int print_ring(char *buf, int sz, struct i915_request *rq) -{ - int len = 0; - - if (!i915_request_signaled(rq)) { - struct intel_timeline *tl = get_timeline(rq); - - len = scnprintf(buf, sz, - "ring:{start:%08x, hwsp:%08x, seqno:%08x, runtime:%llums}, ", - i915_ggtt_offset(rq->ring->vma), - tl ? tl->hwsp_offset : 0, - hwsp_seqno(rq), - DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context), - 1000 * 1000)); - - if (tl) - intel_timeline_put(tl); - } - - return len; -} - static void hexdump(struct drm_printer *m, const void *buf, size_t len) { const size_t rowsize = 8 * sizeof(u32); @@ -1349,27 +1306,15 @@ static void hexdump(struct drm_printer *m, const void *buf, size_t len) } } -static const char *repr_timer(const struct timer_list *t) -{ - if (!READ_ONCE(t->expires)) - return "inactive"; - - if (timer_pending(t)) - return "active"; - - return "expired"; -} - static void intel_engine_print_registers(struct intel_engine_cs *engine, struct drm_printer *m) { - struct drm_i915_private *dev_priv = engine->i915; - struct intel_engine_execlists * const execlists = &engine->execlists; + struct drm_i915_private *i915 = engine->i915; u64 addr; - if (engine->id == RENDER_CLASS && IS_GEN_RANGE(dev_priv, 4, 7)) + if (engine->id == RENDER_CLASS && IS_GEN_RANGE(i915, 4, 7)) drm_printf(m, "\tCCID: 0x%08x\n", ENGINE_READ(engine, CCID)); - if (HAS_EXECLISTS(dev_priv)) { + if (HAS_EXECLISTS(i915)) { drm_printf(m, "\tEL_STAT_HI: 0x%08x\n", ENGINE_READ(engine, RING_EXECLIST_STATUS_HI)); drm_printf(m, "\tEL_STAT_LO: 0x%08x\n", @@ -1390,7 +1335,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, ENGINE_READ(engine, RING_MI_MODE) & (MODE_IDLE) ? 
" [idle]" : ""); } - if (INTEL_GEN(dev_priv) >= 6) { + if (INTEL_GEN(i915) >= 6) { drm_printf(m, "\tRING_IMR: 0x%08x\n", ENGINE_READ(engine, RING_IMR)); drm_printf(m, "\tRING_ESR: 0x%08x\n", @@ -1407,15 +1352,15 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, addr = intel_engine_get_last_batch_head(engine); drm_printf(m, "\tBBADDR: 0x%08x_%08x\n", upper_32_bits(addr), lower_32_bits(addr)); - if (INTEL_GEN(dev_priv) >= 8) + if (INTEL_GEN(i915) >= 8) addr = ENGINE_READ64(engine, RING_DMA_FADD, RING_DMA_FADD_UDW); - else if (INTEL_GEN(dev_priv) >= 4) + else if (INTEL_GEN(i915) >= 4) addr = ENGINE_READ(engine, RING_DMA_FADD); else addr = ENGINE_READ(engine, DMA_FADD_I8XX); drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n", upper_32_bits(addr), lower_32_bits(addr)); - if (INTEL_GEN(dev_priv) >= 4) { + if (INTEL_GEN(i915) >= 4) { drm_printf(m, "\tIPEIR: 0x%08x\n", ENGINE_READ(engine, RING_IPEIR)); drm_printf(m, "\tIPEHR: 0x%08x\n", @@ -1424,130 +1369,6 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, drm_printf(m, "\tIPEIR: 0x%08x\n", ENGINE_READ(engine, IPEIR)); drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR)); } - - if (intel_engine_uses_guc(engine)) { - /* nothing to print yet */ - } else if (HAS_EXECLISTS(dev_priv)) { - struct i915_sched *se = intel_engine_get_scheduler(engine); - struct i915_request * const *port, *rq; - const u32 *hws = - &engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; - const u8 num_entries = execlists->csb_size; - unsigned int idx; - u8 read, write; - - drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? 
%s\n", - yesno(test_bit(TASKLET_STATE_SCHED, - &se->tasklet.state)), - enableddisabled(!atomic_read(&se->tasklet.count)), - repr_timer(&engine->execlists.preempt), - repr_timer(&engine->execlists.timer)); - - read = execlists->csb_head; - write = READ_ONCE(*execlists->csb_write); - - drm_printf(m, "\tExeclist status: 0x%08x %08x; CSB read:%d, write:%d, entries:%d\n", - ENGINE_READ(engine, RING_EXECLIST_STATUS_LO), - ENGINE_READ(engine, RING_EXECLIST_STATUS_HI), - read, write, num_entries); - - if (read >= num_entries) - read = 0; - if (write >= num_entries) - write = 0; - if (read > write) - write += num_entries; - while (read < write) { - idx = ++read % num_entries; - drm_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n", - idx, hws[idx * 2], hws[idx * 2 + 1]); - } - - i915_sched_lock_bh(se); - rcu_read_lock(); - for (port = execlists->active; (rq = *port); port++) { - char hdr[160]; - int len; - - len = scnprintf(hdr, sizeof(hdr), - "\t\tActive[%d]: ccid:%08x%s%s, ", - (int)(port - execlists->active), - rq->context->lrc.ccid, - intel_context_is_closed(rq->context) ? "!" : "", - intel_context_is_banned(rq->context) ? "*" : ""); - len += print_ring(hdr + len, sizeof(hdr) - len, rq); - scnprintf(hdr + len, sizeof(hdr) - len, "rq: "); - i915_request_show(m, rq, hdr, 0); - } - for (port = execlists->pending; (rq = *port); port++) { - char hdr[160]; - int len; - - len = scnprintf(hdr, sizeof(hdr), - "\t\tPending[%d]: ccid:%08x%s%s, ", - (int)(port - execlists->pending), - rq->context->lrc.ccid, - intel_context_is_closed(rq->context) ? "!" : "", - intel_context_is_banned(rq->context) ? 
"*" : ""); - len += print_ring(hdr + len, sizeof(hdr) - len, rq); - scnprintf(hdr + len, sizeof(hdr) - len, "rq: "); - i915_request_show(m, rq, hdr, 0); - } - rcu_read_unlock(); - i915_sched_unlock_bh(se); - } else if (INTEL_GEN(dev_priv) > 6) { - drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n", - ENGINE_READ(engine, RING_PP_DIR_BASE)); - drm_printf(m, "\tPP_DIR_BASE_READ: 0x%08x\n", - ENGINE_READ(engine, RING_PP_DIR_BASE_READ)); - drm_printf(m, "\tPP_DIR_DCLV: 0x%08x\n", - ENGINE_READ(engine, RING_PP_DIR_DCLV)); - } -} - -static void -print_request_ring(struct drm_printer *m, const struct i915_request *rq) -{ - void *ring; - int size; - - drm_printf(m, - "[head %04x, postfix %04x, tail %04x, batch 0x%08x_%08x]:\n", - rq->head, rq->postfix, rq->tail, - rq->batch ? upper_32_bits(rq->batch->node.start) : ~0u, - rq->batch ? lower_32_bits(rq->batch->node.start) : ~0u); - - size = rq->tail - rq->head; - if (rq->tail < rq->head) - size += rq->ring->size; - - ring = kmalloc(size, GFP_ATOMIC); - if (ring) { - const void *vaddr = rq->ring->vaddr; - unsigned int head = rq->head; - unsigned int len = 0; - - if (rq->tail < head) { - len = rq->ring->size - head; - memcpy(ring, vaddr + head, len); - head = 0; - } - memcpy(ring + len, vaddr + head, size - len); - - hexdump(m, ring, size); - kfree(ring); - } -} - -static unsigned long list_count(struct list_head *list) -{ - struct list_head *pos; - unsigned long count = 0; - - list_for_each(pos, list) - count++; - - return count; } static unsigned long read_ul(void *p, size_t x) @@ -1590,10 +1411,8 @@ void intel_engine_dump(struct intel_engine_cs *engine, const char *header, ...) 
{ struct i915_gpu_error * const error = &engine->i915->gpu_error; - struct i915_sched *se = intel_engine_get_scheduler(engine); const struct i915_request *rq; intel_wakeref_t wakeref; - unsigned long flags; ktime_t dummy; if (header) { @@ -1632,41 +1451,9 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n"); - rcu_read_lock(); - spin_lock_irqsave(&se->lock, flags); - rq = se->active_request(se); - if (rq) { - struct intel_timeline *tl = get_timeline(rq); + i915_sched_show(m, intel_engine_get_scheduler(engine), + i915_request_show, 8); - i915_request_show(m, rq, "\t\tactive ", 0); - - drm_printf(m, "\t\tring->start: 0x%08x\n", - i915_ggtt_offset(rq->ring->vma)); - drm_printf(m, "\t\tring->head: 0x%08x\n", - rq->ring->head); - drm_printf(m, "\t\tring->tail: 0x%08x\n", - rq->ring->tail); - drm_printf(m, "\t\tring->emit: 0x%08x\n", - rq->ring->emit); - drm_printf(m, "\t\tring->space: 0x%08x\n", - rq->ring->space); - - if (tl) { - drm_printf(m, "\t\tring->hwsp: 0x%08x\n", - tl->hwsp_offset); - intel_timeline_put(tl); - } - - print_request_ring(m, rq); - - if (rq->context->lrc_reg_state) { - drm_printf(m, "Logical Ring Context:\n"); - hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); - } - } - drm_printf(m, "\tOn hold?: %lu\n", list_count(&se->hold)); - spin_unlock_irqrestore(&se->lock, flags); - rcu_read_unlock(); drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base); wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm); @@ -1677,8 +1464,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tDevice is asleep; skipping register dump\n"); } - intel_execlists_show_requests(engine, m, i915_request_show, 8); - drm_printf(m, "HWSP:\n"); hexdump(m, engine->status_page.addr, PAGE_SIZE); diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 8b848adb65b7..b1007e560527 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -198,6 +198,14 @@ struct virtual_engine { struct intel_engine_cs *siblings[]; }; +static void execlists_show(struct drm_printer *m, + struct i915_sched *se, + void (*show_request)(struct drm_printer *m, + const struct i915_request *rq, + const char *prefix, + int indent), + unsigned int max); + static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) { GEM_BUG_ON(!intel_engine_is_virtual(engine)); @@ -2971,6 +2979,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) engine->sched.active_request = execlists_active_request; engine->sched.is_executing = execlists_is_executing; + engine->sched.show = execlists_show; tasklet_setup(&engine->sched.tasklet, execlists_submission_tasklet); timer_setup(&engine->execlists.timer, execlists_timeslice, 0); timer_setup(&engine->execlists.preempt, execlists_preempt, 0); @@ -3581,68 +3590,65 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, return 0; } -void intel_execlists_show_requests(struct intel_engine_cs *engine, - struct drm_printer *m, - void (*show_request)(struct drm_printer *m, - const struct i915_request *rq, - const char *prefix, - int indent), - unsigned int max) +static const char *repr_timer(const struct timer_list *t) { - const struct intel_engine_execlists *execlists = &engine->execlists; - struct i915_sched *se = intel_engine_get_scheduler(engine); + if (!READ_ONCE(t->expires)) + return "inactive"; + + if (timer_pending(t)) + return "active"; + + return "expired"; +} + +static int print_ring(char *buf, int sz, struct i915_request *rq) +{ + int len = 0; + + rcu_read_lock(); + if (!i915_request_signaled(rq)) { + struct intel_timeline *tl = rcu_dereference(rq->timeline); + + len = scnprintf(buf, sz, + "ring:{start:%08x, hwsp:%08x, seqno:%08x, runtime:%llums}, ", + i915_ggtt_offset(rq->ring->vma), + tl ? 
tl->hwsp_offset : 0, + hwsp_seqno(rq), + DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context), + 1000 * 1000)); + } + rcu_read_unlock(); + + return len; +} + +static void execlists_show(struct drm_printer *m, + struct i915_sched *se, + void (*show_request)(struct drm_printer *m, + const struct i915_request *rq, + const char *prefix, + int indent), + unsigned int max) +{ + const struct intel_engine_cs *engine = + container_of(se, typeof(*engine), sched); + const struct intel_engine_execlists *el = &engine->execlists; + const u64 *hws = el->csb_status; + const u8 num_entries = el->csb_size; + struct i915_request * const *port; struct i915_request *rq, *last; - unsigned long flags; + intel_wakeref_t wakeref; unsigned int count; struct rb_node *rb; + unsigned int idx; + u8 read, write; - spin_lock_irqsave(&se->lock, flags); + wakeref = intel_runtime_pm_get(engine->uncore->rpm); + rcu_read_lock(); last = NULL; count = 0; - list_for_each_entry(rq, &se->requests, sched.link) { - if (count++ < max - 1) - show_request(m, rq, "\t\t", 0); - else - last = rq; - } - if (last) { - if (count > max) { - drm_printf(m, - "\t\t...skipping %d executing requests...\n", - count - max); - } - show_request(m, last, "\t\t", 0); - } - - if (execlists->queue_priority_hint != INT_MIN) - drm_printf(m, "\t\tQueue priority hint: %d\n", - READ_ONCE(execlists->queue_priority_hint)); - - last = NULL; - count = 0; - for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { - struct i915_priolist *p = rb_entry(rb, typeof(*p), node); - - priolist_for_each_request(rq, p) { - if (count++ < max - 1) - show_request(m, rq, "\t\t", 0); - else - last = rq; - } - } - if (last) { - if (count > max) { - drm_printf(m, - "\t\t...skipping %d queued requests...\n", - count - max); - } - show_request(m, last, "\t\t", 0); - } - - last = NULL; - count = 0; - for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) { + for (rb = rb_first_cached(&el->virtual); rb; rb = rb_next(rb)) { 
struct virtual_engine *ve = rb_entry(rb, typeof(*ve), nodes[engine->id].rb); struct i915_request *rq = READ_ONCE(ve->request); @@ -3663,7 +3669,65 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, show_request(m, last, "\t\t", 0); } - spin_unlock_irqrestore(&se->lock, flags); + drm_printf(m, "\tExeclists preempt? %s, timeslice? %s\n", + repr_timer(&el->preempt), + repr_timer(&el->timer)); + + read = el->csb_head; + write = READ_ONCE(*el->csb_write); + + drm_printf(m, "\tExeclist status: 0x%08x %08x; CSB read:%d, write:%d, entries:%d\n", + ENGINE_READ(engine, RING_EXECLIST_STATUS_LO), + ENGINE_READ(engine, RING_EXECLIST_STATUS_HI), + read, write, num_entries); + + if (read >= num_entries) + read = 0; + if (write >= num_entries) + write = 0; + if (read > write) + write += num_entries; + while (read < write) { + idx = ++read % num_entries; + drm_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n", + idx, + lower_32_bits(hws[idx]), + upper_32_bits(hws[idx])); + } + + i915_sched_lock_bh(se); + for (port = el->active; (rq = *port); port++) { + char hdr[160]; + int len; + + len = scnprintf(hdr, sizeof(hdr), + "\t\tActive[%d]: ccid:%08x%s%s, ", + (int)(port - el->active), + rq->context->lrc.ccid, + intel_context_is_closed(rq->context) ? "!" : "", + intel_context_is_banned(rq->context) ? "*" : ""); + len += print_ring(hdr + len, sizeof(hdr) - len, rq); + scnprintf(hdr + len, sizeof(hdr) - len, "rq: "); + i915_request_show(m, rq, hdr, 0); + } + for (port = el->pending; (rq = *port); port++) { + char hdr[160]; + int len; + + len = scnprintf(hdr, sizeof(hdr), + "\t\tPending[%d]: ccid:%08x%s%s, ", + (int)(port - el->pending), + rq->context->lrc.ccid, + intel_context_is_closed(rq->context) ? "!" : "", + intel_context_is_banned(rq->context) ? 
"*" : ""); + len += print_ring(hdr + len, sizeof(hdr) - len, rq); + scnprintf(hdr + len, sizeof(hdr) - len, "rq: "); + i915_request_show(m, rq, hdr, 0); + } + i915_sched_unlock_bh(se); + + rcu_read_unlock(); + intel_runtime_pm_put(engine->uncore->rpm, wakeref); } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 792dd0bbea3b..459f727b03cd 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1827,6 +1827,9 @@ static char queue_status(const struct i915_request *rq) if (i915_request_is_active(rq)) return 'E'; + if (i915_request_on_hold(rq)) + return 'S'; + if (i915_request_is_ready(rq)) return intel_engine_is_virtual(rq->engine) ? 'V' : 'R'; @@ -1895,6 +1898,9 @@ void i915_request_show(struct drm_printer *m, * - a completed request may still be regarded as executing, its * status may not be updated until it is retired and removed * from the lists + * + * S [Suspended] + * - the request has been temporarily suspended from execution */ x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf)); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index cb27bcb7a1f6..af3a12d6f6d2 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -1124,6 +1124,186 @@ void i915_request_show_with_schedule(struct drm_printer *m, rcu_read_unlock(); } +static unsigned long list_count(struct list_head *list) +{ + struct list_head *pos; + unsigned long count = 0; + + list_for_each(pos, list) + count++; + + return count; +} + +static void hexdump(struct drm_printer *m, const void *buf, size_t len) +{ + const size_t rowsize = 8 * sizeof(u32); + const void *prev = NULL; + bool skip = false; + size_t pos; + + for (pos = 0; pos < len; pos += rowsize) { + char line[128]; + + if (prev && !memcmp(prev, buf + pos, rowsize)) { + if (!skip) { + drm_printf(m, "*\n"); + skip = true; + } + continue; + 
} + + WARN_ON_ONCE(hex_dump_to_buffer(buf + pos, len - pos, + rowsize, sizeof(u32), + line, sizeof(line), + false) >= sizeof(line)); + drm_printf(m, "[%04zx] %s\n", pos, line); + + prev = buf + pos; + skip = false; + } +} + +static void +print_request_ring(struct drm_printer *m, const struct i915_request *rq) +{ + void *ring; + int size; + + drm_printf(m, + "[head %04x, postfix %04x, tail %04x, batch 0x%08x_%08x]:\n", + rq->head, rq->postfix, rq->tail, + rq->batch ? upper_32_bits(rq->batch->node.start) : ~0u, + rq->batch ? lower_32_bits(rq->batch->node.start) : ~0u); + + size = rq->tail - rq->head; + if (rq->tail < rq->head) + size += rq->ring->size; + + ring = kmalloc(size, GFP_ATOMIC); + if (ring) { + const void *vaddr = rq->ring->vaddr; + unsigned int head = rq->head; + unsigned int len = 0; + + if (rq->tail < head) { + len = rq->ring->size - head; + memcpy(ring, vaddr + head, len); + head = 0; + } + memcpy(ring + len, vaddr + head, size - len); + + hexdump(m, ring, size); + kfree(ring); + } +} + +void i915_sched_show(struct drm_printer *m, + struct i915_sched *se, + void (*show_request)(struct drm_printer *m, + const struct i915_request *rq, + const char *prefix, + int indent), + unsigned int max) +{ + const struct i915_request *rq, *last; + unsigned long flags; + unsigned int count; + struct rb_node *rb; + + rcu_read_lock(); + spin_lock_irqsave(&se->lock, flags); + + rq = se->active_request(se); + if (rq) { + i915_request_show(m, rq, "\t\tactive ", 0); + + drm_printf(m, "\t\tring->start: 0x%08x\n", + i915_ggtt_offset(rq->ring->vma)); + drm_printf(m, "\t\tring->head: 0x%08x\n", + rq->ring->head); + drm_printf(m, "\t\tring->tail: 0x%08x\n", + rq->ring->tail); + drm_printf(m, "\t\tring->emit: 0x%08x\n", + rq->ring->emit); + drm_printf(m, "\t\tring->space: 0x%08x\n", + rq->ring->space); + drm_printf(m, "\t\tring->hwsp: 0x%08x\n", + i915_request_active_timeline(rq)->hwsp_offset); + + print_request_ring(m, rq); + + if (rq->context->lrc_reg_state) { + drm_printf(m, 
"Logical Ring Context:\n"); + hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); + } + } + drm_printf(m, "\tOn hold?: %lu\n", list_count(&se->hold)); + + drm_printf(m, "\tTasklet queued? %s (%s)\n", + yesno(test_bit(TASKLET_STATE_SCHED, &se->tasklet.state)), + enableddisabled(!atomic_read(&se->tasklet.count))); + + last = NULL; + count = 0; + list_for_each_entry(rq, &se->requests, sched.link) { + if (count++ < max - 1) + show_request(m, rq, "\t\t", 0); + else + last = rq; + } + if (last) { + if (count > max) { + drm_printf(m, + "\t\t...skipping %d executing requests...\n", + count - max); + } + show_request(m, last, "\t\t", 0); + } + + last = NULL; + count = 0; + for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { + struct i915_priolist *p = rb_entry(rb, typeof(*p), node); + + priolist_for_each_request(rq, p) { + if (count++ < max - 1) + show_request(m, rq, "\t\t", 0); + else + last = rq; + } + } + if (last) { + if (count > max) { + drm_printf(m, + "\t\t...skipping %d queued requests...\n", + count - max); + } + show_request(m, last, "\t\t", 0); + } + + list_for_each_entry(rq, &se->hold, sched.link) { + if (count++ < max - 1) + show_request(m, rq, "\t\t", 0); + else + last = rq; + } + if (last) { + if (count > max) { + drm_printf(m, + "\t\t...skipping %d suspended requests...\n", + count - max); + } + show_request(m, last, "\t\t", 0); + } + + spin_unlock_irqrestore(&se->lock, flags); + rcu_read_unlock(); + + if (se->show) + se->show(m, se, show_request, max); +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_scheduler.c" #endif diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index e2e8b90adb66..51bca23a5617 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -129,4 +129,12 @@ void i915_request_show_with_schedule(struct drm_printer *m, const char *prefix, int indent); +void i915_sched_show(struct drm_printer *m, + struct i915_sched *se, + void 
(*show_request)(struct drm_printer *m, + const struct i915_request *rq, + const char *prefix, + int indent), + unsigned int max); + #endif /* _I915_SCHEDULER_H_ */ diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 9a9b8e0d78ae..685280d61581 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -13,6 +13,7 @@ #include "i915_priolist_types.h" +struct drm_printer; struct i915_request; /** @@ -41,6 +42,14 @@ struct i915_sched { bool (*is_executing)(const struct i915_request *rq); + void (*show)(struct drm_printer *m, + struct i915_sched *se, + void (*show_request)(struct drm_printer *m, + const struct i915_request *rq, + const char *prefix, + int indent), + unsigned int max); + struct list_head requests; /* active request, on HW */ struct list_head hold; /* ready requests, but on hold */ /** From patchwork Mon Feb 1 08:56:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058495 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14D24C4332E for ; Mon, 1 Feb 2021 08:58:18 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AEB7664E3F for ; Mon, 1 Feb 2021 08:58:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AEB7664E3F 
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7A92F6E52D; Mon, 1 Feb 2021 08:57:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2FDA66E4B7 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757741-1500050 for multiple; Mon, 01 Feb 2021 08:57:20 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:46 +0000 Message-Id: <20210201085715.27435-28-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 28/57] drm/i915: Wrap i915_request_use_semaphores() X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Wrap the query on whether the backend engine supports us emitting semaphores to coordinate multiple requests. 
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_request.c | 2 +- drivers/gpu/drm/i915/i915_request.h | 5 +++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 459f727b03cd..e7b4c4bc41a6 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1141,7 +1141,7 @@ __i915_request_await_execution(struct i915_request *to, * immediate execution, and so we must wait until it reaches the * active slot. */ - if (intel_engine_has_semaphores(to->engine) && + if (i915_request_use_semaphores(to) && !i915_request_has_initial_breadcrumb(to)) { err = __emit_semaphore_wait(to, from, from->fence.seqno - 1); if (err < 0) diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 8322f308b906..8d9e59e3cdcb 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -637,4 +637,9 @@ static inline bool i915_request_is_executing(const struct i915_request *rq) return i915_request_get_scheduler(rq)->is_executing(rq); } +static inline bool i915_request_use_semaphores(const struct i915_request *rq) +{ + return intel_engine_has_semaphores(rq->engine); +} + #endif /* I915_REQUEST_H */ From patchwork Mon Feb 1 08:56:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058509 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFB89C433E9 
for ; Mon, 1 Feb 2021 08:58:21 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 64C6364E33 for ; Mon, 1 Feb 2021 08:58:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 64C6364E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3FF706E528; Mon, 1 Feb 2021 08:57:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 97E646E508 for ; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757742-1500050 for multiple; Mon, 01 Feb 2021 08:57:20 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:47 +0000 Message-Id: <20210201085715.27435-29-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 29/57] drm/i915: Move scheduler flags X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Start extracting the scheduling flags from the engine. We begin with its own existence. 
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine.h | 6 ++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 21 +++++++------------ .../drm/i915/gt/intel_execlists_submission.c | 6 +++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_request.h | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 2 +- drivers/gpu/drm/i915/i915_scheduler_types.h | 10 +++++++++ 7 files changed, 31 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index c530839627bb..4f0163457aed 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -261,6 +261,12 @@ intel_engine_has_heartbeat(const struct intel_engine_cs *engine) return READ_ONCE(engine->props.heartbeat_interval_ms); } +static inline bool +intel_engine_has_scheduler(struct intel_engine_cs *engine) +{ + return i915_sched_is_active(intel_engine_get_scheduler(engine)); +} + static inline void intel_engine_kick_scheduler(struct intel_engine_cs *engine) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 6b0bde292916..a3024a0de1de 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -440,14 +440,13 @@ struct intel_engine_cs { #define I915_ENGINE_USING_CMD_PARSER BIT(0) #define I915_ENGINE_SUPPORTS_STATS BIT(1) -#define I915_ENGINE_HAS_SCHEDULER BIT(2) -#define I915_ENGINE_HAS_PREEMPTION BIT(3) -#define I915_ENGINE_HAS_SEMAPHORES BIT(4) -#define I915_ENGINE_HAS_TIMESLICES BIT(5) -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(6) -#define I915_ENGINE_IS_VIRTUAL BIT(7) -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(8) -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(9) +#define I915_ENGINE_HAS_PREEMPTION BIT(2) +#define I915_ENGINE_HAS_SEMAPHORES BIT(3) +#define I915_ENGINE_HAS_TIMESLICES BIT(4) +#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5) +#define 
I915_ENGINE_IS_VIRTUAL BIT(6) +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7) +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8) unsigned int flags; /* @@ -530,12 +529,6 @@ intel_engine_supports_stats(const struct intel_engine_cs *engine) return engine->flags & I915_ENGINE_SUPPORTS_STATS; } -static inline bool -intel_engine_has_scheduler(const struct intel_engine_cs *engine) -{ - return engine->flags & I915_ENGINE_HAS_SCHEDULER; -} - static inline bool intel_engine_has_preemption(const struct intel_engine_cs *engine) { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index b1007e560527..3217cb4369ad 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2913,7 +2913,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) */ } - engine->flags |= I915_ENGINE_HAS_SCHEDULER; engine->flags |= I915_ENGINE_SUPPORTS_STATS; if (!intel_vgpu_active(engine->i915)) { engine->flags |= I915_ENGINE_HAS_SEMAPHORES; @@ -2981,6 +2980,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) engine->sched.is_executing = execlists_is_executing; engine->sched.show = execlists_show; tasklet_setup(&engine->sched.tasklet, execlists_submission_tasklet); + __set_bit(I915_SCHED_ACTIVE_BIT, &engine->sched.flags); timer_setup(&engine->execlists.timer, execlists_timeslice, 0); timer_setup(&engine->execlists.preempt, execlists_preempt, 0); @@ -3386,6 +3386,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count) { struct virtual_engine *ve; + unsigned long sched; unsigned int n; int err; @@ -3444,6 +3445,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, goto err_put; } + sched = ~0U; for (n = 0; n < count; n++) { struct intel_engine_cs *sibling = siblings[n]; @@ -3473,6 +3475,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->siblings[ve->num_siblings++] = 
sibling; ve->base.mask |= sibling->mask; + sched &= sibling->sched.flags; /* * All physical engines must be compatible for their emission @@ -3514,6 +3517,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.name, ve->base.mask, ENGINE_VIRTUAL); + ve->base.sched.flags = sched; ve->base.sched.submit_request = virtual_submit_request; tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index db6ac5a12834..887f38fb671f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -606,7 +606,6 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) } engine->set_default_submission = guc_set_default_submission; - engine->flags |= I915_ENGINE_HAS_SCHEDULER; engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; engine->flags |= I915_ENGINE_HAS_PREEMPTION; @@ -656,6 +655,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) GEM_BUG_ON(INTEL_GEN(i915) < 11); tasklet_setup(&engine->sched.tasklet, guc_submission_tasklet); + __set_bit(I915_SCHED_ACTIVE_BIT, &engine->sched.flags); guc_default_vfuncs(engine); guc_default_irqs(engine); diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 8d9e59e3cdcb..8eea25cb043e 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -626,7 +626,7 @@ i915_request_active_timeline(const struct i915_request *rq) static inline bool i915_request_use_scheduler(const struct i915_request *rq) { - return intel_engine_has_scheduler(rq->engine); + return i915_sched_is_active(i915_request_get_scheduler(rq)); } static inline bool i915_request_is_executing(const struct i915_request *rq) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index af3a12d6f6d2..48336434bff3 100644 --- 
a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -550,7 +550,7 @@ void i915_request_set_priority(struct i915_request *rq, int prio) if (__i915_request_is_complete(rq)) goto unlock; - if (!intel_engine_has_scheduler(engine)) { + if (!i915_sched_is_active(&engine->sched)) { rq->sched.attr.priority = prio; goto unlock; } diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 685280d61581..cb1eddb7edc8 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -16,6 +16,10 @@ struct drm_printer; struct i915_request; +enum { + I915_SCHED_ACTIVE_BIT = 0, +}; + /** * struct i915_sched - funnels requests towards hardware * @@ -27,6 +31,7 @@ struct i915_request; struct i915_sched { spinlock_t lock; /* protects the scheduling lists and queue */ + unsigned long flags; unsigned long mask; /* available scheduling channels */ /* @@ -174,4 +179,9 @@ struct i915_dependency { &(rq__)->sched.signalers_list, \ signal_link) +static inline bool i915_sched_is_active(const struct i915_sched *se) +{ + return test_bit(I915_SCHED_ACTIVE_BIT, &se->flags); +} + #endif /* _I915_SCHEDULER_TYPES_H_ */ From patchwork Mon Feb 1 08:56:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058485 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4ED2C43381 for ; Mon, 1 Feb 2021 08:58:15 +0000 (UTC) Received: from 
gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8B6CB64E3F for ; Mon, 1 Feb 2021 08:58:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B6CB64E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 540B96E508; Mon, 1 Feb 2021 08:57:44 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 281986E4B5 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757743-1500050 for multiple; Mon, 01 Feb 2021 08:57:20 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:48 +0000 Message-Id: <20210201085715.27435-30-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 30/57] drm/i915: Move timeslicing flag to scheduler X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Whether a scheduler chooses to implement timeslicing is up to it, and not an underlying property of the HW engine. 
The scheduler does depend on the HW supporting preemption. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 6 ++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 18 ++++-------------- .../drm/i915/gt/intel_execlists_submission.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_execlists.c | 2 +- drivers/gpu/drm/i915/i915_scheduler_types.h | 10 ++++++++++ 5 files changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 4f0163457aed..ca3a9cb06328 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -279,4 +279,10 @@ intel_engine_flush_scheduler(struct intel_engine_cs *engine) i915_sched_flush(intel_engine_get_scheduler(engine)); } +static inline bool +intel_engine_has_timeslices(struct intel_engine_cs *engine) +{ + return i915_sched_has_timeslices(intel_engine_get_scheduler(engine)); +} + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index a3024a0de1de..96a0aec29672 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -442,11 +442,10 @@ struct intel_engine_cs { #define I915_ENGINE_SUPPORTS_STATS BIT(1) #define I915_ENGINE_HAS_PREEMPTION BIT(2) #define I915_ENGINE_HAS_SEMAPHORES BIT(3) -#define I915_ENGINE_HAS_TIMESLICES BIT(4) -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5) -#define I915_ENGINE_IS_VIRTUAL BIT(6) -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7) -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8) +#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4) +#define I915_ENGINE_IS_VIRTUAL BIT(5) +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) unsigned int flags; /* @@ -541,15 +540,6 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine) return engine->flags & 
I915_ENGINE_HAS_SEMAPHORES; } -static inline bool -intel_engine_has_timeslices(const struct intel_engine_cs *engine) -{ - if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) - return false; - - return engine->flags & I915_ENGINE_HAS_TIMESLICES; -} - static inline bool intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine) { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 3217cb4369ad..d4b6d262265a 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -1023,7 +1023,7 @@ static bool needs_timeslice(const struct intel_engine_cs *engine, { const struct i915_sched *se = &engine->sched; - if (!intel_engine_has_timeslices(engine)) + if (!i915_sched_has_timeslices(se)) return false; /* If not currently active, or about to switch, wait for next event */ @@ -2918,8 +2918,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->flags |= I915_ENGINE_HAS_SEMAPHORES; if (can_preempt(engine)) { engine->flags |= I915_ENGINE_HAS_PREEMPTION; - if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) - engine->flags |= I915_ENGINE_HAS_TIMESLICES; } } @@ -2927,6 +2925,11 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->emit_bb_start = gen8_emit_bb_start; else engine->emit_bb_start = gen8_emit_bb_start_noarb; + + if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION) && + intel_engine_has_preemption(engine)) + __set_bit(I915_SCHED_HAS_TIMESLICES_BIT, + &engine->sched.flags); } static void logical_ring_default_irqs(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index cfc0f4b9fbc5..147cbfd6dec0 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -3825,7 +3825,7 @@ static unsigned int __select_siblings(struct intel_gt *gt, unsigned int class, 
struct intel_engine_cs **siblings, - bool (*filter)(const struct intel_engine_cs *)) + bool (*filter)(struct intel_engine_cs *)) { unsigned int n = 0; unsigned int inst; diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index cb1eddb7edc8..dfb29b8c2bee 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -12,12 +12,14 @@ #include #include "i915_priolist_types.h" +#include "i915_utils.h" struct drm_printer; struct i915_request; enum { I915_SCHED_ACTIVE_BIT = 0, + I915_SCHED_HAS_TIMESLICES_BIT, }; /** @@ -184,4 +186,12 @@ static inline bool i915_sched_is_active(const struct i915_sched *se) return test_bit(I915_SCHED_ACTIVE_BIT, &se->flags); } +static inline bool i915_sched_has_timeslices(const struct i915_sched *se) +{ + if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) + return false; + + return test_bit(I915_SCHED_HAS_TIMESLICES_BIT, &se->flags); +} + #endif /* _I915_SCHEDULER_TYPES_H_ */ From patchwork Mon Feb 1 08:56:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82D19C433E6 for ; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DFFB064E33 for ; Mon, 1 
Feb 2021 08:57:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DFFB064E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5E2776E4AA; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 74F936E48C for ; Mon, 1 Feb 2021 08:57:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757744-1500050 for multiple; Mon, 01 Feb 2021 08:57:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:49 +0000 Message-Id: <20210201085715.27435-31-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 31/57] drm/i915/gt: Declare when we enabled timeslicing X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Let userspace know if they can trust timeslicing by including it as part of the I915_PARAM_HAS_SCHEDULER::I915_SCHEDULER_CAP_TIMESLICING v2: Only declare timeslicing if we can safely preempt userspace. 
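For illustration only, a userspace consumer might decode the value along these lines. This is a minimal sketch, assuming the capability mask has already been fetched (for example with DRM_IOCTL_I915_GETPARAM and I915_PARAM_HAS_SCHEDULER). The DEMO_* constants are local, hypothetical copies so that the snippet builds without the uapi header; bit 5 mirrors the value added by this patch, and bit 0 is assumed to be the existing "enabled" capability. The helper name demo_has_timeslicing is likewise made up.

```c
#include <stdbool.h>

/* Local, hypothetical copies of the scheduler capability bits. */
#define DEMO_SCHEDULER_CAP_ENABLED     (1ul << 0)
#define DEMO_SCHEDULER_CAP_TIMESLICING (1ul << 5)

/*
 * Decide whether timeslicing can be relied upon, given the raw integer
 * a GETPARAM-style query returned. Zero or negative values are treated
 * as "no scheduler information available".
 */
static inline bool demo_has_timeslicing(int value)
{
	const unsigned long need = DEMO_SCHEDULER_CAP_ENABLED |
				   DEMO_SCHEDULER_CAP_TIMESLICING;

	if (value <= 0)
		return false;

	/* Both the scheduler and the timeslicing bit must be reported. */
	return ((unsigned long)value & need) == need;
}
```

Requiring the enabled bit alongside the timeslicing bit is a design choice of this sketch, not something the patch mandates.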
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_user.c | 26 +++++++++++++++------ include/uapi/drm/i915_drm.h | 1 + 2 files changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 64eccdf32a22..50911fbe6368 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -90,13 +90,17 @@ static void sort_engines(struct drm_i915_private *i915, static void set_scheduler_caps(struct drm_i915_private *i915) { static const struct { - u8 engine; - u8 sched; - } map[] = { + u8 flag; + u8 cap; + } engine_map[] = { #define MAP(x, y) { ilog2(I915_ENGINE_##x), ilog2(I915_SCHEDULER_CAP_##y) } MAP(HAS_PREEMPTION, PREEMPTION), MAP(HAS_SEMAPHORES, SEMAPHORES), MAP(SUPPORTS_STATS, ENGINE_BUSY_STATS), +#undef MAP + }, sched_map[] = { +#define MAP(x, y) { I915_SCHED_##x, ilog2(I915_SCHEDULER_CAP_##y) } + MAP(HAS_TIMESLICES_BIT, TIMESLICING), #undef MAP }; struct intel_engine_cs *engine; @@ -105,6 +109,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915) enabled = 0; disabled = 0; for_each_uabi_engine(engine, i915) { /* all engines must agree!
*/ + struct i915_sched *se = intel_engine_get_scheduler(engine); int i; if (intel_engine_has_scheduler(engine)) @@ -114,11 +119,18 @@ static void set_scheduler_caps(struct drm_i915_private *i915) disabled |= (I915_SCHEDULER_CAP_ENABLED | I915_SCHEDULER_CAP_PRIORITY); - for (i = 0; i < ARRAY_SIZE(map); i++) { - if (engine->flags & BIT(map[i].engine)) - enabled |= BIT(map[i].sched); + for (i = 0; i < ARRAY_SIZE(engine_map); i++) { + if (engine->flags & BIT(engine_map[i].flag)) + enabled |= BIT(engine_map[i].cap); else - disabled |= BIT(map[i].sched); + disabled |= BIT(engine_map[i].cap); + } + + for (i = 0; i < ARRAY_SIZE(sched_map); i++) { + if (se->flags & BIT(sched_map[i].flag)) + enabled |= BIT(sched_map[i].cap); + else + disabled |= BIT(sched_map[i].cap); } } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 1987e2ea79a3..cda0f391d965 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -524,6 +524,7 @@ typedef struct drm_i915_irq_wait { #define I915_SCHEDULER_CAP_PREEMPTION (1ul << 2) #define I915_SCHEDULER_CAP_SEMAPHORES (1ul << 3) #define I915_SCHEDULER_CAP_ENGINE_BUSY_STATS (1ul << 4) +#define I915_SCHEDULER_CAP_TIMESLICING (1ul << 5) #define I915_PARAM_HUC_STATUS 42 From patchwork Mon Feb 1 08:56:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F18EC43381 for ; Mon, 1 Feb 2021 08:58:04 +0000 (UTC) Received: from 
gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BD95C64E3F for ; Mon, 1 Feb 2021 08:58:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD95C64E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 8FC126E511; Mon, 1 Feb 2021 08:57:38 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 25A8F6E48B for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757747-1500050 for multiple; Mon, 01 Feb 2021 08:57:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:50 +0000 Message-Id: <20210201085715.27435-32-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 32/57] drm/i915: Move needs-breadcrumb flags to scheduler X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Whether the scheduler depends on interrupt delivery for forward progress is a property of the scheduler backend not of the underlying engine, 
so move the flag from inside the engine to i915_sched_engine. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 6 ++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 13 +++---------- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_scheduler_types.h | 7 +++++++ 4 files changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index ca3a9cb06328..db5419ba1dc8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -285,4 +285,10 @@ intel_engine_has_timeslices(struct intel_engine_cs *engine) return i915_sched_has_timeslices(intel_engine_get_scheduler(engine)); } +static inline bool +intel_engine_needs_breadcrumb_tasklet(struct intel_engine_cs *engine) +{ + return i915_sched_needs_breadcrumb_tasklet(intel_engine_get_scheduler(engine)); +} + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 96a0aec29672..f856bd9b7dae 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -442,10 +442,9 @@ struct intel_engine_cs { #define I915_ENGINE_SUPPORTS_STATS BIT(1) #define I915_ENGINE_HAS_PREEMPTION BIT(2) #define I915_ENGINE_HAS_SEMAPHORES BIT(3) -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4) -#define I915_ENGINE_IS_VIRTUAL BIT(5) -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) +#define I915_ENGINE_IS_VIRTUAL BIT(4) +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(5) +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(6) unsigned int flags; /* @@ -540,12 +539,6 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine) return engine->flags & I915_ENGINE_HAS_SEMAPHORES; } -static inline bool -intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine) -{ - 
return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; -} - static inline bool intel_engine_is_virtual(const struct intel_engine_cs *engine) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 887f38fb671f..e8c66d868c59 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -606,7 +606,6 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) } engine->set_default_submission = guc_set_default_submission; - engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; engine->flags |= I915_ENGINE_HAS_PREEMPTION; /* @@ -656,6 +655,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) tasklet_setup(&engine->sched.tasklet, guc_submission_tasklet); __set_bit(I915_SCHED_ACTIVE_BIT, &engine->sched.flags); + __set_bit(I915_SCHED_NEEDS_BREADCRUMB_BIT, &engine->sched.flags); guc_default_vfuncs(engine); guc_default_irqs(engine); diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index dfb29b8c2bee..b4a0e4e26bfd 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -20,6 +20,7 @@ struct i915_request; enum { I915_SCHED_ACTIVE_BIT = 0, I915_SCHED_HAS_TIMESLICES_BIT, + I915_SCHED_NEEDS_BREADCRUMB_BIT, }; /** @@ -194,4 +195,10 @@ static inline bool i915_sched_has_timeslices(const struct i915_sched *se) return test_bit(I915_SCHED_HAS_TIMESLICES_BIT, &se->flags); } +static inline bool +i915_sched_needs_breadcrumb_tasklet(const struct i915_sched *se) +{ + return test_bit(I915_SCHED_NEEDS_BREADCRUMB_BIT, &se->flags); +} + #endif /* _I915_SCHEDULER_TYPES_H_ */ From patchwork Mon Feb 1 08:56:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058457 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79C88C433DB for ; Mon, 1 Feb 2021 08:58:06 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 10F6764E33 for ; Mon, 1 Feb 2021 08:58:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 10F6764E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D52976E4AD; Mon, 1 Feb 2021 08:57:41 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 21B5F6E45D for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757748-1500050 for multiple; Mon, 01 Feb 2021 08:57:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:51 +0000 Message-Id: <20210201085715.27435-33-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 33/57] drm/i915: Move 
busywaiting control to the scheduler X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Busy-waiting is used for preempt-to-busy by schedulers, if they so choose. Since it is not a property of the engine, but that of the submission backend, move the flag from out of the engine to i915_sched_engine. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 4 ++-- .../drm/i915/gt/intel_execlists_submission.c | 6 +++++- drivers/gpu/drm/i915/gt/selftest_lrc.c | 19 +++++++++++++------ drivers/gpu/drm/i915/i915_request.h | 5 +++++ drivers/gpu/drm/i915/i915_scheduler_types.h | 6 ++++++ 5 files changed, 31 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index cac80af7ad1c..8791e03ebe61 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -507,7 +507,7 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT; *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - if (intel_engine_has_semaphores(rq->engine)) + if (i915_request_use_busywait(rq)) cs = emit_preempt_busywait(rq, cs); rq->tail = intel_ring_offset(rq, cs); @@ -599,7 +599,7 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT; *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - if (intel_engine_has_semaphores(rq->engine)) + if (i915_request_use_busywait(rq)) cs = gen12_emit_preempt_busywait(rq, cs); rq->tail = intel_ring_offset(rq, cs); diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index d4b6d262265a..9245499d2082 100644 --- 
a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -304,7 +304,7 @@ static bool need_preempt(const struct intel_engine_cs *engine, const struct i915_sched *se = &engine->sched; int last_prio; - if (!intel_engine_has_semaphores(engine)) + if (!i915_sched_use_busywait(se)) return false; /* @@ -2930,6 +2930,10 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) intel_engine_has_preemption(engine)) __set_bit(I915_SCHED_HAS_TIMESLICES_BIT, &engine->sched.flags); + + if (intel_engine_has_preemption(engine)) + __set_bit(I915_SCHED_USE_BUSYWAIT_BIT, + &engine->sched.flags); } static void logical_ring_default_irqs(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 279091e41b41..6d73add47109 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -679,9 +679,11 @@ static int live_lrc_gpr(void *arg) if (err) goto err; - err = __live_lrc_gpr(engine, scratch, true); - if (err) - goto err; + if (intel_engine_has_preemption(engine)) { + err = __live_lrc_gpr(engine, scratch, true); + if (err) + goto err; + } err: st_engine_heartbeat_enable(engine); @@ -859,9 +861,11 @@ static int live_lrc_timestamp(void *arg) if (err) break; - err = __lrc_timestamp(&data, true); - if (err) - break; + if (intel_engine_has_preemption(data.engine)) { + err = __lrc_timestamp(&data, true); + if (err) + break; + } } err: @@ -1508,6 +1512,9 @@ static int live_lrc_isolation(void *arg) skip_isolation(engine)) continue; + if (!intel_engine_has_preemption(engine)) + continue; + intel_engine_pm_get(engine); for (i = 0; i < ARRAY_SIZE(poison); i++) { int result; diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 8eea25cb043e..7c29d33e7d51 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -642,4 +642,9 @@ static inline bool 
i915_request_use_semaphores(const struct i915_request *rq) return intel_engine_has_semaphores(rq->engine); } +static inline bool i915_request_use_busywait(const struct i915_request *rq) +{ + return i915_sched_use_busywait(i915_request_get_scheduler(rq)); +} + #endif /* I915_REQUEST_H */ diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index b4a0e4e26bfd..37475024c0de 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -21,6 +21,7 @@ enum { I915_SCHED_ACTIVE_BIT = 0, I915_SCHED_HAS_TIMESLICES_BIT, I915_SCHED_NEEDS_BREADCRUMB_BIT, + I915_SCHED_USE_BUSYWAIT_BIT, }; /** @@ -201,4 +202,9 @@ i915_sched_needs_breadcrumb_tasklet(const struct i915_sched *se) return test_bit(I915_SCHED_NEEDS_BREADCRUMB_BIT, &se->flags); } +static inline bool i915_sched_use_busywait(const struct i915_sched *se) +{ + return test_bit(I915_SCHED_USE_BUSYWAIT_BIT, &se->flags); +} + #endif /* _I915_SCHEDULER_TYPES_H_ */ From patchwork Mon Feb 1 08:56:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 313C8C433E6 for ; Mon, 1 Feb 2021 08:58:00 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BB97C64E33 for ; Mon, 1 
Feb 2021 08:57:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB97C64E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 766A36E4FB; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id A9C256E4B1 for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757749-1500050 for multiple; Mon, 01 Feb 2021 08:57:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:52 +0000 Message-Id: <20210201085715.27435-34-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 34/57] drm/i915: Move preempt-reset flag to the scheduler X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" While the HW may support preemption, whether or not the scheduler enforces preemption by forcibly resetting the current context is ultimately up to the scheduler. 
Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h | 7 ++----- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 5 ++++- drivers/gpu/drm/i915/i915_scheduler_types.h | 9 +++++++++ 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index db5419ba1dc8..33a29623571d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -244,12 +244,9 @@ static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine) } static inline bool -intel_engine_has_preempt_reset(const struct intel_engine_cs *engine) +intel_engine_has_preempt_reset(struct intel_engine_cs *engine) { - if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT)) - return false; - - return intel_engine_has_preemption(engine); + return i915_sched_has_preempt_reset(intel_engine_get_scheduler(engine)); } static inline bool diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 9245499d2082..7ec33bd73d95 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2931,9 +2931,12 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) __set_bit(I915_SCHED_HAS_TIMESLICES_BIT, &engine->sched.flags); - if (intel_engine_has_preemption(engine)) + if (intel_engine_has_preemption(engine)) { __set_bit(I915_SCHED_USE_BUSYWAIT_BIT, &engine->sched.flags); + __set_bit(I915_SCHED_HAS_PREEMPT_RESET_BIT, + &engine->sched.flags); + } } static void logical_ring_default_irqs(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 37475024c0de..7271a0259a56 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -20,6 +20,7 @@ struct i915_request; enum { 
I915_SCHED_ACTIVE_BIT = 0, I915_SCHED_HAS_TIMESLICES_BIT, + I915_SCHED_HAS_PREEMPT_RESET_BIT, I915_SCHED_NEEDS_BREADCRUMB_BIT, I915_SCHED_USE_BUSYWAIT_BIT, }; @@ -207,4 +208,12 @@ static inline bool i915_sched_use_busywait(const struct i915_sched *se) return test_bit(I915_SCHED_USE_BUSYWAIT_BIT, &se->flags); } +static inline bool i915_sched_has_preempt_reset(const struct i915_sched *se) +{ + if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT)) + return false; + + return test_bit(I915_SCHED_HAS_PREEMPT_RESET_BIT, &se->flags); +} + #endif /* _I915_SCHEDULER_TYPES_H_ */ From patchwork Mon Feb 1 08:56:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C34EC433E9 for ; Mon, 1 Feb 2021 08:57:51 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 82F2664E41 for ; Mon, 1 Feb 2021 08:57:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 82F2664E41 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6A0566E4F9; Mon, 1 Feb 2021 08:57:36 
+0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 741AA6E4B0 for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757750-1500050 for multiple; Mon, 01 Feb 2021 08:57:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:53 +0000 Message-Id: <20210201085715.27435-35-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 35/57] drm/i915: Replace priolist rbtree with a skiplist X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx"

Replace the priolist rbtree with a skiplist. The crucial difference is that walking and removing the first element of a skiplist is O(1), but O(lgN) for an rbtree, as we need to rebalance on remove. This is a hindrance for submission latency as it occurs between picking a request from the priolist and submitting it to hardware, as well as effectively tripling the number of O(lgN) operations required under the irqoff lock. This is critical to reducing the latency jitter with multiple clients.

The downsides to skiplists are that lookup/insertion is only probabilistically O(lgN) and there is a significant memory penalty, as each skip node is larger than the rbtree equivalent.
Furthermore, we don't use dynamic arrays for the skiplist, so the allocation is fixed, and imposes an upper bound on the scalability wrt the number of inflight requests.

In the following patches, we introduce a new sort key to the scheduler, a virtual deadline. This imposes a different structure on the tree. Using a priority sort, we have very few priority levels active at any time, most likely just the default priority and so the rbtree degenerates to a single element containing the list of all ready requests. The deadlines in contrast are very sparse, and typically each request has a unique deadline. Instead of being able to simply walk the list during dequeue, with the deadline scheduler we have to iterate through the bst on the critical submission path. Skiplists are vastly superior in this instance due to the O(1) iteration during dequeue, with very similar characteristics [on average] to the rbtree for insertion.

This means that by using skiplists we can introduce a sparse sort key without degrading latency on the critical submission path.

As an example, one simple case where we try to do lots of semi-independent work without any priority management (gem_exec_parallel), the lock hold times were:

    [worst]      [total]  [avg]
     973.05   6301584.84   0.35  # plain rbtree
     559.82   5424915.25   0.33  # best rbtree with pruning
     208.21   3898784.09   0.24  # skiplist
      34.05   5784106.01   0.32  # rbtree without deadlines
      23.35   4152999.80   0.24  # skiplist without deadlines

Based on the skiplist implementation by Dr Con Kolivas for MuQSS.
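To illustrate the O(1) dequeue argued for above, here is a minimal userspace sketch. It is not the i915 code: struct skiplist, skiplist_insert(), skiplist_pop_first() and HEIGHT are hypothetical names, the height is fixed at 4 purely for the demo, and a small xorshift generator stands in for whatever PRNG the driver uses. Keys sort highest first, as priorities do, with equal keys kept in FIFO order.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define HEIGHT 4 /* hypothetical fixed height, chosen only for this demo */

struct node {
	int key;
	int level;                 /* highest level this node is linked on */
	struct node *next[HEIGHT]; /* next[0] links every node */
};

struct skiplist {
	struct node sentinel;      /* head and tail of every level */
	uint32_t prng;
};

static void skiplist_init(struct skiplist *sl)
{
	for (int lvl = 0; lvl < HEIGHT; lvl++)
		sl->sentinel.next[lvl] = &sl->sentinel;
	sl->sentinel.key = INT_MIN; /* sorts after every real key */
	sl->sentinel.level = -1;
	sl->prng = 1;
}

static int random_level(struct skiplist *sl)
{
	int lvl = 0;

	/* xorshift32: a tiny deterministic PRNG, an assumption of this sketch */
	sl->prng ^= sl->prng << 13;
	sl->prng ^= sl->prng >> 17;
	sl->prng ^= sl->prng << 5;

	/* Each additional level is taken with probability 1/2 */
	for (uint32_t r = sl->prng; (r & 1) && lvl < HEIGHT - 1; r >>= 1)
		lvl++;

	return lvl;
}

/* Probabilistically O(lgN): descend from the top level, walking right. */
static void skiplist_insert(struct skiplist *sl, struct node *n, int key)
{
	struct node *update[HEIGHT];
	struct node *pos = &sl->sentinel;
	int lvl;

	assert(key > INT_MIN); /* INT_MIN is reserved for the sentinel */

	for (lvl = HEIGHT - 1; lvl >= 0; lvl--) {
		/* ">=" places the new node after equal keys (FIFO) */
		while (pos->next[lvl]->key >= key)
			pos = pos->next[lvl];
		update[lvl] = pos; /* last node before the insert point */
	}

	n->key = key;
	n->level = random_level(sl);
	for (lvl = n->level; lvl >= 0; lvl--) {
		n->next[lvl] = update[lvl]->next[lvl];
		update[lvl]->next[lvl] = n;
	}
}

/*
 * O(1): the first node on level 0 is also first on every level it
 * occupies, so unlinking it needs no search and no rebalance. The loop
 * is bounded by the constant HEIGHT.
 */
static struct node *skiplist_pop_first(struct skiplist *sl)
{
	struct node *s = &sl->sentinel;
	struct node *first = s->next[0];

	if (first == s)
		return NULL; /* empty */

	for (int lvl = first->level; lvl >= 0; lvl--)
		s->next[lvl] = first->next[lvl];

	return first;
}
```

Inserting the keys 3, -1, 7, 0, 7, 2 and then popping until the list is empty returns them as 7, 7, 3, 2, 0, -1, whatever levels the PRNG happens to pick. An rbtree would need an rb_erase() with possible rebalancing for each of those removals.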
References: https://en.wikipedia.org/wiki/Skip_list Signed-off-by: Chris Wilson --- .../drm/i915/gt/intel_execlists_submission.c | 52 ++-- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 31 +- drivers/gpu/drm/i915/i915_priolist_types.h | 64 +++- drivers/gpu/drm/i915/i915_scheduler.c | 288 ++++++++++++++---- drivers/gpu/drm/i915/i915_scheduler.h | 11 +- drivers/gpu/drm/i915/i915_scheduler_types.h | 2 +- .../drm/i915/selftests/i915_mock_selftests.h | 1 + .../gpu/drm/i915/selftests/i915_scheduler.c | 53 +++- 8 files changed, 383 insertions(+), 119 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 7ec33bd73d95..1a33c33c96c4 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -252,11 +252,6 @@ static void ring_set_paused(const struct intel_engine_cs *engine, int state) wmb(); } -static struct i915_priolist *to_priolist(struct rb_node *rb) -{ - return rb_entry(rb, struct i915_priolist, node); -} - static int rq_prio(const struct i915_request *rq) { return READ_ONCE(rq->sched.attr.priority); @@ -280,15 +275,27 @@ static int effective_prio(const struct i915_request *rq) return prio; } +static struct i915_request *first_request(const struct i915_sched *se) +{ + struct i915_priolist *pl = se->queue.sentinel.next[0]; + + if (pl == &se->queue.sentinel) + return NULL; + + return list_first_entry_or_null(&pl->requests, + struct i915_request, + sched.link); +} + static int queue_prio(const struct i915_sched *se) { - struct rb_node *rb; + struct i915_request *rq; - rb = rb_first_cached(&se->queue); - if (!rb) + rq = first_request(se); + if (!rq) return INT_MIN; - return to_priolist(rb)->priority; + return rq_prio(rq); } static int virtual_prio(const struct intel_engine_execlists *el) @@ -298,7 +305,7 @@ static int virtual_prio(const struct intel_engine_execlists *el) return rb ? 
rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; } -static bool need_preempt(const struct intel_engine_cs *engine, +static bool need_preempt(struct intel_engine_cs *engine, const struct i915_request *rq) { const struct i915_sched *se = &engine->sched; @@ -1143,6 +1150,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct i915_request ** const last_port = port + execlists->port_mask; struct i915_request *last, * const *active; struct virtual_engine *ve; + struct i915_priolist *pl; struct rb_node *rb; bool submit = false; @@ -1353,11 +1361,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine) break; } - while ((rb = rb_first_cached(&se->queue))) { - struct i915_priolist *p = to_priolist(rb); + for_each_priolist(pl, &se->queue) { struct i915_request *rq, *rn; - priolist_for_each_request_consume(rq, rn, p) { + priolist_for_each_request_safe(rq, rn, pl) { bool merge = true; /* @@ -1432,8 +1439,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } } - rb_erase_cached(&p->node, &se->queue); - i915_priolist_free(p); + i915_priolist_advance(&se->queue, pl); } done: *port++ = i915_request_get(last); @@ -1454,7 +1460,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * request triggering preemption on the next dequeue (or subsequent * interrupt for secondary ports). */ - execlists->queue_priority_hint = queue_prio(se); + execlists->queue_priority_hint = pl->priority; spin_unlock(&se->lock); /* @@ -2645,6 +2651,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn; + struct i915_priolist *pl; struct rb_node *rb; unsigned long flags; @@ -2675,18 +2682,14 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) intel_engine_signal_breadcrumbs(engine); /* Flush the queued requests to the timeline list (for retiring). 
*/ - while ((rb = rb_first_cached(&se->queue))) { - struct i915_priolist *p = to_priolist(rb); - - priolist_for_each_request_consume(rq, rn, p) { + for_each_priolist(pl, &se->queue) { + priolist_for_each_request_safe(rq, rn, pl) { if (i915_request_mark_eio(rq)) { __i915_request_submit(rq); i915_request_put(rq); } } - - rb_erase_cached(&p->node, &se->queue); - i915_priolist_free(p); + i915_priolist_advance(&se->queue, pl); } GEM_BUG_ON(!i915_sched_is_idle(se)); @@ -2720,7 +2723,6 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) /* Remaining _unready_ requests will be nop'ed when submitted */ execlists->queue_priority_hint = INT_MIN; - se->queue = RB_ROOT_CACHED; GEM_BUG_ON(__tasklet_is_enabled(&se->tasklet)); se->tasklet.callback = nop_submission_tasklet; @@ -3191,6 +3193,8 @@ static void virtual_context_exit(struct intel_context *ce) for (n = 0; n < ve->num_siblings; n++) intel_engine_pm_put(ve->siblings[n]); + + i915_sched_park(intel_engine_get_scheduler(&ve->base)); } static const struct intel_context_ops virtual_context_ops = { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index e8c66d868c59..75e25d419264 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -59,11 +59,6 @@ #define GUC_REQUEST_SIZE 64 /* bytes */ -static inline struct i915_priolist *to_priolist(struct rb_node *rb) -{ - return rb_entry(rb, struct i915_priolist, node); -} - static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 id) { struct guc_stage_desc *base = guc->stage_desc_pool_vaddr; @@ -186,8 +181,8 @@ static void __guc_dequeue(struct intel_engine_cs *engine) struct i915_request ** const last_port = first + execlists->port_mask; struct i915_request *last = first[0]; struct i915_request **port; + struct i915_priolist *pl; bool submit = false; - struct rb_node *rb; lockdep_assert_held(&se->lock); @@ -204,11 +199,10 @@ 
static void __guc_dequeue(struct intel_engine_cs *engine) * event. */ port = first; - while ((rb = rb_first_cached(&se->queue))) { - struct i915_priolist *p = to_priolist(rb); + for_each_priolist(pl, &se->queue) { struct i915_request *rq, *rn; - priolist_for_each_request_consume(rq, rn, p) { + priolist_for_each_request_safe(rq, rn, pl) { if (last && rq->context != last->context) { if (port == last_port) goto done; @@ -224,12 +218,10 @@ static void __guc_dequeue(struct intel_engine_cs *engine) last = rq; } - rb_erase_cached(&p->node, &se->queue); - i915_priolist_free(p); + i915_priolist_advance(&se->queue, pl); } done: - execlists->queue_priority_hint = - rb ? to_priolist(rb)->priority : INT_MIN; + execlists->queue_priority_hint = pl->priority; if (submit) { *port = schedule_in(last, port - execlists->inflight); *++port = NULL; @@ -330,7 +322,7 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn; - struct rb_node *rb; + struct i915_priolist *p; unsigned long flags; ENGINE_TRACE(engine, "\n"); @@ -358,25 +350,20 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) } /* Flush the queued requests to the timeline list (for retiring). 
*/ - while ((rb = rb_first_cached(&se->queue))) { - struct i915_priolist *p = to_priolist(rb); - - priolist_for_each_request_consume(rq, rn, p) { + for_each_priolist(p, &se->queue) { + priolist_for_each_request_safe(rq, rn, p) { list_del_init(&rq->sched.link); __i915_request_submit(rq); dma_fence_set_error(&rq->fence, -EIO); i915_request_mark_complete(rq); } - - rb_erase_cached(&p->node, &se->queue); - i915_priolist_free(p); + i915_priolist_advance(&se->queue, p); } GEM_BUG_ON(!i915_sched_is_idle(se)); /* Remaining _unready_ requests will be nop'ed when submitted */ execlists->queue_priority_hint = INT_MIN; - se->queue = RB_ROOT_CACHED; spin_unlock_irqrestore(&se->lock, flags); } diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h index bc2fa84f98a8..ee7482b9c813 100644 --- a/drivers/gpu/drm/i915/i915_priolist_types.h +++ b/drivers/gpu/drm/i915/i915_priolist_types.h @@ -38,10 +38,72 @@ enum { #define I915_PRIORITY_UNPREEMPTABLE INT_MAX #define I915_PRIORITY_BARRIER (I915_PRIORITY_UNPREEMPTABLE - 1) +/* + * The slab returns power-of-two chunks of memory, so fill out the + * node to the next cacheline. + * + * We can estimate how many requests the skiplist will scale to based + * on its height: + * 11 => 4 million requests + * 12 => 16 million requests + */ +#ifdef CONFIG_64BIT +#define I915_PRIOLIST_HEIGHT 12 +#else +#define I915_PRIOLIST_HEIGHT 11 +#endif + +/* + * i915_priolist forms a skiplist. The skiplist is built in layers, + * starting at the base [0] is a singly linked list of all i915_priolist. 
+ * Each higher layer contains a fraction of the i915_priolist from the + * previous layer: + * + * S[0] 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF S + * E[1] >1>3>5>7>9>B>D>F>1>3>5>7>9>B>D>F>1>3>5>7>9>B>D>F>1>3>5>7>9>B>D>F E + * N[2] -->3-->7-->B-->F-->3-->7-->B-->F-->3-->7-->B-->F-->3-->7-->B-->F N + * T[3] ------>7------>F------>7------>F------>7------>F------>7------>F T + * I[4] -------------->F-------------->F-------------->F-------------->F I + * N[5] ------------------------------>F------------------------------>F N + * E[6] ------------------------------>F-------------------------------> E + * L[7] ---------------------------------------------------------------> L + * + * To iterate through all active i915_priolist, we only need to follow + * the chain in i915_priolist.next[0] (see for_each_priolist()). + * + * To quickly find a specific key (or insert point), we can perform a binary + * search by starting at the highest level and following the linked list + * at that level until we either find the node, or have gone past the key. + * Then we descend a level, and start walking the list again starting from + * the current position, until eventually we find our key, or we run out of + * levels. 
+ * + * https://en.wikipedia.org/wiki/Skip_list + */ struct i915_priolist { struct list_head requests; - struct rb_node node; int priority; + + int level; + struct i915_priolist *next[I915_PRIOLIST_HEIGHT]; }; +struct i915_priolist_root { + struct i915_priolist sentinel; + u32 prng; +}; + +#define i915_priolist_is_empty(root) ((root)->sentinel.level < 0) + +#define for_each_priolist(p, root) \ + for ((p) = (root)->sentinel.next[0]; \ + (p) != &(root)->sentinel; \ + (p) = (p)->next[0]) + +#define priolist_for_each_request(it, plist) \ + list_for_each_entry(it, &(plist)->requests, sched.link) + +#define priolist_for_each_request_safe(it, n, plist) \ + list_for_each_entry_safe(it, n, &(plist)->requests, sched.link) + #endif /* _I915_PRIOLIST_TYPES_H_ */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 48336434bff3..991d486b3bc1 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -4,7 +4,9 @@ * Copyright © 2018 Intel Corporation */ +#include #include +#include #include "gt/intel_ring.h" #include "gt/intel_lrc_reg.h" @@ -139,6 +141,16 @@ static bool not_executing(const struct i915_request *rq) return false; } +static void init_priolist(struct i915_priolist_root *const root) +{ + struct i915_priolist *pl = &root->sentinel; + + memset_p((void **)pl->next, pl, ARRAY_SIZE(pl->next)); + pl->requests.prev = NULL; + pl->priority = INT_MIN; + pl->level = -1; +} + void i915_sched_init(struct i915_sched *se, struct device *dev, const char *name, @@ -153,9 +165,9 @@ void i915_sched_init(struct i915_sched *se, se->mask = mask; + init_priolist(&se->queue); INIT_LIST_HEAD(&se->requests); INIT_LIST_HEAD(&se->hold); - se->queue = RB_ROOT_CACHED; i915_sched_init_ipi(&se->ipi); @@ -176,8 +188,60 @@ void i915_sched_init(struct i915_sched *se, #endif } +__maybe_unused static bool priolist_idle(struct i915_priolist_root *root) +{ + struct i915_priolist *pl = &root->sentinel; + int lvl; + + for (lvl 
= 0; lvl < ARRAY_SIZE(pl->next); lvl++) { + if (pl->next[lvl] != pl) { + GEM_TRACE_ERR("root[%d] is not empty\n", lvl); + return false; + } + } + + if (pl->level != -1) { + GEM_TRACE_ERR("root is not clear: %d\n", pl->level); + return false; + } + + return true; +} + +static bool pl_empty(struct list_head *st) +{ + return !st->prev; +} + +static void pl_push(struct i915_priolist *pl, struct list_head *st) +{ + /* Keep list_empty(&pl->requests) valid for concurrent readers */ + pl->requests.prev = st->prev; + st->prev = &pl->requests; + GEM_BUG_ON(pl_empty(st)); +} + +static struct i915_priolist *pl_pop(struct list_head *st) +{ + struct i915_priolist *pl; + + GEM_BUG_ON(pl_empty(st)); + pl = container_of(st->prev, typeof(*pl), requests); + st->prev = pl->requests.prev; + + return pl; +} + void i915_sched_park(struct i915_sched *se) { + struct i915_priolist_root *root = &se->queue; + struct list_head *list = &root->sentinel.requests; + + GEM_BUG_ON(!priolist_idle(root)); + + while (!pl_empty(list)) + kmem_cache_free(global.slab_priorities, pl_pop(list)); + GEM_BUG_ON(!i915_sched_is_idle(se)); se->no_priolist = false; } @@ -253,70 +317,71 @@ static inline bool node_signaled(const struct i915_sched_node *node) return i915_request_completed(node_to_request(node)); } -static inline struct i915_priolist *to_priolist(struct rb_node *rb) +static inline unsigned int random_level(struct i915_priolist_root *root) { - return rb_entry(rb, struct i915_priolist, node); -} - -static void assert_priolists(struct i915_sched * const se) -{ - struct rb_node *rb; - long last_prio; - - if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) - return; - - GEM_BUG_ON(rb_first_cached(&se->queue) != - rb_first(&se->queue.rb_root)); - - last_prio = INT_MAX; - for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { - const struct i915_priolist *p = to_priolist(rb); - - GEM_BUG_ON(p->priority > last_prio); - last_prio = p->priority; - } + /* + * Given a uniform distribution of random numbers over 
the u32, then + * the probability each bit being unset is P=0.5. The probability of a + * successive sequence of bits being unset is P(n) = 0.5^n [n > 0]. + * P(level:1) = 0.5 + * P(level:2) = 0.25 + * P(level:3) = 0.125 + * P(level:4) = 0.0625 + * ... + * So we can use ffs() on a good random number generator to pick our + * level. We divide by two to reduce the probability of choosing a + * level to .25, as the cost of descending a level is the same as + * following an extra link in the chain at that level (so we can + * pack more nodes into fewer levels without incurring extra cost, + * and allow scaling to higher volumes of requests without expanding + * the height of the skiplist). + */ + root->prng = next_pseudo_random32(root->prng); + return __ffs(root->prng) / 2; } static struct list_head * lookup_priolist(struct i915_sched *se, int prio) { - struct i915_priolist *p; - struct rb_node **parent, *rb; - bool first = true; + struct i915_priolist *update[I915_PRIOLIST_HEIGHT]; + struct i915_priolist_root *root = &se->queue; + struct i915_priolist *pl, *tmp; + int lvl; lockdep_assert_held(&se->lock); - assert_priolists(se); - if (unlikely(se->no_priolist)) prio = I915_PRIORITY_NORMAL; + for_each_priolist(pl, root) { /* recycle any empty elements before us */ + if (pl->priority <= prio || !list_empty(&pl->requests)) + break; + + i915_priolist_advance(root, pl); + } + find_priolist: - /* most positive priority is scheduled first, equal priorities fifo */ - rb = NULL; - parent = &se->queue.rb_root.rb_node; - while (*parent) { - rb = *parent; - p = to_priolist(rb); - if (prio > p->priority) { - parent = &rb->rb_left; - } else if (prio < p->priority) { - parent = &rb->rb_right; - first = false; - } else { - return &p->requests; - } + pl = &root->sentinel; + lvl = pl->level; + while (lvl >= 0) { + while (tmp = pl->next[lvl], tmp->priority >= prio) + pl = tmp; + if (pl->priority == prio) + goto out; + update[lvl--] = pl; } if (prio == I915_PRIORITY_NORMAL) { - p = 
&se->default_priolist; + pl = &se->default_priolist; + } else if (!pl_empty(&root->sentinel.requests)) { + pl = pl_pop(&root->sentinel.requests); } else { - p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC); + pl = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC); /* Convert an allocation failure to a priority bump */ - if (unlikely(!p)) { + if (unlikely(!pl)) { prio = I915_PRIORITY_NORMAL; /* recurses just once */ - /* To maintain ordering with all rendering, after an + /* + * To maintain ordering with all rendering, after an * allocation failure we have to disable all scheduling. * Requests will then be executed in fifo, and schedule * will ensure that dependencies are emitted in fifo. @@ -329,18 +394,122 @@ lookup_priolist(struct i915_sched *se, int prio) } } - p->priority = prio; - INIT_LIST_HEAD(&p->requests); + pl->priority = prio; + INIT_LIST_HEAD(&pl->requests); - rb_link_node(&p->node, rb, parent); - rb_insert_color_cached(&p->node, &se->queue, first); + lvl = random_level(root); + if (lvl > root->sentinel.level) { + if (root->sentinel.level < I915_PRIOLIST_HEIGHT - 1) { + lvl = ++root->sentinel.level; + update[lvl] = &root->sentinel; + } else { + lvl = I915_PRIOLIST_HEIGHT - 1; + } + } + GEM_BUG_ON(lvl < 0); + GEM_BUG_ON(lvl >= ARRAY_SIZE(pl->next)); - return &p->requests; + pl->level = lvl; + do { + tmp = update[lvl]; + pl->next[lvl] = tmp->next[lvl]; + tmp->next[lvl] = pl; + } while (--lvl >= 0); + + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) { + struct i915_priolist *chk; + + chk = &root->sentinel; + lvl = chk->level; + do { + while (tmp = chk->next[lvl], tmp->priority >= prio) + chk = tmp; + } while (--lvl >= 0); + + GEM_BUG_ON(chk != pl); + } + +out: + GEM_BUG_ON(pl == &root->sentinel); + return &pl->requests; } -void __i915_priolist_free(struct i915_priolist *p) +static void __remove_priolist(struct i915_sched *se, struct list_head *plist) { - kmem_cache_free(global.slab_priorities, p); + struct i915_priolist_root *root = &se->queue; + 
struct i915_priolist *pl, *tmp; + struct i915_priolist *old = + container_of(plist, struct i915_priolist, requests); + int prio = old->priority; + int lvl; + + lockdep_assert_held(&se->lock); + GEM_BUG_ON(!list_empty(plist)); + + pl = &root->sentinel; + lvl = pl->level; + GEM_BUG_ON(lvl < 0); + + if (prio != I915_PRIORITY_NORMAL) + pl_push(old, &pl->requests); + + do { + while (tmp = pl->next[lvl], tmp->priority > prio) + pl = tmp; + if (lvl <= old->level) { + pl->next[lvl] = old->next[lvl]; + if (pl == &root->sentinel && old->next[lvl] == pl) { + GEM_BUG_ON(pl->level != lvl); + pl->level--; + } + } + } while (--lvl >= 0); + GEM_BUG_ON(tmp != old); +} + +static void remove_from_priolist(struct i915_sched *se, + struct i915_request *rq, + struct list_head *list, + bool tail) +{ + struct list_head *prev = rq->sched.link.prev; + + GEM_BUG_ON(!i915_request_in_priority_queue(rq)); + + __list_del_entry(&rq->sched.link); + if (tail) + list_add_tail(&rq->sched.link, list); + else + list_add(&rq->sched.link, list); + + /* If we just removed the last element in the old plist, delete it */ + if (list_empty(prev)) + __remove_priolist(se, prev); +} + +void i915_priolist_advance(struct i915_priolist_root *root, + struct i915_priolist *pl) +{ + struct i915_priolist * const s = &root->sentinel; + int lvl; + + GEM_BUG_ON(!list_empty(&pl->requests)); + GEM_BUG_ON(pl != s->next[0]); + GEM_BUG_ON(pl == s); + + /* Keep pl->next[0] valid for for_each_priolist iteration */ + if (pl->priority != I915_PRIORITY_NORMAL) + pl_push(pl, &s->requests); + + lvl = pl->level; + GEM_BUG_ON(lvl < 0); + do { + s->next[lvl] = pl->next[lvl]; + if (pl->next[lvl] == s) { + GEM_BUG_ON(s->level != lvl); + s->level--; + } + } while (--lvl >= 0); } static struct i915_request * @@ -493,7 +662,7 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) GEM_BUG_ON(rq->engine != engine); if (i915_request_in_priority_queue(rq)) - list_move_tail(&rq->sched.link, plist); + 
remove_from_priolist(se, rq, plist, true); /* Defer (tasklet) submission until after all updates. */ kick_submission(engine, rq, prio); @@ -629,8 +798,7 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, /* Note list is reversed for waiters wrt signal hierarchy */ GEM_BUG_ON(rq->engine != engine); - GEM_BUG_ON(!i915_request_in_priority_queue(rq)); - list_move(&rq->sched.link, &dfs); + remove_from_priolist(se, rq, &dfs, false); /* Track our visit, and prevent duplicate processing */ clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); @@ -1207,9 +1375,9 @@ void i915_sched_show(struct drm_printer *m, unsigned int max) { const struct i915_request *rq, *last; + struct i915_priolist *pl; unsigned long flags; unsigned int count; - struct rb_node *rb; rcu_read_lock(); spin_lock_irqsave(&se->lock, flags); @@ -1263,10 +1431,8 @@ void i915_sched_show(struct drm_printer *m, last = NULL; count = 0; - for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { - struct i915_priolist *p = rb_entry(rb, typeof(*p), node); - - priolist_for_each_request(rq, p) { + for_each_priolist(pl, &se->queue) { + priolist_for_each_request(rq, pl) { if (count++ < max - 1) show_request(m, rq, "\t\t", 0); else diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 51bca23a5617..7f9ee3dc6551 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -24,12 +24,6 @@ struct intel_engine_cs; ##__VA_ARGS__); \ } while (0) -#define priolist_for_each_request(it, plist) \ - list_for_each_entry(it, &(plist)->requests, sched.link) - -#define priolist_for_each_request_consume(it, n, plist) \ - list_for_each_entry_safe(it, n, &(plist)->requests, sched.link) - void i915_sched_node_init(struct i915_sched_node *node); void i915_sched_node_reinit(struct i915_sched_node *node); @@ -80,7 +74,7 @@ static inline void i915_priolist_free(struct i915_priolist *p) static inline bool i915_sched_is_idle(const struct 
i915_sched *se) { - return RB_EMPTY_ROOT(&se->queue.rb_root); + return i915_priolist_is_empty(&se->queue); } static inline bool @@ -124,6 +118,9 @@ static inline void i915_sched_flush(struct i915_sched *se) __i915_sched_flush(se, true); } +void i915_priolist_advance(struct i915_priolist_root *root, + struct i915_priolist *old); + void i915_request_show_with_schedule(struct drm_printer *m, const struct i915_request *rq, const char *prefix, diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 7271a0259a56..ad35fabf9f6e 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -65,7 +65,7 @@ struct i915_sched { /** * @queue: queue of requests, in priority lists */ - struct rb_root_cached queue; + struct i915_priolist_root queue; /** * @tasklet: softirq tasklet for bottom half diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h index 3db34d3eea58..946c93441c1f 100644 --- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h @@ -25,6 +25,7 @@ selftest(ring, intel_ring_mock_selftests) selftest(engine, intel_engine_cs_mock_selftests) selftest(timelines, intel_timeline_mock_selftests) selftest(requests, i915_request_mock_selftests) +selftest(scheduler, i915_scheduler_mock_selftests) selftest(objects, i915_gem_object_mock_selftests) selftest(phys, i915_gem_phys_mock_selftests) selftest(dmabuf, i915_gem_dmabuf_mock_selftests) diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index dbbefd0da2f2..f179f1cb760a 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -12,6 +12,54 @@ #include "selftests/igt_spinner.h" #include "selftests/i915_random.h" +static int mock_skiplist_levels(void *dummy) +{ + struct i915_priolist_root 
root = {}; + struct i915_priolist *pl = &root.sentinel; + IGT_TIMEOUT(end_time); + unsigned long total; + int count, lvl; + + total = 0; + do { + for (count = 0; count < 16384; count++) { + lvl = random_level(&root); + if (lvl > pl->level) { + if (lvl < I915_PRIOLIST_HEIGHT - 1) + lvl = ++pl->level; + else + lvl = I915_PRIOLIST_HEIGHT - 1; + } + + pl->next[lvl] = ptr_inc(pl->next[lvl]); + } + total += count; + } while (!__igt_timeout(end_time, NULL)); + + pr_info("Total %9lu\n", total); + for (lvl = 0; lvl <= pl->level; lvl++) { + int x = ilog2((unsigned long)pl->next[lvl]); + char row[80]; + + memset(row, '*', x); + row[x] = '\0'; + + pr_info(" [%2d] %9lu %s\n", + lvl, (unsigned long)pl->next[lvl], row); + } + + return 0; +} + +int i915_scheduler_mock_selftests(void) +{ + static const struct i915_subtest tests[] = { + SUBTEST(mock_skiplist_levels), + }; + + return i915_subtests(tests, NULL); +} + static void scheduling_disable(struct intel_engine_cs *engine) { engine->props.preempt_timeout_ms = 0; @@ -80,9 +128,9 @@ static int all_engines(struct drm_i915_private *i915, static bool check_context_order(struct i915_sched *se) { u64 last_seqno, last_context; + struct i915_priolist *p; unsigned long count; bool result = false; - struct rb_node *rb; int last_prio; /* We expect the execution order to follow ascending fence-context */ @@ -92,8 +140,7 @@ static bool check_context_order(struct i915_sched *se) last_context = 0; last_seqno = 0; last_prio = 0; - for (rb = rb_first_cached(&se->queue); rb; rb = rb_next(rb)) { - struct i915_priolist *p = rb_entry(rb, typeof(*p), node); + for_each_priolist(p, &se->queue) { struct i915_request *rq; priolist_for_each_request(rq, p) { From patchwork Mon Feb 1 08:56:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C2B9C433DB for ; Mon, 1 Feb 2021 08:57:41 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C7B5164E3F for ; Mon, 1 Feb 2021 08:57:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C7B5164E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CE2906E4B3; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 83CAC6E499 for ; Mon, 1 Feb 2021 08:57:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757752-1500050 for multiple; Mon, 01 Feb 2021 08:57:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:54 +0000 Message-Id: <20210201085715.27435-36-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 36/57] drm/i915: Wrap 
cmpxchg64 with try_cmpxchg64() helper X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Wrap cmpxchg64 with a try_cmpxchg()-esque helper. Hiding the old-value dance in the helper allows for cleaner code. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_utils.h | 32 +++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h index abd4dcd9f79c..95ead6bb1ba6 100644 --- a/drivers/gpu/drm/i915/i915_utils.h +++ b/drivers/gpu/drm/i915/i915_utils.h @@ -461,4 +461,36 @@ static inline bool timer_expired(const struct timer_list *t) */ #define IS_ACTIVE(config) ((config) != 0) +#ifndef try_cmpxchg64 +#if IS_ENABLED(CONFIG_64BIT) +#define try_cmpxchg64(_ptr, _pold, _new) try_cmpxchg(_ptr, _pold, _new) +#else +#define try_cmpxchg64(_ptr, _pold, _new) \ +({ \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __cur = cmpxchg64(_ptr, __old, _new); \ + bool success = __cur == __old; \ + if (unlikely(!success)) \ + *_old = __cur; \ + likely(success); \ +}) +#endif +#endif + +#ifndef xchg64 +#if IS_ENABLED(CONFIG_64BIT) +#define xchg64(_ptr, _new) xchg(_ptr, _new) +#else +#define xchg64(_ptr, _new) \ +({ \ + __typeof__(_ptr) __ptr = (_ptr); \ + __typeof__(*(_ptr)) __old = *__ptr; \ + while (!try_cmpxchg64(__ptr, &__old, (_new))) \ + ; \ + __old; \ +}) +#endif +#endif + #endif /* !__I915_UTILS_H */ From patchwork Mon Feb 1 08:56:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058407 Return-Path: X-Spam-Checker-Version: SpamAssassin 
3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F6D7C433E0 for ; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7A1FF64E33 for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7A1FF64E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C39E26E4A6; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6C5726E45D for ; Mon, 1 Feb 2021 08:57:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757755-1500050 for multiple; Mon, 01 Feb 2021 08:57:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:55 +0000 Message-Id: <20210201085715.27435-37-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 37/57] 
drm/i915: Fair low-latency scheduling X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The first "scheduler" was a topological sorting of requests into priority order. The execution order was deterministic, the earliest submitted, highest priority request would be executed first. Priority inheritance ensured that inversions were kept at bay, and allowed us to dynamically boost priorities (e.g. for interactive pageflips). The minimalistic timeslicing scheme was an attempt to introduce fairness between long running requests, by evicting the active request at the end of a timeslice and moving it to the back of its priority queue (while ensuring that dependencies were kept in order). For short running requests from many clients of equal priority, the scheme is still very much FIFO submission ordering, and as unfair as before. To impose fairness, we need an external metric that ensures that clients are interspersed, so we don't execute one long chain from client A before executing any of client B. This could be imposed by the clients themselves by using fences based on an external clock, that is they only submit work for a "frame" at frame-intervals, instead of submitting as much work as they are able to. The standard SwapBuffers approach is akin to double buffering, where as one frame is being executed, the next is being submitted, such that there is always a maximum of two frames per client in the pipeline and so ideally maintains consistent input-output latency. Even this scheme exhibits unfairness under load as a single client will execute two frames back to back before the next, and with enough clients, deadlines will be missed.
The idea introduced by BFS/MuQSS is that fairness is introduced by metering with an external clock. Every request, when it becomes ready to execute, is assigned a virtual deadline, and execution order is then determined by earliest deadline. Priority is used as a hint, rather than strict ordering, where high priority requests have earlier deadlines, but not necessarily earlier than outstanding work. Thus work is executed in order of 'readiness', with timeslicing to demote long running work. The Achilles' heel of this scheduler is its strong preference for low-latency and favouring of new queues. Whereas it was easy to dominate the old scheduler by flooding it with many requests over a short period of time, the new scheduler can be dominated by a 'synchronous' client that waits for each of its requests to complete before submitting the next. As such a client has no history, it is always considered ready-to-run and receives an earlier deadline than the long running requests. This is compensated for by refreshing the current execution's deadline and by disallowing preemption for timeslice shuffling. In contrast, one key advantage of disconnecting the sort key from the priority value is that we can freely adjust the deadline to compensate for other factors. This is used in conjunction with submitting requests ahead-of-schedule that then busywait on the GPU using semaphores. Since we don't want to spend a timeslice busywaiting instead of doing real work when available, we deprioritise work by giving the semaphore waits a later virtual deadline. The priority deboost is applied to semaphore workloads after they miss a semaphore wait and a new context is pending. The request is then restored to its normal priority once the semaphores are signaled so that it is not unfairly penalised under contention by remaining at a far future deadline. This is a much improved and cleaner version of commit f9e9e9de58c7 ("drm/i915: Prioritise non-busywait semaphore workloads").
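As a rough userspace sketch of the deadline ordering described above (this is not the code in this patch; the sketch_* names, the three priority levels and the 1ms base slice are all assumptions made purely for illustration), a request's virtual deadline can be modelled as the time it became ready plus a slice scaled by its priority, with the earliest deadline running first:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical priority levels, for illustration only. */
enum {
	SKETCH_PRIO_LOW = -1,
	SKETCH_PRIO_NORMAL = 0,
	SKETCH_PRIO_HIGH = 1,
};

/* Assumed base timeslice in nanoseconds (not taken from the patch). */
#define SKETCH_SLICE_NS 1000000ull

/*
 * Scale the slice by priority: each step down in priority doubles the
 * offset, so higher priority requests receive earlier deadlines.
 */
static uint64_t sketch_prio_slice(int prio)
{
	assert(prio >= SKETCH_PRIO_LOW && prio <= SKETCH_PRIO_HIGH);

	/* shift is 0 for HIGH, 1 for NORMAL, 2 for LOW */
	return SKETCH_SLICE_NS << (SKETCH_PRIO_HIGH - prio);
}

/* Virtual deadline assigned when the request becomes ready at @now_ns. */
static uint64_t sketch_virtual_deadline(uint64_t now_ns, int prio)
{
	return now_ns + sketch_prio_slice(prio);
}

/* Earliest deadline first: nonzero if deadline @a should run before @b. */
static int sketch_runs_before(uint64_t a, uint64_t b)
{
	return a < b;
}
```

With these assumed values, a normal priority request that became ready at t=0 (deadline 2ms) still runs before a high priority request that became ready at t=1.5ms (deadline 2.5ms), which is the sense in which priority acts as a hint rather than a strict ordering.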
To check the impact on throughput (often the downfall of latency sensitive schedulers), we used gem_wsim to simulate various transcode workloads with different load balancers, and varying the number of competing [heterogeneous] clients. On Kabylake gt3e running at fixed cpu/gpu clocks, +delta%------------------------------------------------------------------+ | a | | a | | a | | a | | aa | | aaa | | aaaa | | aaaaaa | | aaaaaa | | aaaaaa a a | | aa aaaaaa a a a a aa a a a a a| ||______M__A__________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev 108 -4.6326643 47.797855 -0.00069639128 2.116185 7.6764049 Each point is the relative percentage change in gem_wsim's work-per-second score [using the median result of 120 25s runs, the relative change computed as (B/A - 1) * 100]; 0 being no change. Reviewing the same workloads on Tigerlake, +delta%------------------------------------------------------------------+ | a | | a | | a | | aa a | | aaaa | | aaaa | | aaaaaaa | | aaaaaaa | | aaaaaaa a a aa a a a | | aaaaaaaaaa a aa a a a aaaa aa a a aa a a| ||_______M____A_____________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev 108 -4.258712 46.83081 0.36853159 4.1415662 9.461689 The expectation is that by deliberately increasing the number of context switches to improve fairness between clients, throughput will be diminished. What we do see are small fluctuations around no change, with the median result being improved throughput. The dramatic improvement is from reintroducing the improved no-semaphore boosting, which avoids accidentally preventing scheduling of ready workloads due to busy spinners. We expect to see no change in single client workloads such as games, though running multiple applications on a desktop should have reduced jitter i.e. smoother input-output latency. This scheduler is based on MuQSS by Dr Con Kolivas.
v2: More commentary, especially around where we reset the deadlines. Testcase: igt/gem_exec_fair Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 - .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 1 + drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 14 - .../drm/i915/gt/intel_execlists_submission.c | 232 +++++----- drivers/gpu/drm/i915/gt/selftest_execlists.c | 30 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 5 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 1 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 - drivers/gpu/drm/i915/i915_priolist_types.h | 7 +- drivers/gpu/drm/i915/i915_request.c | 19 +- drivers/gpu/drm/i915/i915_scheduler.c | 426 +++++++++++++----- drivers/gpu/drm/i915/i915_scheduler.h | 16 +- drivers/gpu/drm/i915/i915_scheduler_types.h | 23 + drivers/gpu/drm/i915/selftests/i915_request.c | 1 + .../gpu/drm/i915/selftests/i915_scheduler.c | 136 ++++++ 16 files changed, 657 insertions(+), 264 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 9ff597ef5aca..f39f8049641c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -572,8 +572,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine) memset(execlists->pending, 0, sizeof(execlists->pending)); execlists->active = memset(execlists->inflight, 0, sizeof(execlists->inflight)); - - execlists->queue_priority_hint = INT_MIN; } static void cleanup_status_page(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index fce86bd4b47f..f1811e79401e 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -203,6 +203,7 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine) if (IS_ERR(rq)) return PTR_ERR(rq); + 
rq->sched.deadline = 0; __set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags); heartbeat_commit(rq, &attr); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 27d9d17b35cb..ef5064ea54e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -211,6 +211,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine) i915_request_add_active_barriers(rq); /* Install ourselves as a preemption barrier */ + rq->sched.deadline = 0; rq->sched.attr.priority = I915_PRIORITY_BARRIER; if (likely(!__i915_request_commit(rq))) { /* engine should be idle! */ /* @@ -271,9 +272,6 @@ static int __engine_park(struct intel_wakeref *wf) intel_engine_park_heartbeat(engine); intel_breadcrumbs_park(engine->breadcrumbs); - /* Must be reset upon idling, or we may miss the busy wakeup. */ - GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN); - if (engine->park) engine->park(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index f856bd9b7dae..dc12cbdfda46 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -223,20 +223,6 @@ struct intel_engine_execlists { */ unsigned int port_mask; - /** - * @queue_priority_hint: Highest pending priority. - * - * When we add requests into the queue, or adjust the priority of - * executing requests, we compute the maximum priority of those - * pending requests. We can then use this value to determine if - * we need to preempt the executing requests to service the queue. - * However, since the we may have recorded the priority of an inflight - * request we wanted to preempt but since completed, at the time of - * dequeuing the priority hint may no longer may match the highest - * available request priority. 
- */ - int queue_priority_hint; - struct rb_root_cached virtual; /** diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 1a33c33c96c4..31d36057c729 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -178,7 +178,7 @@ struct virtual_engine { */ struct ve_node { struct rb_node rb; - int prio; + u64 deadline; } nodes[I915_NUM_ENGINES]; /* @@ -254,25 +254,12 @@ static void ring_set_paused(const struct intel_engine_cs *engine, int state) static int rq_prio(const struct i915_request *rq) { - return READ_ONCE(rq->sched.attr.priority); + return rq->sched.attr.priority; } -static int effective_prio(const struct i915_request *rq) +static u64 rq_deadline(const struct i915_request *rq) { - int prio = rq_prio(rq); - - /* - * If this request is special and must not be interrupted at any - * cost, so be it. Note we are only checking the most recent request - * in the context and so may be masking an earlier vip request. It - * is hoped that under the conditions where nopreempt is used, this - * will not matter (i.e. all requests to that context will be - * nopreempt for as long as desired). 
- */ - if (i915_request_has_nopreempt(rq)) - prio = I915_PRIORITY_UNPREEMPTABLE; - - return prio; + return rq->sched.deadline; } static struct i915_request *first_request(const struct i915_sched *se) @@ -287,62 +274,62 @@ static struct i915_request *first_request(const struct i915_sched *se) sched.link); } -static int queue_prio(const struct i915_sched *se) +static struct i915_request *first_virtual(const struct intel_engine_cs *engine) { - struct i915_request *rq; + struct rb_node *rb; - rq = first_request(se); - if (!rq) - return INT_MIN; + rb = rb_first_cached(&engine->execlists.virtual); + if (!rb) + return NULL; - return rq_prio(rq); + return READ_ONCE(rb_entry(rb, + struct virtual_engine, + nodes[engine->id].rb)->request); } -static int virtual_prio(const struct intel_engine_execlists *el) +static const struct i915_request * +next_elsp_request(const struct i915_sched *se, const struct i915_request *rq) { - struct rb_node *rb = rb_first_cached(&el->virtual); + if (i915_sched_is_last_request(se, rq)) + return NULL; - return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; + return list_next_entry(rq, sched.link); } -static bool need_preempt(struct intel_engine_cs *engine, +static bool +dl_before(const struct i915_request *next, const struct i915_request *prev) +{ + return !prev || (next && rq_deadline(next) < rq_deadline(prev)); +} + +static bool need_preempt(const struct intel_engine_cs *engine, const struct i915_request *rq) { const struct i915_sched *se = &engine->sched; - int last_prio; + const struct i915_request *first = NULL; + const struct i915_request *next; if (!i915_sched_use_busywait(se)) return false; /* - * Check if the current priority hint merits a preemption attempt. - * - * We record the highest value priority we saw during rescheduling - * prior to this dequeue, therefore we know that if it is strictly - * less than the current tail of ESLP[0], we do not need to force - * a preempt-to-idle cycle. 
- * - * However, the priority hint is a mere hint that we may need to - * preempt. If that hint is stale or we may be trying to preempt - * ourselves, ignore the request. - * - * More naturally we would write - * prio >= max(0, last); - * except that we wish to prevent triggering preemption at the same - * priority level: the task that is running should remain running - * to preserve FIFO ordering of dependencies. + * If this request is special and must not be interrupted at any + * cost, so be it. Note we are only checking the most recent request + * in the context and so may be masking an earlier vip request. It + * is hoped that under the conditions where nopreempt is used, this + * will not matter (i.e. all requests to that context will be + * nopreempt for as long as desired). */ - last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1); - if (engine->execlists.queue_priority_hint <= last_prio) + if (i915_request_has_nopreempt(rq)) return false; /* * Check against the first request in ELSP[1], it will, thanks to the * power of PI, be the highest priority of that context. */ - if (!list_is_last(&rq->sched.link, &se->requests) && - rq_prio(list_next_entry(rq, sched.link)) > last_prio) - return true; + next = next_elsp_request(se, rq); + if (dl_before(next, first)) + first = next; /* * If the inflight context did not trigger the preemption, then maybe @@ -354,8 +341,31 @@ static bool need_preempt(struct intel_engine_cs *engine, * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same * context, it's priority would not exceed ELSP[0] aka last_prio. 
*/ - return max(virtual_prio(&engine->execlists), - queue_prio(se)) > last_prio; + next = first_request(se); + if (dl_before(next, first)) + first = next; + + next = first_virtual(engine); + if (dl_before(next, first)) + first = next; + + if (!dl_before(first, rq)) + return false; + + /* + * While a request may have been queued that has an earlier deadline + * than is currently running, we only allow it to perform an urgent + * preemption if it also has higher priority. The cost of frequently + * switching between contexts is noticeable, so we try to keep + * the deadline shuffling only to timeslice boundaries. + */ + ENGINE_TRACE(engine, + "preempt for first=%llx:%llu, dl=%llu, prio=%d?\n", + first->fence.context, + first->fence.seqno, + rq_deadline(first), + rq_prio(first)); + return rq_prio(first) > max(rq_prio(rq), I915_PRIORITY_NORMAL - 1); } __maybe_unused static bool @@ -372,7 +382,15 @@ assert_priority_queue(const struct i915_request *prev, if (i915_request_is_active(prev)) return true; - return rq_prio(prev) >= rq_prio(next); + if (rq_deadline(prev) <= rq_deadline(next)) + return true; + + ENGINE_TRACE(prev->engine, + "next %llx:%lld dl %lld is before prev %llx:%lld dl %lld\n", + next->fence.context, next->fence.seqno, rq_deadline(next), + prev->fence.context, prev->fence.seqno, rq_deadline(prev)); + + return false; } static void @@ -553,10 +571,25 @@ static void __execlists_schedule_out(struct i915_request * const rq, /* * If we have just completed this context, the engine may now be * idle and we want to re-enter powersaving. + * + * If the context is still active, update the deadline on the next + * request as we submitted it much earlier with an estimation based + * on this request and all those before it consuming their whole budget. + * Since the next request is ready but may have a deadline set far in + * the future, we will prefer any new client before executing this + * context again. 
If the other clients are submitting synchronous + * workloads, each submission appears as a fresh piece of work and ready + * to run; each time they will receive a deadline that is likely earlier + * than the accumulated deadline of this context. So we re-evaluate this + * context's deadline and put it on an equal footing with the + * synchronous clients. */ - if (intel_timeline_is_last(ce->timeline, rq) && - __i915_request_is_complete(rq)) - intel_engine_add_retire(engine, ce->timeline); + if (__i915_request_is_complete(rq)) { + if (!intel_timeline_is_last(ce->timeline, rq)) + i915_request_update_deadline(list_next_entry(rq, link)); + else + intel_engine_add_retire(engine, ce->timeline); + } ccid = ce->lrc.ccid; ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; @@ -666,14 +699,14 @@ dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq) if (!rq) return ""; - snprintf(buf, buflen, "%sccid:%x %llx:%lld%s prio %d", + snprintf(buf, buflen, "%sccid:%x %llx:%lld%s dl:%llu", prefix, rq->context->lrc.ccid, rq->fence.context, rq->fence.seqno, __i915_request_is_complete(rq) ? "!" : __i915_request_has_started(rq) ? 
"*" : "", - rq_prio(rq)); + rq_deadline(rq)); return buf; } @@ -1194,11 +1227,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (last) { if (need_preempt(engine, last)) { ENGINE_TRACE(engine, - "preempting last=%llx:%lld, prio=%d, hint=%d\n", + "preempting last=%llx:%llu, dl=%llu, prio=%d\n", last->fence.context, last->fence.seqno, - last->sched.attr.priority, - execlists->queue_priority_hint); + rq_deadline(last), + rq_prio(last)); record_preemption(execlists); /* @@ -1220,11 +1253,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine) last = NULL; } else if (timeslice_expired(engine, last)) { ENGINE_TRACE(engine, - "expired:%s last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n", + "expired:%s last=%llx:%llu, deadline=%llu, now=%llu, yield?=%s\n", yesno(timer_expired(&execlists->timer)), last->fence.context, last->fence.seqno, - rq_prio(last), - execlists->queue_priority_hint, + rq_deadline(last), + i915_sched_to_ticks(ktime_get()), yesno(timeslice_yield(execlists, last))); /* @@ -1295,7 +1328,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) GEM_BUG_ON(rq->engine != &ve->base); GEM_BUG_ON(rq->context != &ve->context); - if (unlikely(rq_prio(rq) < queue_prio(se))) { + if (!dl_before(rq, first_request(se))) { spin_unlock(&ve->base.sched.lock); break; } @@ -1307,16 +1340,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } ENGINE_TRACE(engine, - "virtual rq=%llx:%lld%s, new engine? %s\n", + "virtual rq=%llx:%lld%s, dl %llx, new engine? %s\n", rq->fence.context, rq->fence.seqno, __i915_request_is_complete(rq) ? "!" : __i915_request_has_started(rq) ? 
"*" : "", + rq_deadline(rq), yesno(engine != ve->siblings[0])); - WRITE_ONCE(ve->request, NULL); - WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN); rb = &ve->nodes[engine->id].rb; rb_erase_cached(rb, &execlists->virtual); @@ -1407,6 +1439,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (rq->execution_mask != engine->mask) goto done; + if (unlikely(dl_before(first_virtual(engine), + rq))) + goto done; + /* * If GVT overrides us we only ever submit * port[0], leaving port[1] empty. Note that we @@ -1443,24 +1479,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } done: *port++ = i915_request_get(last); - - /* - * Here be a bit of magic! Or sleight-of-hand, whichever you prefer. - * - * We choose the priority hint such that if we add a request of greater - * priority than this, we kick the submission tasklet to decide on - * the right order of submitting the requests to hardware. We must - * also be prepared to reorder requests as they are in-flight on the - * HW. We derive the priority hint then as the first "hole" in - * the HW submission ports and if there are no available slots, - * the priority of the lowest executing request, i.e. last. - * - * When we do receive a higher priority request ready to run from the - * user, see queue_request(), the priority hint is bumped to that - * request triggering preemption on the next dequeue (or subsequent - * interrupt for secondary ports). - */ - execlists->queue_priority_hint = pl->priority; spin_unlock(&se->lock); /* @@ -2637,15 +2655,6 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) rcu_read_unlock(); } -static void nop_submission_tasklet(struct tasklet_struct *t) -{ - struct intel_engine_cs * const engine = - from_tasklet(engine, t, sched.tasklet); - - /* The driver is wedged; don't process any more events. 
*/ - WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN); -} - static void execlists_reset_cancel(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -2714,19 +2723,12 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) i915_request_put(rq); } i915_request_put(rq); - - ve->base.execlists.queue_priority_hint = INT_MIN; } spin_unlock(&ve->base.sched.lock); } /* Remaining _unready_ requests will be nop'ed when submitted */ - execlists->queue_priority_hint = INT_MIN; - - GEM_BUG_ON(__tasklet_is_enabled(&se->tasklet)); - se->tasklet.callback = nop_submission_tasklet; - spin_unlock_irqrestore(&se->lock, flags); rcu_read_unlock(); } @@ -2856,7 +2858,6 @@ static bool can_preempt(struct intel_engine_cs *engine) static void execlists_set_default_submission(struct intel_engine_cs *engine) { engine->sched.submit_request = i915_request_enqueue; - engine->sched.tasklet.callback = execlists_submission_tasklet; } static void execlists_shutdown(struct intel_engine_cs *engine) @@ -3213,7 +3214,8 @@ static const struct intel_context_ops virtual_context_ops = { .destroy = virtual_context_destroy, }; -static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) +static intel_engine_mask_t +virtual_submission_mask(struct virtual_engine *ve, u64 *deadline) { struct i915_request *rq; intel_engine_mask_t mask; @@ -3230,9 +3232,11 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) mask = ve->siblings[0]->mask; } - ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n", + *deadline = rq_deadline(rq); + + ENGINE_TRACE(&ve->base, "rq=%llx:%llu, mask=%x, dl=%llu\n", rq->fence.context, rq->fence.seqno, - mask, ve->base.execlists.queue_priority_hint); + mask, *deadline); return mask; } @@ -3241,12 +3245,12 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) { struct virtual_engine * const ve = from_tasklet(ve, t, base.sched.tasklet); - const int prio = 
READ_ONCE(ve->base.execlists.queue_priority_hint); intel_engine_mask_t mask; unsigned int n; + u64 deadline; rcu_read_lock(); - mask = virtual_submission_mask(ve); + mask = virtual_submission_mask(ve, &deadline); rcu_read_unlock(); if (unlikely(!mask)) return; @@ -3280,7 +3284,8 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) */ first = rb_first_cached(&sibling->execlists.virtual) == &node->rb; - if (prio == node->prio || (prio > node->prio && first)) + if (deadline == node->deadline || + (deadline < node->deadline && first)) goto submit_engine; rb_erase_cached(&node->rb, &sibling->execlists.virtual); @@ -3294,7 +3299,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) rb = *parent; other = rb_entry(rb, typeof(*other), rb); - if (prio > other->prio) { + if (deadline < other->deadline) { parent = &rb->rb_left; } else { parent = &rb->rb_right; @@ -3309,8 +3314,8 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) submit_engine: GEM_BUG_ON(RB_EMPTY_NODE(&node->rb)); - node->prio = prio; - if (first && prio > sibling->execlists.queue_priority_hint) + node->deadline = deadline; + if (first) i915_sched_kick(se); unlock_engine: @@ -3347,7 +3352,9 @@ static void virtual_submit_request(struct i915_request *rq) i915_request_put(ve->request); } - ve->base.execlists.queue_priority_hint = rq_prio(rq); + rq->sched.deadline = + min(rq->sched.deadline, + i915_scheduler_next_virtual_deadline(rq_prio(rq))); ve->request = i915_request_get(rq); GEM_BUG_ON(!list_empty(virtual_queue(ve))); @@ -3449,7 +3456,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.bond_execute = virtual_bond_execute; INIT_LIST_HEAD(virtual_queue(ve)); - ve->base.execlists.queue_priority_hint = INT_MIN; intel_context_init(&ve->context, &ve->base); diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 147cbfd6dec0..3d87674477da 100644 --- 
a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -868,7 +868,7 @@ semaphore_queue(struct intel_engine_cs *engine, struct i915_vma *vma, int idx) static int release_queue(struct intel_engine_cs *engine, struct i915_vma *vma, - int idx, int prio) + int idx, u64 deadline) { struct i915_request *rq; u32 *cs; @@ -893,10 +893,7 @@ release_queue(struct intel_engine_cs *engine, i915_request_get(rq); i915_request_add(rq); - local_bh_disable(); - i915_request_set_priority(rq, prio); - local_bh_enable(); /* kick tasklet */ - + i915_request_set_deadline(rq, deadline); i915_request_put(rq); return 0; @@ -910,6 +907,7 @@ slice_semaphore_queue(struct intel_engine_cs *outer, struct intel_engine_cs *engine; struct i915_request *head; enum intel_engine_id id; + long timeout; int err, i, n = 0; head = semaphore_queue(outer, vma, n++); @@ -933,12 +931,16 @@ slice_semaphore_queue(struct intel_engine_cs *outer, } } - err = release_queue(outer, vma, n, I915_PRIORITY_BARRIER); + err = release_queue(outer, vma, n, 0); if (err) goto out; - if (i915_request_wait(head, 0, - 2 * outer->gt->info.num_engines * (count + 2) * (count + 3)) < 0) { + /* Expected number of pessimal slices required */ + timeout = outer->gt->info.num_engines * (count + 2) * (count + 3); + timeout *= 4; /* safety factor, including bucketing */ + timeout += HZ / 2; /* and include the request completion */ + + if (i915_request_wait(head, 0, timeout) < 0) { pr_err("%s: Failed to slice along semaphore chain of length (%d, %d)!\n", outer->name, count, n); GEM_TRACE_DUMP(); @@ -1043,6 +1045,8 @@ create_rewinder(struct intel_context *ce, err = i915_request_await_dma_fence(rq, &wait->fence); if (err) goto err; + + i915_request_set_deadline(rq, rq_deadline(wait)); } cs = intel_ring_begin(rq, 14); @@ -1338,6 +1342,7 @@ static int live_timeslice_queue(void *arg) goto err_heartbeat; } i915_request_set_priority(rq, I915_PRIORITY_MAX); + i915_request_set_deadline(rq, 0); err = 
wait_for_submit(engine, rq, HZ / 2); if (err) { pr_err("%s: Timed out trying to submit semaphores\n", @@ -1360,10 +1365,9 @@ static int live_timeslice_queue(void *arg) } GEM_BUG_ON(i915_request_completed(rq)); - GEM_BUG_ON(execlists_active(&engine->execlists) != rq); /* Queue: semaphore signal, matching priority as semaphore */ - err = release_queue(engine, vma, 1, effective_prio(rq)); + err = release_queue(engine, vma, 1, rq_deadline(rq)); if (err) goto err_rq; @@ -1474,6 +1478,7 @@ static int live_timeslice_nopreempt(void *arg) goto out_spin; } + rq->sched.deadline = 0; rq->sched.attr.priority = I915_PRIORITY_BARRIER; i915_request_get(rq); i915_request_add(rq); @@ -1837,6 +1842,7 @@ static int live_late_preempt(void *arg) /* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */ ctx_lo->sched.priority = 1; + ctx_hi->sched.priority = I915_PRIORITY_MIN; for_each_engine(engine, gt, id) { struct igt_live_test t; @@ -3001,6 +3007,9 @@ static int live_preempt_gang(void *arg) while (rq) { /* wait for each rq from highest to lowest prio */ struct i915_request *n = list_next_entry(rq, mock.link); + /* With deadlines, no strict priority ordering */ + i915_request_set_deadline(rq, 0); + if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0) { struct drm_printer p = drm_info_printer(engine->i915->drm.dev); @@ -3223,6 +3232,7 @@ static int preempt_user(struct intel_engine_cs *engine, i915_request_add(rq); i915_request_set_priority(rq, I915_PRIORITY_MAX); + i915_request_set_deadline(rq, 0); if (i915_request_wait(rq, 0, HZ / 2) < 0) err = -ETIME; diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index cdb0ceff3be1..5323fd56efd6 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1010,7 +1010,10 @@ static int __igt_reset_engines(struct intel_gt *gt, break; } - if (i915_request_wait(rq, 0, HZ / 5) < 0) { + /* With deadlines, no strict priority */ + 
i915_request_set_deadline(rq, 0); + + if (i915_request_wait(rq, 0, HZ / 2) < 0) { struct drm_printer p = drm_info_printer(gt->i915->drm.dev); diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 6d73add47109..b7dd5646c882 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1257,6 +1257,7 @@ poison_registers(struct intel_context *ce, intel_ring_advance(rq, cs); + rq->sched.deadline = 0; rq->sched.attr.priority = I915_PRIORITY_BARRIER; err_rq: i915_request_add(rq); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 75e25d419264..c4a252ee1cdd 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -221,7 +221,6 @@ static void __guc_dequeue(struct intel_engine_cs *engine) i915_priolist_advance(&se->queue, pl); } done: - execlists->queue_priority_hint = pl->priority; if (submit) { *port = schedule_in(last, port - execlists->inflight); *++port = NULL; @@ -319,7 +318,6 @@ static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) static void guc_reset_cancel(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn; struct i915_priolist *p; @@ -363,8 +361,6 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) /* Remaining _unready_ requests will be nop'ed when submitted */ - execlists->queue_priority_hint = INT_MIN; - spin_unlock_irqrestore(&se->lock, flags); } diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h index ee7482b9c813..542b47078104 100644 --- a/drivers/gpu/drm/i915/i915_priolist_types.h +++ b/drivers/gpu/drm/i915/i915_priolist_types.h @@ -22,6 +22,8 @@ enum { /* Interactive workload, scheduled for immediate pageflipping */ 
I915_PRIORITY_DISPLAY, + + __I915_PRIORITY_KERNEL__ }; /* Smallest priority value that cannot be bumped. */ @@ -35,8 +37,7 @@ enum { * i.e. nothing can have higher priority and force us to usurp the * active request. */ -#define I915_PRIORITY_UNPREEMPTABLE INT_MAX -#define I915_PRIORITY_BARRIER (I915_PRIORITY_UNPREEMPTABLE - 1) +#define I915_PRIORITY_BARRIER INT_MAX /* * The slab returns power-of-two chunks of memory, so fill out the @@ -82,7 +83,7 @@ enum { */ struct i915_priolist { struct list_head requests; - int priority; + u64 deadline; int level; struct i915_priolist *next[I915_PRIOLIST_HEIGHT]; diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index e7b4c4bc41a6..ce828dc73402 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -467,7 +467,7 @@ bool __i915_request_submit(struct i915_request *request) struct i915_sched *se = intel_engine_get_scheduler(engine); bool result = false; - RQ_TRACE(request, "\n"); + RQ_TRACE(request, "dl %llu\n", request->sched.deadline); lockdep_assert_held(&se->lock); @@ -650,6 +650,12 @@ semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) switch (state) { case FENCE_COMPLETE: + /* + * The request is now ready to run; re-evaluate its deadline + * to remove the semaphore deprioritisation and to assign + * a deadline relative to its point-of-readiness [now]. 
+ */ + i915_request_update_deadline(rq); break; case FENCE_FREE: @@ -1810,14 +1816,15 @@ long i915_request_wait(struct i915_request *rq, return timeout; } -static int print_sched_attr(const struct i915_sched_attr *attr, - char *buf, int x, int len) +static int print_sched(const struct i915_sched_node *node, + char *buf, int x, int len) { - if (attr->priority == I915_PRIORITY_INVALID) + if (node->attr.priority == I915_PRIORITY_INVALID) return x; x += snprintf(buf + x, len - x, - " prio=%d", attr->priority); + " prio=%d, dl=%llu", + node->attr.priority, node->deadline); return x; } @@ -1903,7 +1910,7 @@ void i915_request_show(struct drm_printer *m, * - the request has been temporarily suspended from execution */ - x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf)); + x = print_sched(&rq->sched, buf, x, sizeof(buf)); drm_printf(m, "%s%.*s%c %llx:%lld%s%s %s @ %dms: %s\n", prefix, indent, " ", diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 991d486b3bc1..5b68a5f07f64 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -33,6 +33,11 @@ static void node_put(struct i915_sched_node *node) i915_request_put(container_of(node, struct i915_request, sched)); } +static inline u64 rq_deadline(const struct i915_request *rq) +{ + return READ_ONCE(rq->sched.deadline); +} + static inline int rq_prio(const struct i915_request *rq) { return READ_ONCE(rq->sched.attr.priority); @@ -46,6 +51,14 @@ static int ipi_get_prio(struct i915_request *rq) return xchg(&rq->sched.ipi_priority, I915_PRIORITY_INVALID); } +static u64 ipi_get_deadline(struct i915_request *rq) +{ + if (READ_ONCE(rq->sched.ipi_deadline) == I915_DEADLINE_NEVER) + return I915_DEADLINE_NEVER; + + return xchg64(&rq->sched.ipi_deadline, I915_DEADLINE_NEVER); +} + static void ipi_schedule(struct work_struct *wrk) { struct i915_sched_ipi *ipi = container_of(wrk, typeof(*ipi), work); @@ -53,9 +66,11 @@ static void 
ipi_schedule(struct work_struct *wrk) do { struct i915_request *rn = xchg(&rq->sched.ipi_link, NULL); + u64 deadline; int prio; prio = ipi_get_prio(rq); + deadline = ipi_get_deadline(rq); /* * For cross-engine scheduling to work we rely on one of two @@ -80,6 +95,7 @@ static void ipi_schedule(struct work_struct *wrk) */ local_bh_disable(); i915_request_set_priority(rq, prio); + i915_request_set_deadline(rq, deadline); local_bh_enable(); i915_request_put(rq); @@ -146,8 +162,8 @@ static void init_priolist(struct i915_priolist_root *const root) struct i915_priolist *pl = &root->sentinel; memset_p((void **)pl->next, pl, ARRAY_SIZE(pl->next)); + pl->deadline = I915_DEADLINE_NEVER; pl->requests.prev = NULL; - pl->priority = INT_MIN; pl->level = -1; } @@ -341,19 +357,20 @@ static inline unsigned int random_level(struct i915_priolist_root *root) } static struct list_head * -lookup_priolist(struct i915_sched *se, int prio) +lookup_priolist(struct i915_sched *se, u64 deadline) { struct i915_priolist *update[I915_PRIOLIST_HEIGHT]; struct i915_priolist_root *root = &se->queue; struct i915_priolist *pl, *tmp; int lvl; + GEM_BUG_ON(deadline == I915_DEADLINE_NEVER); lockdep_assert_held(&se->lock); if (unlikely(se->no_priolist)) - prio = I915_PRIORITY_NORMAL; + deadline = 0; for_each_priolist(pl, root) { /* recycle any empty elements before us */ - if (pl->priority <= prio || !list_empty(&pl->requests)) + if (pl->deadline >= deadline || !list_empty(&pl->requests)) break; i915_priolist_advance(root, pl); @@ -363,14 +380,14 @@ lookup_priolist(struct i915_sched *se, int prio) pl = &root->sentinel; lvl = pl->level; while (lvl >= 0) { - while (tmp = pl->next[lvl], tmp->priority >= prio) + while (tmp = pl->next[lvl], tmp->deadline <= deadline) pl = tmp; - if (pl->priority == prio) + if (pl->deadline == deadline) goto out; update[lvl--] = pl; } - if (prio == I915_PRIORITY_NORMAL) { + if (!deadline) { pl = &se->default_priolist; } else if (!pl_empty(&root->sentinel.requests)) { pl = 
pl_pop(&root->sentinel.requests); @@ -378,7 +395,7 @@ lookup_priolist(struct i915_sched *se, int prio) pl = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC); /* Convert an allocation failure to a priority bump */ if (unlikely(!pl)) { - prio = I915_PRIORITY_NORMAL; /* recurses just once */ + deadline = 0; /* recurses just once */ /* * To maintain ordering with all rendering, after an @@ -394,7 +411,7 @@ lookup_priolist(struct i915_sched *se, int prio) } } - pl->priority = prio; + pl->deadline = deadline; INIT_LIST_HEAD(&pl->requests); lvl = random_level(root); @@ -422,7 +439,7 @@ lookup_priolist(struct i915_sched *se, int prio) chk = &root->sentinel; lvl = chk->level; do { - while (tmp = chk->next[lvl], tmp->priority >= prio) + while (tmp = chk->next[lvl], tmp->deadline <= deadline) chk = tmp; } while (--lvl >= 0); @@ -440,7 +457,7 @@ static void __remove_priolist(struct i915_sched *se, struct list_head *plist) struct i915_priolist *pl, *tmp; struct i915_priolist *old = container_of(plist, struct i915_priolist, requests); - int prio = old->priority; + u64 deadline = old->deadline; int lvl; lockdep_assert_held(&se->lock); @@ -450,11 +467,11 @@ static void __remove_priolist(struct i915_sched *se, struct list_head *plist) lvl = pl->level; GEM_BUG_ON(lvl < 0); - if (prio != I915_PRIORITY_NORMAL) + if (deadline) pl_push(old, &pl->requests); do { - while (tmp = pl->next[lvl], tmp->priority > prio) + while (tmp = pl->next[lvl], tmp->deadline < deadline) pl = tmp; if (lvl <= old->level) { pl->next[lvl] = old->next[lvl]; @@ -498,7 +515,7 @@ void i915_priolist_advance(struct i915_priolist_root *root, GEM_BUG_ON(pl == s); /* Keep pl->next[0] valid for for_each_priolist iteration */ - if (pl->priority != I915_PRIORITY_NORMAL) + if (pl->deadline) pl_push(pl, &s->requests); lvl = pl->level; @@ -532,52 +549,249 @@ stack_pop(struct i915_request *rq, return rq; } -static inline bool need_preempt(int prio, int active) +static void ipi_deadline(struct i915_request *rq, u64 
deadline) { - /* - * Allow preemption of low -> normal -> high, but we do - * not allow low priority tasks to preempt other low priority - * tasks under the impression that latency for low priority - * tasks does not matter (as much as background throughput), - * so kiss. - */ - return prio >= max(I915_PRIORITY_NORMAL, active); + u64 old = READ_ONCE(rq->sched.ipi_deadline); + + do { + if (deadline >= old) + return; + } while (!try_cmpxchg64(&rq->sched.ipi_deadline, &old, deadline)); + + __ipi_add(rq); } -static void kick_submission(struct intel_engine_cs *engine, - const struct i915_request *rq, - int prio) +static bool is_first_priolist(const struct i915_sched *se, + const struct list_head *requests) { - const struct i915_request *inflight; + return requests == &se->queue.sentinel.next[0]->requests; +} + +static bool __i915_request_set_deadline(struct i915_request *rq, u64 deadline) +{ + struct intel_engine_cs *engine = rq->engine; + struct i915_sched *se = intel_engine_get_scheduler(engine); + struct list_head *pos = &rq->sched.signalers_list; + struct list_head *plist; + + if (unlikely(!i915_request_in_priority_queue(rq))) { + rq->sched.deadline = deadline; + return false; + } + + /* Fifo and depth-first replacement ensure our deps execute first */ + plist = lookup_priolist(se, deadline); + + rq->sched.dfs.prev = NULL; + do { + list_for_each_continue(pos, &rq->sched.signalers_list) { + struct i915_dependency *p = + list_entry(pos, typeof(*p), signal_link); + struct i915_request *s = + container_of(p->signaler, typeof(*s), sched); + + if (rq_deadline(s) <= deadline) + continue; + + if (__i915_request_is_complete(s)) + continue; + + if (s->engine != engine) { + ipi_deadline(s, deadline); + continue; + } + + /* Remember our position along this branch */ + rq = stack_push(s, rq, pos); + pos = &rq->sched.signalers_list; + } + + RQ_TRACE(rq, "set-deadline:%llu\n", deadline); + WRITE_ONCE(rq->sched.deadline, deadline); + + /* + * Once the request is ready, it will be 
placed into the + * priority lists and then onto the HW runlist. Before the + * request is ready, it does not contribute to our preemption + * decisions and we can safely ignore it, as it will, and + * any preemption required, be dealt with upon submission. + * See engine->submit_request() + */ + GEM_BUG_ON(rq->engine != engine); + if (i915_request_in_priority_queue(rq)) + remove_from_priolist(se, rq, plist, true); + } while ((rq = stack_pop(rq, &pos))); + + return is_first_priolist(se, plist); +} + +void i915_request_set_deadline(struct i915_request *rq, u64 deadline) +{ + struct intel_engine_cs *engine; + unsigned long flags; + + if (deadline >= rq_deadline(rq)) + return; + + engine = lock_engine_irqsave(rq, flags); + if (!i915_sched_is_active(&engine->sched)) + goto unlock; + + if (deadline >= rq_deadline(rq)) + goto unlock; + + if (__i915_request_is_complete(rq)) + goto unlock; + + rcu_read_lock(); + if (__i915_request_set_deadline(rq, deadline)) + i915_sched_kick(&engine->sched); + rcu_read_unlock(); + GEM_BUG_ON(rq_deadline(rq) != deadline); + +unlock: + spin_unlock_irqrestore(&engine->sched.lock, flags); +} + +static u64 prio_slice(int prio) +{ + u64 slice; + int sf; /* - * We only need to kick the tasklet once for the high priority - * new context we add into the queue. + * This is the central heuristic to the virtual deadlines. By + * imposing that each task takes an equal amount of time, we + * let each client have an equal slice of the GPU time. By + * bringing the virtual deadline forward, that client will then + * have more GPU time, and vice versa a lower priority client will + * have a later deadline and receive less GPU time. + * + * In BFS/MuQSS, the prio_ratios[] are based on the task nice range of + * [-20, 20], with each lower priority having a ~10% longer deadline, + * with the note that the proportion of CPU time between two clients + * of different priority will be the square of the relative prio_slice. 
+ * + * This property that the budget of each client is proportional to + * the relative priority, and that the scheduler fairly distributes + * work according to that budget, opens up a very powerful tool + * for managing clients. + * + * In contrast, this prio_slice() curve was chosen because it gave good + * results with igt/gem_exec_schedule. It may not be the best choice! + * + * With a 1ms scheduling quantum: + * + * MAX USER: ~32us deadline + * 0: ~16ms deadline + * MIN_USER: 1000ms deadline */ - if (prio <= engine->execlists.queue_priority_hint) - return; - /* Nothing currently active? We're overdue for a submission! */ - inflight = execlists_active(&engine->execlists); - if (!inflight) - return; + if (prio >= __I915_PRIORITY_KERNEL__) + return INT_MAX - prio; + + slice = __I915_PRIORITY_KERNEL__ - prio; + if (prio >= 0) + sf = 20 - 6; + else + sf = 20 - 1; + + return slice << sf; +} + +static u64 virtual_deadline(u64 kt, int priority) +{ + return i915_sched_to_ticks(kt + prio_slice(priority)); +} + +u64 i915_scheduler_next_virtual_deadline(int priority) +{ + return virtual_deadline(ktime_get_mono_fast_ns(), priority); +} + +static u64 signal_deadline(const struct i915_request *rq) +{ + u64 last = ktime_get_mono_fast_ns(); + const struct i915_dependency *p; /* - * If we are already the currently executing context, don't - * bother evaluating if we should preempt ourselves. + * Find the earliest point at which we will become 'ready', + * which we infer from the deadline of all active signalers. + * We will position ourselves at the end of that chain of work.
*/ - if (inflight->context == rq->context) - return; - SCHED_TRACE(&engine->sched, - "bumping queue-priority-hint:%d for rq:" RQ_FMT ", inflight:" RQ_FMT " prio %d\n", - prio, - RQ_ARG(rq), RQ_ARG(inflight), - inflight->sched.attr.priority); + rcu_read_lock(); + for_each_signaler(p, rq) { + const struct i915_request *s = + container_of(p->signaler, typeof(*s), sched); + u64 deadline; + int prio; - engine->execlists.queue_priority_hint = prio; - if (need_preempt(prio, rq_prio(inflight))) - intel_engine_kick_scheduler(engine); + if (__i915_request_is_complete(s)) + continue; + + if (s->timeline == rq->timeline && + __i915_request_has_started(s)) + continue; + + prio = rq_prio(s); + if (prio < rq_prio(rq)) + continue; + + deadline = rq_deadline(s); + if (deadline == I915_DEADLINE_NEVER) /* retired & reused */ + continue; + + if (s->context == rq->context) /* break ties in favour of hot */ + deadline--; + + deadline = i915_sched_to_ns(deadline); + if (p->flags & I915_DEPENDENCY_WEAK) + deadline -= prio_slice(prio); + + last = max(last, deadline); + } + rcu_read_unlock(); + + return last; +} + +static int adj_prio(const struct i915_request *rq) +{ + int prio = rq_prio(rq); + + /* + * Deprioritize semaphore waiters. We only want to run these if there + * is nothing ready to run first. + * + * Note by giving a more distant deadline (due to a lower priority) + * we do not prevent them from having a slice of the GPU, and if there + * is still contention at that point, we expect to immediately yield + * on the semaphore. + * + * When all semaphores are signaled, we will update the request + * to remove the semaphore penalty. 
+ */ + if (!i915_sw_fence_signaled(&rq->semaphore)) + prio -= __I915_PRIORITY_KERNEL__; + + return prio; +} + +static u64 earliest_deadline(const struct i915_request *rq) +{ + return virtual_deadline(signal_deadline(rq), rq_prio(rq)); +} + +static bool set_earliest_deadline(struct i915_request *rq, u64 old) +{ + u64 dl; + + /* Recompute our deadlines and promote after a priority change */ + dl = min(earliest_deadline(rq), rq_deadline(rq)); + if (dl >= old) + return false; + + return __i915_request_set_deadline(rq, dl); } static void ipi_priority(struct i915_request *rq, int prio) @@ -592,18 +806,15 @@ static void ipi_priority(struct i915_request *rq, int prio) __ipi_add(rq); } -static void __i915_request_set_priority(struct i915_request *rq, int prio) +static bool __i915_request_set_priority(struct i915_request *rq, int prio) { struct intel_engine_cs *engine = rq->engine; - struct i915_sched *se = intel_engine_get_scheduler(engine); struct list_head *pos = &rq->sched.signalers_list; - struct list_head *plist; + bool kick = false; SCHED_TRACE(&engine->sched, "PI for " RQ_FMT ", prio:%d\n", RQ_ARG(rq), prio); - plist = lookup_priolist(se, prio); - /* * Recursively bump all dependent priorities to match the new request. * @@ -624,6 +835,8 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) */ rq->sched.dfs.prev = NULL; do { + struct i915_request *next; + list_for_each_continue(pos, &rq->sched.signalers_list) { struct i915_dependency *p = list_entry(pos, typeof(*p), signal_link); @@ -649,6 +862,8 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) RQ_TRACE(rq, "set-priority:%d\n", prio); WRITE_ONCE(rq->sched.attr.priority, prio); + next = stack_pop(rq, &pos); + /* * Once the request is ready, it will be placed into the * priority lists and then onto the HW runlist. 
Before the @@ -657,16 +872,15 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio) * any preemption required, be dealt with upon submission. * See engine->submit_request() */ - if (!i915_request_is_ready(rq)) - continue; - GEM_BUG_ON(rq->engine != engine); - if (i915_request_in_priority_queue(rq)) - remove_from_priolist(se, rq, plist, true); + if (i915_request_is_ready(rq) && + set_earliest_deadline(rq, rq_deadline(rq))) + kick = true; - /* Defer (tasklet) submission until after all updates. */ - kick_submission(engine, rq, prio); - } while ((rq = stack_pop(rq, &pos))); + rq = next; + } while (rq); + + return kick; } #define all_signalers_checked(p, rq) \ @@ -719,13 +933,9 @@ void i915_request_set_priority(struct i915_request *rq, int prio) if (__i915_request_is_complete(rq)) goto unlock; - if (!i915_sched_is_active(&engine->sched)) { - rq->sched.attr.priority = prio; - goto unlock; - } - rcu_read_lock(); - __i915_request_set_priority(rq, prio); + if (__i915_request_set_priority(rq, prio)) + i915_sched_kick(&engine->sched); rcu_read_unlock(); GEM_BUG_ON(rq_prio(rq) != prio); @@ -738,8 +948,10 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, { struct list_head *pos = &rq->sched.waiters_list; struct i915_sched *se = intel_engine_get_scheduler(engine); - const int prio = rq_prio(rq); struct i915_request *rn; + const u64 deadline = + max(rq_deadline(rq), + i915_scheduler_next_virtual_deadline(adj_prio(rq))); LIST_HEAD(dfs); SCHED_TRACE(se, "defer request " RQ_FMT "\n", RQ_ARG(rq)); @@ -772,30 +984,32 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, __i915_request_has_started(w) && !__i915_request_is_complete(rq)); + /* An unready waiter imposes no deadline */ if (!i915_request_in_priority_queue(w)) continue; /* - * We also need to reorder within the same priority. + * We also need to reorder within the same deadline. 
* * This is unlike priority-inheritance, where if the * signaler already has a higher priority [earlier * deadline] than us, we can ignore as it will be * scheduled first. If a waiter already has the - * same priority, we still have to push it to the end + * same deadline, we still have to push it to the end * of the list. This unfortunately means we cannot * use the rq_deadline() itself as a 'visited' bit. */ - if (rq_prio(w) < prio) + if (rq_deadline(w) > deadline) continue; - GEM_BUG_ON(rq_prio(w) != prio); - /* Remember our position along this branch */ rq = stack_push(w, rq, pos); pos = &rq->sched.waiters_list; } + RQ_TRACE(rq, "set-deadline:%llu\n", deadline); + WRITE_ONCE(rq->sched.deadline, deadline); + /* Note list is reversed for waiters wrt signal hierarchy */ GEM_BUG_ON(rq->engine != engine); remove_from_priolist(se, rq, &dfs, false); @@ -804,31 +1018,17 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } while ((rq = stack_pop(rq, &pos))); - pos = lookup_priolist(se, prio); + pos = lookup_priolist(se, deadline); list_for_each_entry_safe(rq, rn, &dfs, sched.link) { set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); list_add_tail(&rq->sched.link, pos); } } -static void queue_request(struct i915_sched *se, - struct i915_request *rq) +static bool queue_request(struct i915_request *rq) { - GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, lookup_priolist(se, rq_prio(rq))); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); -} - -static bool submit_queue(struct intel_engine_cs *engine, - const struct i915_request *rq) -{ - struct intel_engine_execlists *execlists = &engine->execlists; - - if (rq_prio(rq) <= execlists->queue_priority_hint) - return false; - - execlists->queue_priority_hint = rq_prio(rq); - return true; + return set_earliest_deadline(rq, I915_DEADLINE_NEVER); } static bool hold_request(const struct i915_request *rq) @@ -868,6 +1068,7 @@ void 
i915_request_enqueue(struct i915_request *rq) { struct intel_engine_cs *engine = rq->engine; struct i915_sched *se = intel_engine_get_scheduler(engine); + u64 dl = earliest_deadline(rq); unsigned long flags; bool kick = false; @@ -882,11 +1083,11 @@ void i915_request_enqueue(struct i915_request *rq) list_add_tail(&rq->sched.link, &se->hold); i915_request_set_hold(rq); } else { - queue_request(se, rq); - + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + kick = __i915_request_set_deadline(rq, + min(dl, rq_deadline(rq))); + GEM_BUG_ON(rq_deadline(rq) == I915_DEADLINE_NEVER); GEM_BUG_ON(i915_sched_is_idle(se)); - - kick = submit_queue(engine, rq); } GEM_BUG_ON(list_empty(&rq->sched.link)); @@ -900,8 +1101,8 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) { struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rq, *rn, *active = NULL; + u64 deadline = I915_DEADLINE_NEVER; struct list_head *pl; - int prio = I915_PRIORITY_INVALID; lockdep_assert_held(&se->lock); @@ -913,13 +1114,20 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) __i915_request_unsubmit(rq); - GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); - if (rq_prio(rq) != prio) { - prio = rq_prio(rq); - pl = lookup_priolist(se, prio); + if (__i915_request_has_started(rq)) { + u64 deadline = + i915_scheduler_next_virtual_deadline(rq_prio(rq)); + rq->sched.deadline = min(rq_deadline(rq), deadline); + } + GEM_BUG_ON(rq_deadline(rq) == I915_DEADLINE_NEVER); + + if (rq_deadline(rq) != deadline) { + deadline = rq_deadline(rq); + pl = lookup_priolist(se, deadline); } GEM_BUG_ON(i915_sched_is_idle(se)); + GEM_BUG_ON(i915_request_in_priority_queue(rq)); list_move(&rq->sched.link, pl); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); @@ -1025,14 +1233,10 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, { struct i915_sched *se = intel_engine_get_scheduler(engine); LIST_HEAD(list); + bool submit = false; lockdep_assert_held(&se->lock); - if 
(rq_prio(rq) > engine->execlists.queue_priority_hint) { - engine->execlists.queue_priority_hint = rq_prio(rq); - i915_sched_kick(se); - } - if (!i915_request_on_hold(rq)) return; @@ -1053,7 +1257,7 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, i915_request_clear_hold(rq); list_del_init(&rq->sched.link); - queue_request(se, rq); + submit |= queue_request(rq); /* Also release any children on this engine that are ready */ for_each_waiter(p, rq) { @@ -1083,6 +1287,18 @@ void __i915_sched_resume_request(struct intel_engine_cs *engine, rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); } while (rq); + + if (submit) + i915_sched_kick(se); +} + +void i915_request_update_deadline(struct i915_request *rq) +{ + if (!i915_request_in_priority_queue(rq)) + return; + + /* Recompute our deadlines and promote after a priority change */ + i915_request_set_deadline(rq, earliest_deadline(rq)); } void i915_sched_resume_request(struct intel_engine_cs *engine, @@ -1111,10 +1327,12 @@ void i915_sched_node_init(struct i915_sched_node *node) void i915_sched_node_reinit(struct i915_sched_node *node) { node->attr.priority = I915_PRIORITY_INVALID; + node->deadline = I915_DEADLINE_NEVER; node->semaphores = 0; node->flags = 0; GEM_BUG_ON(node->ipi_link); + node->ipi_deadline = I915_DEADLINE_NEVER; node->ipi_priority = I915_PRIORITY_INVALID; GEM_BUG_ON(!list_empty(&node->signalers_list)); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 7f9ee3dc6551..f8057abed1e7 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -47,6 +47,11 @@ void i915_sched_park(struct i915_sched *se); void i915_sched_fini(struct i915_sched *se); void i915_request_set_priority(struct i915_request *request, int prio); +void i915_request_set_deadline(struct i915_request *request, u64 deadline); + +void i915_request_update_deadline(struct i915_request *request); + +u64 
i915_scheduler_next_virtual_deadline(int priority); void i915_request_enqueue(struct i915_request *request); @@ -65,11 +70,14 @@ bool i915_sched_suspend_request(struct intel_engine_cs *engine, void i915_sched_resume_request(struct intel_engine_cs *engine, struct i915_request *rq); -void __i915_priolist_free(struct i915_priolist *p); -static inline void i915_priolist_free(struct i915_priolist *p) +static inline u64 i915_sched_to_ticks(ktime_t kt) { - if (p->priority != I915_PRIORITY_NORMAL) - __i915_priolist_free(p); + return ktime_to_ns(kt) >> I915_SCHED_DEADLINE_SHIFT; +} + +static inline u64 i915_sched_to_ns(u64 deadline) +{ + return deadline << I915_SCHED_DEADLINE_SHIFT; } static inline bool i915_sched_is_idle(const struct i915_sched *se) diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index ad35fabf9f6e..34c46c526f74 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -157,8 +157,31 @@ struct i915_sched_node { #define I915_SCHED_HAS_EXTERNAL_CHAIN BIT(0) unsigned long semaphores; + /** + * @deadline: [virtual] deadline + * + * When the request is ready for execution, it is given a quota + * (the engine's timeslice) and a virtual deadline. The virtual + * deadline is derived from the current time: + * ktime_get() + (prio_ratio * timeslice) + * + * Requests are then executed in order of deadline completion. + * Requests with earlier deadlines than currently executing on + * the engine will preempt the active requests. + * + * By treating it as a virtual deadline, we use it as a hint for + * when it is appropriate for a request to start with respect to + * all other requests in the system. It is not a hard deadline, as + * we allow requests to miss them, and we do not account for the + * request runtime. + */ + u64 deadline; +#define I915_SCHED_DEADLINE_SHIFT 19 /* i.e. 
roughly 500us buckets */ +#define I915_DEADLINE_NEVER U64_MAX + /* handle being scheduled for PI from outside of our active.lock */ struct i915_request *ipi_link; + u64 ipi_deadline; int ipi_priority; }; diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index 8035ea7565ed..c5d7427bd429 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -2129,6 +2129,7 @@ static int measure_preemption(struct intel_context *ce) intel_ring_advance(rq, cs); rq->sched.attr.priority = I915_PRIORITY_BARRIER; + rq->sched.deadline = 0; elapsed[i - 1] = ENGINE_READ_FW(ce->engine, RING_TIMESTAMP); i915_request_add(rq); diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c index f179f1cb760a..ebe0239dc3ae 100644 --- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c +++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c @@ -12,6 +12,40 @@ #include "selftests/igt_spinner.h" #include "selftests/i915_random.h" +static int mock_scheduler_slices(void *dummy) +{ + u64 min, max, normal, kernel; + + min = prio_slice(I915_PRIORITY_MIN); + pr_info("%8s slice: %lluus\n", "min", min >> 10); + + normal = prio_slice(0); + pr_info("%8s slice: %lluus\n", "normal", normal >> 10); + + max = prio_slice(I915_PRIORITY_MAX); + pr_info("%8s slice: %lluus\n", "max", max >> 10); + + kernel = prio_slice(I915_PRIORITY_BARRIER); + pr_info("%8s slice: %lluus\n", "kernel", kernel >> 10); + + if (kernel != 0) { + pr_err("kernel prio slice should be 0\n"); + return -EINVAL; + } + + if (max >= normal) { + pr_err("maximum prio slice should be shorter than normal\n"); + return -EINVAL; + } + + if (min <= normal) { + pr_err("minimum prio slice should be longer than normal\n"); + return -EINVAL; + } + + return 0; +} + static int mock_skiplist_levels(void *dummy) { struct i915_priolist_root root = {}; @@ -54,6 +88,7 @@ static int mock_skiplist_levels(void 
*dummy) int i915_scheduler_mock_selftests(void) { static const struct i915_subtest tests[] = { + SUBTEST(mock_scheduler_slices), SUBTEST(mock_skiplist_levels), }; @@ -556,6 +591,53 @@ static int igt_priority_chains(void *arg) return igt_schedule_chains(arg, igt_priority); } +static bool igt_deadline(struct i915_request *rq, + unsigned long v, unsigned long e) +{ + i915_request_set_deadline(rq, 0); + GEM_BUG_ON(rq_deadline(rq) != 0); + return true; +} + +static int igt_deadline_chains(void *arg) +{ + return igt_schedule_chains(arg, igt_deadline); +} + +static bool igt_defer(struct i915_request *rq, unsigned long v, unsigned long e) +{ + struct intel_engine_cs *engine = rq->engine; + struct i915_sched *se = intel_engine_get_scheduler(engine); + + /* XXX No generic means to unwind incomplete requests yet */ + if (!i915_request_in_priority_queue(rq)) + return false; + + if (!intel_engine_has_preemption(engine)) + return false; + + spin_lock_irq(&se->lock); + + /* Push all the requests to the same deadline */ + __i915_request_set_deadline(rq, 0); + GEM_BUG_ON(rq_deadline(rq) != 0); + + /* Then the very first request must be the one everyone depends on */ + rq = list_first_entry(lookup_priolist(se, 0), typeof(*rq), sched.link); + GEM_BUG_ON(rq->engine != engine); + + /* Deferring the first request will then have to defer all requests */ + __i915_sched_defer_request(engine, rq); + + spin_unlock_irq(&se->lock); + return true; +} + +static int igt_deadline_defer(void *arg) +{ + return igt_schedule_chains(arg, igt_defer); +} + static struct i915_request * __write_timestamp(struct intel_engine_cs *engine, struct drm_i915_gem_object *obj, @@ -777,13 +859,22 @@ static int igt_priority_cycle(void *arg) return __igt_schedule_cycle(arg, igt_priority); } +static int igt_deadline_cycle(void *arg) +{ + return __igt_schedule_cycle(arg, igt_deadline); +} + int i915_scheduler_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { + 
SUBTEST(igt_deadline_chains), SUBTEST(igt_priority_chains), SUBTEST(igt_schedule_cycle), + SUBTEST(igt_deadline_cycle), SUBTEST(igt_priority_cycle), + + SUBTEST(igt_deadline_defer), }; return i915_subtests(tests, i915); @@ -919,9 +1010,54 @@ static int sparse_priority(void *arg) return sparse(arg, set_priority); } +static u64 __set_deadline(struct i915_request *rq, u64 deadline) +{ + u64 dt; + + preempt_disable(); + dt = ktime_get_raw_fast_ns(); + i915_request_set_deadline(rq, deadline); + dt = ktime_get_raw_fast_ns() - dt; + preempt_enable(); + + return dt; +} + +static bool set_deadline(struct i915_request *rq, + unsigned long v, unsigned long e) +{ + report("set-deadline", v, e, __set_deadline(rq, 0)); + return true; +} + +static int single_deadline(void *arg) +{ + return single(arg, set_deadline); +} + +static int wide_deadline(void *arg) +{ + return wide(arg, set_deadline); +} + +static int inv_deadline(void *arg) +{ + return inv(arg, set_deadline); +} + +static int sparse_deadline(void *arg) +{ + return sparse(arg, set_deadline); +} + int i915_scheduler_perf_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { + SUBTEST(single_deadline), + SUBTEST(wide_deadline), + SUBTEST(inv_deadline), + SUBTEST(sparse_deadline), + SUBTEST(single_priority), SUBTEST(wide_priority), SUBTEST(inv_priority),
From patchwork Mon Feb 1 08:56:56 2021 X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058447 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:56 +0000 Message-Id: <20210201085715.27435-38-chris@chris-wilson.co.uk> In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 38/57] drm/i915/gt: Specify a deadline for the heartbeat Cc: Chris Wilson As we know when we expect the
heartbeat to be checked for completion, pass this information along as its deadline. We still do not complain if the deadline is missed, at least until we have tried a few times, but it will allow for quicker hang detection on systems where deadlines are adhered to. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index f1811e79401e..0f0bf9e4d34f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -65,6 +65,16 @@ static void heartbeat_commit(struct i915_request *rq, __i915_request_queue(rq, attr); } +static void set_heartbeat_deadline(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + unsigned long interval; + + interval = READ_ONCE(engine->props.heartbeat_interval_ms); + if (interval) + i915_request_set_deadline(rq, ktime_get() + (interval << 20)); +} + static void show_heartbeat(const struct i915_request *rq, struct intel_engine_cs *engine) { @@ -128,6 +138,8 @@ static void heartbeat(struct work_struct *wrk) attr.priority = I915_PRIORITY_BARRIER; local_bh_disable(); + if (attr.priority == I915_PRIORITY_BARRIER) + i915_request_set_deadline(rq, 0); i915_request_set_priority(rq, attr.priority); local_bh_enable(); } else { @@ -160,6 +172,7 @@ static void heartbeat(struct work_struct *wrk) if (IS_ERR(rq)) goto unlock; + set_heartbeat_deadline(engine, rq); heartbeat_commit(rq, &attr); unlock:
From patchwork Mon Feb 1 08:56:57 2021 X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058431 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:57 +0000 Message-Id: <20210201085715.27435-39-chris@chris-wilson.co.uk> In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 39/57] drm/i915: Extend the priority boosting for the display with a deadline Cc: Chris Wilson For a modeset/pageflip, there is a very precise deadline by which the frame must be completed in order to hit the vblank and be shown. While we don't pass along that exact information, we can at least inform the scheduler that this request-chain needs to be completed asap. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/display/intel_display.c | 7 +++++-- drivers/gpu/drm/i915/gem/i915_gem_object.h | 5 +++-- drivers/gpu/drm/i915/gem/i915_gem_wait.c | 21 ++++++++++++-------- 3 files changed, 21 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index aca964f7ba72..ce59119ac14a 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -13702,7 +13702,8 @@ intel_prepare_plane_fb(struct drm_plane *_plane, if (new_plane_state->uapi.fence) { /* explicit fencing */ i915_gem_fence_wait_priority(new_plane_state->uapi.fence, - I915_PRIORITY_DISPLAY); + I915_PRIORITY_DISPLAY, + ktime_get() /* next vblank? */); ret = i915_sw_fence_await_dma_fence(&state->commit_ready, new_plane_state->uapi.fence, i915_fence_timeout(dev_priv), @@ -13724,7 +13725,9 @@ intel_prepare_plane_fb(struct drm_plane *_plane, if (ret) return ret; - i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY); + i915_gem_object_wait_priority(obj, 0, + I915_PRIORITY_DISPLAY, + ktime_get() /* next vblank?
*/); i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB); if (!new_plane_state->uapi.fence) { /* implicit fencing */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 325766abca21..9935a2e59df0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -549,14 +549,15 @@ static inline void __start_cpu_write(struct drm_i915_gem_object *obj) obj->cache_dirty = true; } -void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio); +void i915_gem_fence_wait_priority(struct dma_fence *fence, + int prio, ktime_t deadline); int i915_gem_object_wait(struct drm_i915_gem_object *obj, unsigned int flags, long timeout); int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, unsigned int flags, - int prio); + int prio, ktime_t deadline); void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, enum fb_op_origin origin); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c index 4d1897c347b9..162f9737965f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -92,11 +92,14 @@ i915_gem_object_wait_reservation(struct dma_resv *resv, return timeout; } -static void fence_set_priority(struct dma_fence *fence, int prio) +static void +fence_set_priority(struct dma_fence *fence, int prio, ktime_t deadline) { if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence)) return; + i915_request_set_deadline(to_request(fence), + i915_sched_to_ticks(deadline)); i915_request_set_priority(to_request(fence), prio); } @@ -105,7 +108,8 @@ static inline bool __dma_fence_is_chain(const struct dma_fence *fence) return fence->ops == &dma_fence_chain_ops; } -void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio) +void i915_gem_fence_wait_priority(struct dma_fence *fence, + int prio, ktime_t deadline) { if (dma_fence_is_signaled(fence)) return; @@ 
-118,19 +122,19 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio) int i; for (i = 0; i < array->num_fences; i++) - fence_set_priority(array->fences[i], prio); + fence_set_priority(array->fences[i], prio, deadline); } else if (__dma_fence_is_chain(fence)) { struct dma_fence *iter; /* The chain is ordered; if we boost the last, we boost all */ dma_fence_chain_for_each(iter, fence) { fence_set_priority(to_dma_fence_chain(iter)->fence, - prio); + prio, deadline); break; } dma_fence_put(iter); } else { - fence_set_priority(fence, prio); + fence_set_priority(fence, prio, deadline); } local_bh_enable(); /* kick the tasklets if queues were reprioritised */ @@ -139,7 +143,8 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio) int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, unsigned int flags, - int prio) + int prio, + ktime_t deadline) { struct dma_fence *excl; @@ -154,7 +159,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, return ret; for (i = 0; i < count; i++) { - i915_gem_fence_wait_priority(shared[i], prio); + i915_gem_fence_wait_priority(shared[i], prio, deadline); dma_fence_put(shared[i]); } @@ -164,7 +169,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, } if (excl) { - i915_gem_fence_wait_priority(excl, prio); + i915_gem_fence_wait_priority(excl, prio, deadline); dma_fence_put(excl); } return 0;
From patchwork Mon Feb 1 08:56:58 2021 X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058445 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:58 +0000 Message-Id: <20210201085715.27435-40-chris@chris-wilson.co.uk> In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 40/57] drm/i915/gt: Support virtual engine queues Cc: Chris Wilson Allow multiple requests to be queued onto a virtual engine, whereas before we only allowed a single request to be queued at a time. The advantage of keeping just one request in the queue was to ensure that we always decided late which engine to use. However, with the introduction of the virtual deadline we throttle submission and still only drip one request into the sibling at a time (unless it is truly empty, but then a second request will have an earlier deadline than the queued virtual engine and force itself in front). This also takes advantage of the fact that a virtual engine will remain bound while it is active, i.e. we cannot switch to a second engine until the context is completed -- such that we cannot be as lazy as lazy can be. By allowing a full queue, we avoid having to synchronize via the breadcrumb interrupt every time, letting the virtual engine reach the full throughput of the siblings. Signed-off-by: Chris Wilson --- .../drm/i915/gt/intel_execlists_submission.c | 435 +++++++++--------- drivers/gpu/drm/i915/gt/selftest_execlists.c | 146 ------ drivers/gpu/drm/i915/i915_request.c | 12 +- drivers/gpu/drm/i915/i915_scheduler.c | 70 ++- drivers/gpu/drm/i915/i915_scheduler.h | 4 +- 5 files changed, 281 insertions(+), 386 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 31d36057c729..9c929688a955 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -160,17 +160,6 @@ struct virtual_engine { struct intel_context context; struct rcu_work rcu; - /* - * We allow only a single request through the virtual engine at a time - * (each request in the timeline waits for the completion fence of - * the previous before being submitted).
By restricting ourselves to - * only submitting a single request, each request is placed on to a - * physical to maximise load spreading (by virtue of the late greedy - * scheduling -- each real engine takes the next available request - * upon idling). - */ - struct i915_request *request; - /* * We keep a rbtree of available virtual engines inside each physical * engine, sorted by priority. Here we preallocate the nodes we need @@ -274,17 +263,24 @@ static struct i915_request *first_request(const struct i915_sched *se) sched.link); } -static struct i915_request *first_virtual(const struct intel_engine_cs *engine) +static struct virtual_engine * +first_virtual_engine(const struct intel_engine_cs *engine) { - struct rb_node *rb; + return rb_entry_safe(rb_first_cached(&engine->execlists.virtual), + struct virtual_engine, + nodes[engine->id].rb); +} - rb = rb_first_cached(&engine->execlists.virtual); - if (!rb) +static const struct i915_request * +first_virtual(const struct intel_engine_cs *engine) +{ + struct virtual_engine *ve; + + ve = first_virtual_engine(engine); + if (!ve) return NULL; - return READ_ONCE(rb_entry(rb, - struct virtual_engine, - nodes[engine->id].rb)->request); + return first_request(&ve->base.sched); } static const struct i915_request * @@ -500,7 +496,7 @@ static void execlists_schedule_in(struct i915_request *rq, int idx) trace_i915_request_in(rq, idx); old = ce->inflight; - if (!old) + if (!__intel_context_inflight_count(old)) old = __execlists_schedule_in(rq); WRITE_ONCE(ce->inflight, ptr_inc(old)); @@ -510,31 +506,43 @@ static void execlists_schedule_in(struct i915_request *rq, int idx) static void resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) { - struct i915_sched *se = i915_request_get_scheduler(rq); + struct i915_sched *se = intel_engine_get_scheduler(&ve->base); + struct i915_sched *pv = i915_request_get_scheduler(rq); + struct i915_request *pos = rq; + struct intel_timeline *tl; - spin_lock_irq(&se->lock); + 
spin_lock_irq(&pv->lock); - clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - WRITE_ONCE(rq->engine, &ve->base); - ve->base.sched.submit_request(rq); + if (__i915_request_is_complete(rq)) + goto unlock; - spin_unlock_irq(&se->lock); + tl = i915_request_active_timeline(rq); + + /* Rewind back to the start of this virtual engine queue */ + list_for_each_entry_continue_reverse(rq, &tl->requests, link) { + if (!i915_request_in_priority_queue(rq)) + break; + + pos = rq; + } + + /* Resubmit the queue in execution order */ + spin_lock(&se->lock); + list_for_each_entry_from(pos, &tl->requests, link) { + if (pos->engine == &ve->base) + break; + + __i915_request_requeue(pos, &ve->base); + } + spin_unlock(&se->lock); + +unlock: + spin_unlock_irq(&pv->lock); } static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); - struct intel_engine_cs *engine = rq->engine; - - /* - * After this point, the rq may be transferred to a new sibling, so - * before we clear ce->inflight make sure that the context has been - * removed from the b->signalers and furthermore we need to make sure - * that the concurrent iterator in signal_irq_work is no longer - * following ce->signal_link. - */ - if (!list_empty(&ce->signals)) - intel_context_remove_breadcrumbs(ce, engine->breadcrumbs); /* * This engine is now too busy to run this virtual request, so @@ -543,11 +551,11 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) * same as other native request. 
*/ if (i915_request_in_priority_queue(rq) && - rq->execution_mask != engine->mask) + rq->execution_mask != rq->engine->mask) resubmit_virtual_request(rq, ve); - if (READ_ONCE(ve->request)) - intel_engine_kick_scheduler(&ve->base); + if (!i915_sched_is_idle(&ve->base.sched)) + i915_sched_kick(&ve->base.sched); } static void __execlists_schedule_out(struct i915_request * const rq, @@ -896,10 +904,16 @@ static bool ctx_single_port_submission(const struct intel_context *ce) intel_context_force_single_submission(ce)); } +static bool __can_merge_ctx(const struct intel_context *prev, + const struct intel_context *next) +{ + return prev == next; +} + static bool can_merge_ctx(const struct intel_context *prev, const struct intel_context *next) { - if (prev != next) + if (!__can_merge_ctx(prev, next)) return false; if (ctx_single_port_submission(prev)) @@ -970,31 +984,6 @@ static bool virtual_matches(const struct virtual_engine *ve, return true; } -static struct virtual_engine * -first_virtual_engine(struct intel_engine_cs *engine) -{ - struct intel_engine_execlists *el = &engine->execlists; - struct rb_node *rb = rb_first_cached(&el->virtual); - - while (rb) { - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - struct i915_request *rq = READ_ONCE(ve->request); - - /* lazily cleanup after another engine handled rq */ - if (!rq || !virtual_matches(ve, rq, engine)) { - rb_erase_cached(rb, &el->virtual); - RB_CLEAR_NODE(rb); - rb = rb_first_cached(&el->virtual); - continue; - } - - return ve; - } - - return NULL; -} - static void virtual_xfer_context(struct virtual_engine *ve, struct intel_engine_cs *engine) { @@ -1003,6 +992,10 @@ static void virtual_xfer_context(struct virtual_engine *ve, if (likely(engine == ve->siblings[0])) return; + if (!list_empty(&ve->context.signals)) + intel_context_remove_breadcrumbs(&ve->context, + ve->siblings[0]->breadcrumbs); + GEM_BUG_ON(READ_ONCE(ve->context.inflight)); if (!intel_engine_has_relative_mmio(engine)) 
lrc_update_offsets(&ve->context, engine); @@ -1175,6 +1168,118 @@ static bool completed(const struct i915_request *rq) return __i915_request_is_complete(rq); } +static void __virtual_dequeue(struct virtual_engine *ve, + struct intel_engine_cs *sibling) +{ + struct ve_node * const node = &ve->nodes[sibling->id]; + struct rb_node **parent, *rb; + struct i915_request *rq; + u64 deadline; + bool first; + + rb_erase_cached(&node->rb, &sibling->execlists.virtual); + RB_CLEAR_NODE(&node->rb); + + rq = first_request(&ve->base.sched); + if (!virtual_matches(ve, rq, sibling)) + return; + + rb = NULL; + first = true; + parent = &sibling->execlists.virtual.rb_root.rb_node; + deadline = rq_deadline(rq); + while (*parent) { + struct ve_node *other; + + rb = *parent; + other = rb_entry(rb, typeof(*other), rb); + if (deadline <= other->deadline) { + parent = &rb->rb_left; + } else { + parent = &rb->rb_right; + first = false; + } + } + + rb_link_node(&node->rb, rb, parent); + rb_insert_color_cached(&node->rb, &sibling->execlists.virtual, first); +} + +static void virtual_requeue(struct intel_engine_cs *engine, + struct i915_request *last) +{ + const struct i915_request * const first = + first_request(intel_engine_get_scheduler(engine)); + struct virtual_engine *ve; + + while ((ve = first_virtual_engine(engine))) { + struct i915_sched *se = intel_engine_get_scheduler(&ve->base); + struct i915_request *rq; + + spin_lock(&se->lock); + + rq = first_request(se); + if (unlikely(!virtual_matches(ve, rq, engine))) + /* lost the race to a sibling */ + goto unlock; + + GEM_BUG_ON(rq->engine != &ve->base); + GEM_BUG_ON(rq->context != &ve->context); + + if (last && !__can_merge_ctx(last->context, rq->context)) { + spin_unlock(&se->lock); + return; /* leave this for another sibling? */ + } + + if (!dl_before(rq, first)) { + spin_unlock(&se->lock); + return; + } + + ENGINE_TRACE(engine, + "virtual rq=%llx:%lld%s, dl %lld, new engine? 
%s\n", + rq->fence.context, + rq->fence.seqno, + __i915_request_is_complete(rq) ? "!" : + __i915_request_has_started(rq) ? "*" : + "", + rq_deadline(rq), + yesno(engine != ve->siblings[0])); + + GEM_BUG_ON(!(rq->execution_mask & engine->mask)); + if (__i915_request_requeue(rq, engine)) { + /* + * Only after we confirm that we will submit + * this request (i.e. it has not already + * completed), do we want to update the context. + * + * This serves two purposes. It avoids + * unnecessary work if we are resubmitting an + * already completed request after timeslicing. + * But more importantly, it prevents us altering + * ve->siblings[] on an idle context, where + * we may be using ve->siblings[] in + * virtual_context_enter / virtual_context_exit. + */ + virtual_xfer_context(ve, engine); + + /* Bind this ve before we release the lock */ + if (!ve->context.inflight) + WRITE_ONCE(ve->context.inflight, engine); + + GEM_BUG_ON(rq->engine != engine); + GEM_BUG_ON(ve->siblings[0] != engine); + GEM_BUG_ON(intel_context_inflight(rq->context) != engine); + + last = rq; + } + +unlock: + __virtual_dequeue(ve, engine); + spin_unlock(&se->lock); + } +} + static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -1182,9 +1287,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; struct i915_request *last, * const *active; - struct virtual_engine *ve; struct i915_priolist *pl; - struct rb_node *rb; bool submit = false; /* @@ -1315,83 +1418,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } } - /* XXX virtual is always taking precedence */ - while ((ve = first_virtual_engine(engine))) { - struct i915_request *rq; - - spin_lock(&ve->base.sched.lock); - - rq = ve->request; - if (unlikely(!virtual_matches(ve, rq, engine))) - goto unlock; /* lost the race to a 
sibling */ - - GEM_BUG_ON(rq->engine != &ve->base); - GEM_BUG_ON(rq->context != &ve->context); - - if (!dl_before(rq, first_request(se))) { - spin_unlock(&ve->base.sched.lock); - break; - } - - if (last && !can_merge_rq(last, rq)) { - spin_unlock(&ve->base.sched.lock); - spin_unlock(&se->lock); - return; /* leave this for another sibling */ - } - - ENGINE_TRACE(engine, - "virtual rq=%llx:%lld%s, dl %llx, new engine? %s\n", - rq->fence.context, - rq->fence.seqno, - __i915_request_is_complete(rq) ? "!" : - __i915_request_has_started(rq) ? "*" : - "", - rq_deadline(rq), - yesno(engine != ve->siblings[0])); - WRITE_ONCE(ve->request, NULL); - - rb = &ve->nodes[engine->id].rb; - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); - - GEM_BUG_ON(!(rq->execution_mask & engine->mask)); - WRITE_ONCE(rq->engine, engine); - - if (__i915_request_submit(rq)) { - /* - * Only after we confirm that we will submit - * this request (i.e. it has not already - * completed), do we want to update the context. - * - * This serves two purposes. It avoids - * unnecessary work if we are resubmitting an - * already completed request after timeslicing. - * But more importantly, it prevents us altering - * ve->siblings[] on an idle context, where - * we may be using ve->siblings[] in - * virtual_context_enter / virtual_context_exit. - */ - virtual_xfer_context(ve, engine); - GEM_BUG_ON(ve->siblings[0] != engine); - - submit = true; - last = rq; - } - - i915_request_put(rq); -unlock: - spin_unlock(&ve->base.sched.lock); - - /* - * Hmm, we have a bunch of virtual engine requests, - * but the first one was already completed (thanks - * preempt-to-busy!). Keep looking at the veng queue - * until we have no more relevant requests (i.e. - * the normal submit queue has higher priority). 
- */ - if (submit) - break; - } + if (!RB_EMPTY_ROOT(&execlists->virtual.rb_root)) + virtual_requeue(engine, last); for_each_priolist(pl, &se->queue) { struct i915_request *rq, *rn; @@ -1399,6 +1427,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) priolist_for_each_request_safe(rq, rn, pl) { bool merge = true; + GEM_BUG_ON(i915_request_get_scheduler(rq) != se); + /* * Can we combine this request with the current port? * It has to be the same context/ringbuffer and not @@ -2715,14 +2745,15 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) RB_CLEAR_NODE(rb); spin_lock(&ve->base.sched.lock); - rq = fetch_and_zero(&ve->request); - if (rq) { - if (i915_request_mark_eio(rq)) { - rq->engine = engine; - __i915_request_submit(rq); - i915_request_put(rq); + for_each_priolist(pl, &ve->base.sched.queue) { + priolist_for_each_request_safe(rq, rn, pl) { + if (i915_request_mark_eio(rq)) { + rq->engine = engine; + __i915_request_submit(rq); + i915_request_put(rq); + } } - i915_request_put(rq); + i915_priolist_advance(&ve->base.sched.queue, pl); } spin_unlock(&ve->base.sched.lock); } @@ -3039,11 +3070,6 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) return 0; } -static struct list_head *virtual_queue(struct virtual_engine *ve) -{ - return &ve->base.sched.default_priolist.requests; -} - static void rcu_virtual_context_destroy(struct work_struct *wrk) { struct virtual_engine *ve = @@ -3054,19 +3080,19 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) GEM_BUG_ON(ve->context.inflight); /* Preempt-to-busy may leave a stale request behind. 
*/ - if (unlikely(ve->request)) { - struct i915_request *old; + if (unlikely(!i915_sched_is_idle(se))) { + struct i915_request *rq, *rn; + struct i915_priolist *pl; spin_lock_irq(&se->lock); - - old = fetch_and_zero(&ve->request); - if (old) { - GEM_BUG_ON(!__i915_request_is_complete(old)); - __i915_request_submit(old); - i915_request_put(old); + for_each_priolist(pl, &se->queue) { + priolist_for_each_request_safe(rq, rn, pl) + __i915_request_submit(rq); + i915_priolist_advance(&se->queue, pl); } - spin_unlock_irq(&se->lock); + + GEM_BUG_ON(!i915_sched_is_idle(se)); } /* @@ -3095,7 +3121,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) spin_unlock_irq(&sibling->sched.lock); } GEM_BUG_ON(__tasklet_is_scheduled(&se->tasklet)); - GEM_BUG_ON(!list_empty(virtual_queue(ve))); lrc_fini(&ve->context); intel_context_fini(&ve->context); @@ -3214,46 +3239,43 @@ static const struct intel_context_ops virtual_context_ops = { .destroy = virtual_context_destroy, }; -static intel_engine_mask_t +static struct i915_request * virtual_submission_mask(struct virtual_engine *ve, u64 *deadline) { struct i915_request *rq; - intel_engine_mask_t mask; - rq = READ_ONCE(ve->request); + rq = first_request(&ve->base.sched); if (!rq) - return 0; + return NULL; /* The rq is ready for submission; rq->execution_mask is now stable. 
*/ - mask = rq->execution_mask; - if (unlikely(!mask)) { + if (unlikely(!rq->execution_mask)) { /* Invalid selection, submit to a random engine in error */ i915_request_set_error_once(rq, -ENODEV); - mask = ve->siblings[0]->mask; + WRITE_ONCE(rq->execution_mask, ALL_ENGINES); } *deadline = rq_deadline(rq); ENGINE_TRACE(&ve->base, "rq=%llx:%llu, mask=%x, dl=%llu\n", rq->fence.context, rq->fence.seqno, - mask, *deadline); + rq->execution_mask, *deadline); - return mask; + return rq; } static void virtual_submission_tasklet(struct tasklet_struct *t) { struct virtual_engine * const ve = from_tasklet(ve, t, base.sched.tasklet); - intel_engine_mask_t mask; + struct i915_request *rq; unsigned int n; u64 deadline; rcu_read_lock(); - mask = virtual_submission_mask(ve, &deadline); - rcu_read_unlock(); - if (unlikely(!mask)) - return; + rq = virtual_submission_mask(ve, &deadline); + if (unlikely(!rq)) + goto out; for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); @@ -3262,12 +3284,9 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) struct rb_node **parent, *rb; bool first; - if (!READ_ONCE(ve->request)) - break; /* already handled by a sibling's tasklet */ - spin_lock_irq(&se->lock); - if (unlikely(!(mask & sibling->mask))) { + if (unlikely(!virtual_matches(ve, rq, sibling))) { if (!RB_EMPTY_NODE(&node->rb)) { rb_erase_cached(&node->rb, &sibling->execlists.virtual); @@ -3324,46 +3343,9 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) if (intel_context_inflight(&ve->context)) break; } -} -static void virtual_submit_request(struct i915_request *rq) -{ - struct virtual_engine *ve = to_virtual_engine(rq->engine); - struct i915_sched *se = intel_engine_get_scheduler(&ve->base); - unsigned long flags; - - ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", - rq->fence.context, - rq->fence.seqno); - - GEM_BUG_ON(ve->base.sched.submit_request != virtual_submit_request); - - spin_lock_irqsave(&se->lock, 
flags); - - /* By the time we resubmit a request, it may be completed */ - if (__i915_request_is_complete(rq)) { - __i915_request_submit(rq); - goto unlock; - } - - if (ve->request) { /* background completion from preempt-to-busy */ - GEM_BUG_ON(!__i915_request_is_complete(ve->request)); - __i915_request_submit(ve->request); - i915_request_put(ve->request); - } - - rq->sched.deadline = - min(rq->sched.deadline, - i915_scheduler_next_virtual_deadline(rq_prio(rq))); - ve->request = i915_request_get(rq); - - GEM_BUG_ON(!list_empty(virtual_queue(ve))); - list_move_tail(&rq->sched.link, virtual_queue(ve)); - - intel_engine_kick_scheduler(&ve->base); - -unlock: - spin_unlock_irqrestore(&se->lock, flags); +out: + rcu_read_unlock(); } static struct ve_bond * @@ -3455,8 +3437,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.bond_execute = virtual_bond_execute; - INIT_LIST_HEAD(virtual_queue(ve)); - intel_context_init(&ve->context, &ve->base); ve->base.breadcrumbs = intel_breadcrumbs_create(NULL); @@ -3539,7 +3519,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ENGINE_VIRTUAL); ve->base.sched.flags = sched; - ve->base.sched.submit_request = virtual_submit_request; + ve->base.sched.submit_request = i915_request_enqueue; tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet); virtual_engine_initial_hint(ve); @@ -3675,8 +3655,9 @@ static void execlists_show(struct drm_printer *m, for (rb = rb_first_cached(&el->virtual); rb; rb = rb_next(rb)) { struct virtual_engine *ve = rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - struct i915_request *rq = READ_ONCE(ve->request); + struct i915_request *rq; + rq = first_request(&ve->base.sched); if (rq) { if (count++ < max - 1) show_request(m, rq, "\t\t", 0); diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 3d87674477da..721a66ef301a 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -4551,151 +4551,6 @@ static int live_virtual_bond(void *arg) return 0; } -static int reset_virtual_engine(struct intel_gt *gt, - struct intel_engine_cs **siblings, - unsigned int nsibling) -{ - struct intel_engine_cs *engine; - struct intel_context *ve; - struct igt_spinner spin; - struct i915_request *rq; - struct i915_sched *se; - unsigned int n; - int err = 0; - - /* - * In order to support offline error capture for fast preempt reset, - * we need to decouple the guilty request and ensure that it and its - * descendents are not executed while the capture is in progress. - */ - - if (igt_spinner_init(&spin, gt)) - return -ENOMEM; - - ve = intel_execlists_create_virtual(siblings, nsibling); - if (IS_ERR(ve)) { - err = PTR_ERR(ve); - goto out_spin; - } - - for (n = 0; n < nsibling; n++) - st_engine_heartbeat_disable(siblings[n]); - - rq = igt_spinner_create_request(&spin, ve, MI_ARB_CHECK); - if (IS_ERR(rq)) { - err = PTR_ERR(rq); - goto out_heartbeat; - } - i915_request_add(rq); - - if (!igt_wait_for_spinner(&spin, rq)) { - intel_gt_set_wedged(gt); - err = -ETIME; - goto out_heartbeat; - } - - engine = rq->engine; - GEM_BUG_ON(engine == ve->engine); - se = intel_engine_get_scheduler(engine); - - /* Take ownership of the reset and tasklet */ - local_bh_disable(); - if (test_and_set_bit(I915_RESET_ENGINE + engine->id, - &gt->reset.flags)) { - local_bh_enable(); - intel_gt_set_wedged(gt); - err = -EBUSY; - goto out_heartbeat; - } - tasklet_disable(&se->tasklet); - - se->tasklet.callback(&se->tasklet); - GEM_BUG_ON(execlists_active(&engine->execlists) != rq); - - /* Fake a preemption event; failed of course */ - spin_lock_irq(&se->lock); - __i915_sched_rewind_requests(engine); - spin_unlock_irq(&se->lock); - GEM_BUG_ON(rq->engine != engine); - - /* Reset the engine while keeping our active request on hold */ - i915_sched_suspend_request(engine, rq); - GEM_BUG_ON(!i915_request_on_hold(rq)); - -
__intel_engine_reset_bh(engine, NULL); - GEM_BUG_ON(rq->fence.error != -EIO); - - /* Release our grasp on the engine, letting CS flow again */ - tasklet_enable(&se->tasklet); - clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags); - local_bh_enable(); - - /* Check that we do not resubmit the held request */ - i915_request_get(rq); - if (!i915_request_wait(rq, 0, HZ / 5)) { - pr_err("%s: on hold request completed!\n", - engine->name); - intel_gt_set_wedged(gt); - err = -EIO; - goto out_rq; - } - GEM_BUG_ON(!i915_request_on_hold(rq)); - - /* But is resubmitted on release */ - i915_sched_resume_request(engine, rq); - if (i915_request_wait(rq, 0, HZ / 5) < 0) { - pr_err("%s: held request did not complete!\n", - engine->name); - intel_gt_set_wedged(gt); - err = -ETIME; - } - -out_rq: - i915_request_put(rq); -out_heartbeat: - for (n = 0; n < nsibling; n++) - st_engine_heartbeat_enable(siblings[n]); - - intel_context_put(ve); -out_spin: - igt_spinner_fini(&spin); - return err; -} - -static int live_virtual_reset(void *arg) -{ - struct intel_gt *gt = arg; - struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1]; - unsigned int class; - - /* - * Check that we handle a reset event within a virtual engine. - * Only the physical engine is reset, but we have to check the flow - * of the virtual requests around the reset, and make sure it is not - * forgotten.
- */ - - if (intel_uc_uses_guc_submission(&gt->uc)) - return 0; - - if (!intel_has_reset_engine(gt)) - return 0; - - for (class = 0; class <= MAX_ENGINE_CLASS; class++) { - int nsibling, err; - - nsibling = select_siblings(gt, class, siblings); - if (nsibling < 2) - continue; - - err = reset_virtual_engine(gt, siblings, nsibling); - if (err) - return err; - } - - return 0; -} - int intel_execlists_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { @@ -4727,7 +4582,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915) SUBTEST(live_virtual_preserved), SUBTEST(live_virtual_slice), SUBTEST(live_virtual_bond), - SUBTEST(live_virtual_reset), }; if (i915->gt.submission_method != INTEL_SUBMISSION_ELSP) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index ce828dc73402..aa12289ea14b 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1290,6 +1290,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from) GEM_BUG_ON(to == from); GEM_BUG_ON(to->timeline == from->timeline); + GEM_BUG_ON(to->context == from->context); if (i915_request_completed(from)) { i915_sw_fence_set_error_once(&to->submit, from->fence.error); @@ -1436,6 +1437,15 @@ i915_request_await_object(struct i915_request *to, return ret; } +static bool in_order_submission(const struct i915_request *prev, + const struct i915_request *rq) +{ + if (likely(prev->context == rq->context)) + return true; + + return is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask); +} + static struct i915_request * __i915_request_add_to_timeline(struct i915_request *rq) { @@ -1475,7 +1485,7 @@ __i915_request_add_to_timeline(struct i915_request *rq) i915_seqno_passed(prev->fence.seqno, rq->fence.seqno)); - if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) + if (in_order_submission(prev, rq)) i915_sw_fence_await_sw_fence(&rq->submit, &prev->submit,
&rq->submitq); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 5b68a5f07f64..de65e747e809 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -494,7 +494,9 @@ static void remove_from_priolist(struct i915_sched *se, GEM_BUG_ON(!i915_request_in_priority_queue(rq)); __list_del_entry(&rq->sched.link); - if (tail) + if (!list) + INIT_LIST_HEAD(&rq->sched.link); + else if (tail) list_add_tail(&rq->sched.link, list); else list_add(&rq->sched.link, list); @@ -703,7 +705,7 @@ static u64 virtual_deadline(u64 kt, int priority) return i915_sched_to_ticks(kt + prio_slice(priority)); } -u64 i915_scheduler_next_virtual_deadline(int priority) +static u64 next_virtual_deadline(int priority) { return virtual_deadline(ktime_get_mono_fast_ns(), priority); } @@ -943,15 +945,13 @@ void i915_request_set_priority(struct i915_request *rq, int prio) spin_unlock_irqrestore(&engine->sched.lock, flags); } -void __i915_sched_defer_request(struct intel_engine_cs *engine, - struct i915_request *rq) +static void __defer_request(struct intel_engine_cs *engine, + struct i915_request *rq, + const u64 deadline) { struct list_head *pos = &rq->sched.waiters_list; struct i915_sched *se = intel_engine_get_scheduler(engine); struct i915_request *rn; - const u64 deadline = - max(rq_deadline(rq), - i915_scheduler_next_virtual_deadline(adj_prio(rq))); LIST_HEAD(dfs); SCHED_TRACE(se, "defer request " RQ_FMT "\n", RQ_ARG(rq)); @@ -1025,6 +1025,14 @@ void __i915_sched_defer_request(struct intel_engine_cs *engine, } } +void __i915_sched_defer_request(struct intel_engine_cs *engine, + struct i915_request *rq) +{ + __defer_request(engine, rq, + max(rq_deadline(rq), + next_virtual_deadline(adj_prio(rq)))); +} + static bool queue_request(struct i915_request *rq) { set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); @@ -1064,6 +1072,48 @@ static bool ancestor_on_hold(const struct i915_sched *se, return 
unlikely(!list_empty(&se->hold)) && hold_request(rq); } +bool __i915_request_requeue(struct i915_request *rq, + struct intel_engine_cs *engine) +{ + struct i915_sched *se = intel_engine_get_scheduler(engine); + + RQ_TRACE(rq, "transfer from %s to %s\n", + rq->engine->name, engine->name); + + lockdep_assert_held(&se->lock); + lockdep_assert_held(&i915_request_get_scheduler(rq)->lock); + GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags)); + GEM_BUG_ON(rq->engine == engine); + + remove_from_priolist(i915_request_get_scheduler(rq), rq, NULL, false); + WRITE_ONCE(rq->engine, engine); + + if (__i915_request_is_complete(rq)) { + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); + return false; + } + + if (unlikely(ancestor_on_hold(se, rq))) { + RQ_TRACE(rq, "ancestor on hold\n"); + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + list_add_tail(&rq->sched.link, &se->hold); + i915_request_set_hold(rq); + } else { + u64 deadline = min(earliest_deadline(rq), rq_deadline(rq)); + + /* Maintain request ordering wrt to existing on target */ + __i915_request_set_deadline(rq, deadline); + if (!list_empty(&rq->sched.waiters_list)) + __defer_request(engine, rq, deadline); + + GEM_BUG_ON(rq_deadline(rq) == I915_DEADLINE_NEVER); + } + + GEM_BUG_ON(list_empty(&rq->sched.link)); + return true; +} + void i915_request_enqueue(struct i915_request *rq) { struct intel_engine_cs *engine = rq->engine; @@ -1115,9 +1165,9 @@ __i915_sched_rewind_requests(struct intel_engine_cs *engine) __i915_request_unsubmit(rq); if (__i915_request_has_started(rq)) { - u64 deadline = - i915_scheduler_next_virtual_deadline(rq_prio(rq)); - rq->sched.deadline = min(rq_deadline(rq), deadline); + rq->sched.deadline = + min(rq_deadline(rq), + next_virtual_deadline(rq_prio(rq))); } GEM_BUG_ON(rq_deadline(rq) == I915_DEADLINE_NEVER); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 
f8057abed1e7..c5612dd4a081 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -51,9 +51,9 @@ void i915_request_set_deadline(struct i915_request *request, u64 deadline); void i915_request_update_deadline(struct i915_request *request); -u64 i915_scheduler_next_virtual_deadline(int priority); - void i915_request_enqueue(struct i915_request *request); +bool __i915_request_requeue(struct i915_request *rq, + struct intel_engine_cs *engine); struct i915_request * __i915_sched_rewind_requests(struct intel_engine_cs *engine); From patchwork Mon Feb 1 08:56:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058467 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F527C433E6 for ; Mon, 1 Feb 2021 08:58:10 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ABBD064E33 for ; Mon, 1 Feb 2021 08:58:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ABBD064E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 55C156E491; Mon, 1 Feb 2021 08:57:42 
+0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4BEEF6E4A7 for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757762-1500050 for multiple; Mon, 01 Feb 2021 08:57:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:56:59 +0000 Message-Id: <20210201085715.27435-41-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 41/57] drm/i915: Move saturated workload detection back to the context X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When we introduced the saturated workload detection to tell us to back off from semaphore usage [semaphores have a noticeable impact on contended bus cycles with the CPU for some heavy workloads], we first introduced it as a per-context tracker. This allowed individual contexts to try to optimise their own usage, but we found that with the local tracking and the no-semaphore boosting, the first context to disable semaphores got a massive priority boost and so would starve the rest and all new contexts (as they started with semaphores enabled and lower priority). Hence we moved the saturated workload detection to the engine, and as a consequence had to disable semaphores on virtual engines.
Now that we do not have semaphore priority boosting, and try to fairly schedule irrespective of semaphore usage, we can move the tracking back to the context and virtual engines can now utilise the faster inter-engine synchronisation. If we see that any context fails to use the semaphore, because the system is oversubscribed and was busy doing something else instead of spinning on the semaphore, we disable further usage of semaphores with that context until it idles again. This should restrict the semaphores to lightly utilised systems where the latency between requests is more noticeable, and curtail the bus-contention from checking for signaled semaphores. References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global") Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context.c | 3 +++ drivers/gpu/drm/i915/gt/intel_context_types.h | 2 ++ drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 -- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 -- .../gpu/drm/i915/gt/intel_execlists_submission.c | 15 --------------- drivers/gpu/drm/i915/i915_request.c | 6 +++--- 6 files changed, 8 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index daf537d1e415..57b6bde2b736 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -344,6 +344,9 @@ static int __intel_context_active(struct i915_active *active) { struct intel_context *ce = container_of(active, typeof(*ce), active); + CE_TRACE(ce, "active\n"); + ce->saturated = 0; + intel_context_get(ce); /* everything should already be activated by intel_context_pre_pin() */ diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 0ea18c9e2aca..d1a35c3055a7 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -109,6 +109,8 @@ struct intel_context { } lrc; u32 tag; /* cookie
passed to HW to track this context on submission */ + intel_engine_mask_t saturated; /* submitting semaphores too late? */ + /** stats: Context GPU engine busyness tracking. */ struct intel_context_stats { u64 active; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index ef5064ea54e5..44948abe4bf8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -253,8 +253,6 @@ static int __engine_park(struct intel_wakeref *wf) struct intel_engine_cs *engine = container_of(wf, typeof(*engine), wakeref); - engine->saturated = 0; - /* * If one and only one request is completed between pm events, * we know that we are inside the kernel context and it is diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index dc12cbdfda46..e94c99dee5cb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -303,8 +303,6 @@ struct intel_engine_cs { struct intel_context *kernel_context; /* pinned */ - intel_engine_mask_t saturated; /* submitting semaphores too late? */ - struct { struct delayed_work work; struct i915_request *systole; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 9c929688a955..e8f192984e88 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3413,21 +3413,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; - /* - * The decision on whether to submit a request using semaphores - * depends on the saturated state of the engine. 
We only compute - * this during HW submission of the request, and we need for this - * state to be globally applied to all requests being submitted - * to this engine. Virtual engines encompass more than one physical - * engine and so we cannot accurately tell in advance if one of those - * engines is already saturated and so cannot afford to use a semaphore - * and be pessimized in priority for doing so -- if we are the only - * context using semaphores after all other clients have stopped, we - * will be starved on the saturated system. Such a global switch for - * semaphores is less than ideal, but alas is the current compromise. - */ - ve->base.saturated = ALL_ENGINES; - snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); intel_engine_init_execlists(&ve->base); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index aa12289ea14b..352083889b97 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -516,7 +516,7 @@ bool __i915_request_submit(struct i915_request *request) */ if (request->sched.semaphores && i915_sw_fence_signaled(&request->semaphore)) - engine->saturated |= request->sched.semaphores; + request->context->saturated |= request->sched.semaphores; engine->emit_fini_breadcrumb(request, request->ring->vaddr + request->postfix); @@ -977,7 +977,7 @@ already_busywaiting(struct i915_request *rq) * * See the are-we-too-late? check in __i915_request_submit(). 
*/ - return rq->sched.semaphores | READ_ONCE(rq->engine->saturated); + return rq->sched.semaphores | READ_ONCE(rq->context->saturated); } static int @@ -1071,7 +1071,7 @@ emit_semaphore_wait(struct i915_request *to, if (__emit_semaphore_wait(to, from, from->fence.seqno)) goto await_fence; - to->sched.semaphores |= mask; + to->sched.semaphores |= mask & ~to->engine->mask; wait = &to->semaphore; await_fence: From patchwork Mon Feb 1 08:57:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BCB5C433E9 for ; Mon, 1 Feb 2021 08:58:08 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E3F2164E3F for ; Mon, 1 Feb 2021 08:58:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E3F2164E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3E2296E4A1; Mon, 1 Feb 2021 08:57:42 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 448C26E491 for ; Mon, 1 Feb 2021 
08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757764-1500050 for multiple; Mon, 01 Feb 2021 08:57:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:00 +0000 Message-Id: <20210201085715.27435-42-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 42/57] drm/i915: Bump default timeslicing quantum to 5ms X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Primarily to smooth over differences with the guc backend that struggles with smaller quanta, bump the default timeslicing to 5ms from 1ms. 
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Kconfig.profile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile index 35bbe2b80596..3eacea42b19f 100644 --- a/drivers/gpu/drm/i915/Kconfig.profile +++ b/drivers/gpu/drm/i915/Kconfig.profile @@ -90,7 +90,7 @@ config DRM_I915_STOP_TIMEOUT config DRM_I915_TIMESLICE_DURATION int "Scheduling quantum for userspace batches (ms, jiffy granularity)" - default 1 # milliseconds + default 5 # milliseconds help When two user batches of equal priority are executing, we will alternate execution of each batch to ensure forward progress of From patchwork Mon Feb 1 08:57:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8930C433E9 for ; Mon, 1 Feb 2021 08:58:00 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7BBCF64E3F for ; Mon, 1 Feb 2021 08:58:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7BBCF64E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org 
(localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D9C096E504; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4555E6E499 for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757766-1500050 for multiple; Mon, 01 Feb 2021 08:57:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:01 +0000 Message-Id: <20210201085715.27435-43-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 43/57] drm/i915/gt: Delay taking irqoff for execlists submission X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Before we take the irqsafe spinlock to dequeue requests and submit them to HW, first check, without taking the lock, whether we need to take any action (i.e. whether the HW is ready for some work, or if we need to preempt the currently executing context). We will then likely skip taking the spinlock altogether, and so reduce contention.
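As a rough userspace illustration of the idea (not the driver code), the sketch below decides what to do from lockless reads and only then pays for the lock. Every name in it (toy_engine, toy_precheck, toy_dequeue, TOY_REWIND, TOY_DEFER) is hypothetical, and a C11 atomic_flag stands in for the irqsafe spinlock. The decision is carried as small flag bits, loosely mirroring how this patch encodes the pending action in the low bits of "last".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical action bits; the patch uses (void *)1 and (void *)3 in "last". */
enum { TOY_REWIND = 1, TOY_DEFER = 2 };

/* Hypothetical stand-in for the engine state; not an i915 structure. */
struct toy_engine {
	atomic_flag lock;          /* stands in for the irqsafe spinlock */
	atomic_bool port_free;     /* HW can accept new work */
	atomic_bool need_preempt;  /* queue outranks the running context */
	atomic_bool slice_expired; /* running context used up its quantum */
	unsigned int lock_taken;   /* how often we actually took the lock */
	int last_action;           /* action applied under the lock */
};

/* Lockless check: returns the action bits, or -1 if there is nothing to do. */
static int toy_precheck(struct toy_engine *e)
{
	if (atomic_load(&e->need_preempt))
		return TOY_REWIND;
	if (atomic_load(&e->slice_expired))
		return TOY_REWIND | TOY_DEFER;
	if (atomic_load(&e->port_free))
		return 0;
	return -1;
}

/* Returns true if the lock was taken and a submission pass was made. */
static bool toy_dequeue(struct toy_engine *e)
{
	int action = toy_precheck(e);

	if (action < 0)
		return false; /* common case: no lock, no contention */

	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		; /* spin until the lock is ours */
	e->lock_taken++;

	/* Deferring only makes sense together with a rewind. */
	assert(action == 0 || (action & TOY_REWIND));
	e->last_action = action;

	atomic_flag_clear_explicit(&e->lock, memory_order_release);
	return true;
}
```

The state read in the precheck may change before the lock is acquired, so a design like this presumably relies on the locked section tolerating a stale decision; the toy above does not model that.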
Signed-off-by: Chris Wilson --- .../drm/i915/gt/intel_execlists_submission.c | 88 ++++++++----------- 1 file changed, 39 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index e8f192984e88..d4ae65af7dc1 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -1014,24 +1014,6 @@ static void virtual_xfer_context(struct virtual_engine *ve, } } -static void defer_active(struct intel_engine_cs *engine) -{ - struct i915_request *rq; - - rq = __i915_sched_rewind_requests(engine); - if (!rq) - return; - - /* - * We want to move the interrupted request to the back of - * the round-robin list (i.e. its priority level), but - * in doing so, we must then move all requests that were in - * flight and were waiting for the interrupted request to - * be run after it again. - */ - __i915_sched_defer_request(engine, rq); -} - static bool timeslice_yield(const struct intel_engine_execlists *el, const struct i915_request *rq) @@ -1312,8 +1294,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * and context switches) submission. */ - spin_lock(&se->lock); - /* * If the queue is higher priority than the last * request in the currently active context, submit afresh. @@ -1336,24 +1316,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) rq_deadline(last), rq_prio(last)); record_preemption(execlists); - - /* - * Don't let the RING_HEAD advance past the breadcrumb - * as we unwind (and until we resubmit) so that we do - * not accidentally tell it to go backwards. - */ - ring_set_paused(engine, 1); - - /* - * Note that we have not stopped the GPU at this point, - * so we are unwinding the incomplete requests as they - * remain inflight and so by the time we do complete - * the preemption, some of the unwound requests may - * complete! 
- */ - __i915_sched_rewind_requests(engine); - - last = NULL; + last = (void *)1; } else if (timeslice_expired(engine, last)) { ENGINE_TRACE(engine, "expired:%s last=%llx:%llu, deadline=%llu, now=%llu, yield?=%s\n", @@ -1380,8 +1343,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * same context again, grant it a full timeslice. */ cancel_timer(&execlists->timer); - ring_set_paused(engine, 1); - defer_active(engine); /* * Unlike for preemption, if we rewind and continue @@ -1396,7 +1357,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * normal save/restore will preserve state and allow * us to later continue executing the same request. */ - last = NULL; + last = (void *)3; } else { /* * Otherwise if we already have a request pending @@ -1412,12 +1373,46 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * Even if ELSP[1] is occupied and not worthy * of timeslices, our queue might be. */ - spin_unlock(&se->lock); return; } } } + local_irq_disable(); /* irq remains off until after ELSP write */ + spin_lock(&se->lock); + + if ((unsigned long)last & 1) { + bool defer = (unsigned long)last & 2; + + /* + * Don't let the RING_HEAD advance past the breadcrumb + * as we unwind (and until we resubmit) so that we do + * not accidentally tell it to go backwards. + */ + ring_set_paused(engine, (unsigned long)last); + + /* + * Note that we have not stopped the GPU at this point, + * so we are unwinding the incomplete requests as they + * remain inflight and so by the time we do complete + * the preemption, some of the unwound requests may + * complete! + */ + last = __i915_sched_rewind_requests(engine); + + /* + * We want to move the interrupted request to the back of + * the round-robin list (i.e. its priority level), but + * in doing so, we must then move all requests that were in + * flight and were waiting for the interrupted request to + * be run after it again. 
+ */ + if (last && defer) + __i915_sched_defer_request(engine, last); + + last = NULL; + } + if (!RB_EMPTY_ROOT(&execlists->virtual.rb_root)) virtual_requeue(engine, last); @@ -1533,13 +1528,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) i915_request_put(*port); *execlists->pending = NULL; } -} -static void execlists_dequeue_irq(struct intel_engine_cs *engine) -{ - local_irq_disable(); /* Suspend interrupts across request submission */ - execlists_dequeue(engine); - local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */ + local_irq_enable(); } static void clear_ports(struct i915_request **ports, int count) @@ -2191,7 +2181,7 @@ static void execlists_submission_tasklet(struct tasklet_struct *t) execlists_reset(engine); if (!engine->execlists.pending[0]) { - execlists_dequeue_irq(engine); + execlists_dequeue(engine); start_timeslice(engine); } From patchwork Mon Feb 1 08:57:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C700C433DB for ; Mon, 1 Feb 2021 08:57:52 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BF28064E3F for ; Mon, 1 Feb 2021 08:57:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BF28064E3F Authentication-Results: 
mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4C8D66E4F3; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 73CB66E49A for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757767-1500050 for multiple; Mon, 01 Feb 2021 08:57:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:02 +0000 Message-Id: <20210201085715.27435-44-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 44/57] drm/i915/gt: Wrap intel_timeline.has_initial_breadcrumb X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" In preparation for removing the has_initial_breadcrumb field, add a helper function for the existing callers. 
Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 2 +- drivers/gpu/drm/i915/gt/intel_ring_submission.c | 4 ++-- drivers/gpu/drm/i915/gt/intel_timeline.c | 6 +++--- drivers/gpu/drm/i915/gt/intel_timeline.h | 6 ++++++ drivers/gpu/drm/i915/gt/selftest_timeline.c | 5 +++-- 5 files changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 8791e03ebe61..d8763146e054 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -354,7 +354,7 @@ int gen8_emit_init_breadcrumb(struct i915_request *rq) u32 *cs; GEM_BUG_ON(i915_request_has_initial_breadcrumb(rq)); - if (!i915_request_timeline(rq)->has_initial_breadcrumb) + if (!intel_timeline_has_initial_breadcrumb(i915_request_timeline(rq))) return 0; cs = intel_ring_begin(rq, 6); diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index a7d49ea71900..9d193acd260b 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -908,7 +908,7 @@ static int ring_request_alloc(struct i915_request *request) int ret; GEM_BUG_ON(!intel_context_is_pinned(request->context)); - GEM_BUG_ON(i915_request_timeline(request)->has_initial_breadcrumb); + GEM_BUG_ON(intel_timeline_has_initial_breadcrumb(i915_request_timeline(request))); /* * Flush enough space to reduce the likelihood of waiting after @@ -1229,7 +1229,7 @@ int intel_ring_submission_setup(struct intel_engine_cs *engine) err = PTR_ERR(timeline); goto err; } - GEM_BUG_ON(timeline->has_initial_breadcrumb); + GEM_BUG_ON(intel_timeline_has_initial_breadcrumb(timeline)); err = intel_timeline_pin(timeline, NULL); if (err) diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 491b8df174c2..1505dffbaba9 100644 --- 
a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -444,14 +444,14 @@ void intel_timeline_exit(struct intel_timeline *tl) static u32 timeline_advance(struct intel_timeline *tl) { GEM_BUG_ON(!atomic_read(&tl->pin_count)); - GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb); + GEM_BUG_ON(tl->seqno & intel_timeline_has_initial_breadcrumb(tl)); - return tl->seqno += 1 + tl->has_initial_breadcrumb; + return tl->seqno += 1 + intel_timeline_has_initial_breadcrumb(tl); } static void timeline_rollback(struct intel_timeline *tl) { - tl->seqno -= 1 + tl->has_initial_breadcrumb; + tl->seqno -= 1 + intel_timeline_has_initial_breadcrumb(tl); } static noinline int diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h index b1f81d947f8d..7d6218b55df6 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline.h @@ -42,6 +42,12 @@ static inline void intel_timeline_put(struct intel_timeline *timeline) kref_put(&timeline->kref, __intel_timeline_free); } +static inline bool +intel_timeline_has_initial_breadcrumb(const struct intel_timeline *tl) +{ + return tl->has_initial_breadcrumb; +} + static inline int __intel_timeline_sync_set(struct intel_timeline *tl, u64 context, u32 seqno) { diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index d283dce5b4ac..562a450d2832 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -665,7 +665,7 @@ static int live_hwsp_wrap(void *arg) if (IS_ERR(tl)) return PTR_ERR(tl); - if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline) + if (!intel_timeline_has_initial_breadcrumb(tl) || !tl->hwsp_cacheline) goto out_free; err = intel_timeline_pin(tl, NULL); @@ -1234,7 +1234,8 @@ static int live_hwsp_rollover_user(void *arg) goto out; tl = ce->timeline; - if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline) + if 
(!intel_timeline_has_initial_breadcrumb(tl) || + !tl->hwsp_cacheline) goto out; timeline_rollback(tl); From patchwork Mon Feb 1 08:57:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD6B1C433E0 for ; Mon, 1 Feb 2021 08:57:55 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4755C64E3F for ; Mon, 1 Feb 2021 08:57:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4755C64E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D79DD6E503; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7C2A46E491 for ; Mon, 1 Feb 2021 08:57:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757770-1500050 for multiple; Mon, 01 Feb 2021 08:57:23 +0000 From: Chris Wilson 
To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:03 +0000 Message-Id: <20210201085715.27435-45-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 45/57] drm/i915/gt: Track timeline GGTT offset separately from subpage offset X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Currently we know that the timeline status page is at most a page in size, and so we can preserve the lower 12 bits of the offset when relocating the status page in the GGTT. If we want to use a larger object, such as the context state, we may not necessarily use a position within the first page and so need more than 12 bits. Track the absolute GGTT address in a separate ggtt_offset field, leaving hwsp_offset as the offset within the object.
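To make the 12-bit limitation concrete, here is a small standalone sketch comparing the two bookkeeping schemes. The names (toy_timeline, toy_pin_folded, toy_pin_split, TOY_PAGE_SIZE) are invented for illustration and a 4KiB page is assumed; only the arithmetic is meant to correspond to the description above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size, hence 12 offset bits */

/* Hypothetical miniature of the two offsets tracked per timeline. */
struct toy_timeline {
	uint32_t hwsp_offset; /* offset of the seqno within the backing object */
	uint32_t ggtt_offset; /* absolute GGTT address, recomputed on each pin */
};

/*
 * Single-field scheme: the one field holds the absolute address, and on
 * re-pin only its low 12 bits are carried over to the new base.
 */
static uint32_t toy_pin_folded(uint32_t field, uint32_t vma_base)
{
	return vma_base + (field & (TOY_PAGE_SIZE - 1));
}

/* Split scheme: the in-object offset is never overwritten, so any size works. */
static void toy_pin_split(struct toy_timeline *tl, uint32_t vma_base)
{
	assert((vma_base & (TOY_PAGE_SIZE - 1)) == 0); /* bases are page aligned */
	tl->ggtt_offset = vma_base + tl->hwsp_offset;
}
```

With hwsp_offset = 4096 + 64 (a slot in the second page of a larger object) and a base of 0x10000, toy_pin_split() yields 0x11040, whereas toy_pin_folded() yields 0x10040 and silently points back into the first page.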
Signed-off-by: Chris Wilson Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 4 ++-- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 1 - .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_timeline.c | 17 +++++++---------- drivers/gpu/drm/i915/gt/intel_timeline_types.h | 1 + drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 2 +- drivers/gpu/drm/i915/gt/selftest_rc6.c | 2 +- drivers/gpu/drm/i915/gt/selftest_timeline.c | 16 ++++++++-------- drivers/gpu/drm/i915/i915_scheduler.c | 2 +- 10 files changed, 23 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c index ce38d1bcaba3..2f59dd3bdc18 100644 --- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c @@ -161,7 +161,7 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) PIPE_CONTROL_DC_FLUSH_ENABLE | PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL); - *cs++ = i915_request_active_timeline(rq)->hwsp_offset | + *cs++ = i915_request_active_timeline(rq)->ggtt_offset | PIPE_CONTROL_GLOBAL_GTT; *cs++ = rq->fence.seqno; @@ -359,7 +359,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_CS_STALL); - *cs++ = i915_request_active_timeline(rq)->hwsp_offset; + *cs++ = i915_request_active_timeline(rq)->ggtt_offset; *cs++ = rq->fence.seqno; *cs++ = MI_USER_INTERRUPT; diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index d8763146e054..187f1dad1054 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -346,7 +346,7 @@ static u32 hwsp_offset(const struct i915_request *rq) if (cl) return cl->ggtt_offset; - return rcu_dereference_protected(rq->timeline, 1)->hwsp_offset; + return rcu_dereference_protected(rq->timeline, 1)->ggtt_offset; } int 
gen8_emit_init_breadcrumb(struct i915_request *rq) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index f39f8049641c..f91c38124871 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1452,7 +1452,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, i915_sched_show(m, intel_engine_get_scheduler(engine), i915_request_show, 8); - drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base); wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm); if (wakeref) { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index d4ae65af7dc1..d9b5b6c9eb5d 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3591,7 +3591,7 @@ static int print_ring(char *buf, int sz, struct i915_request *rq) len = scnprintf(buf, sz, "ring:{start:%08x, hwsp:%08x, seqno:%08x, runtime:%llums}, ", i915_ggtt_offset(rq->ring->vma), - tl ? tl->hwsp_offset : 0, + tl ? 
tl->ggtt_offset : 0, hwsp_seqno(rq), DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context), 1000 * 1000)); diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 1505dffbaba9..b684322c879c 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -354,13 +354,11 @@ int intel_timeline_pin(struct intel_timeline *tl, struct i915_gem_ww_ctx *ww) if (err) return err; - tl->hwsp_offset = - i915_ggtt_offset(tl->hwsp_ggtt) + - offset_in_page(tl->hwsp_offset); + tl->ggtt_offset = i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset; GT_TRACE(tl->gt, "timeline:%llx using HWSP offset:%x\n", - tl->fence_context, tl->hwsp_offset); + tl->fence_context, tl->ggtt_offset); - cacheline_acquire(tl->hwsp_cacheline, tl->hwsp_offset); + cacheline_acquire(tl->hwsp_cacheline, tl->ggtt_offset); if (atomic_fetch_inc(&tl->pin_count)) { cacheline_release(tl->hwsp_cacheline); __i915_vma_unpin(tl->hwsp_ggtt); @@ -528,14 +526,13 @@ __intel_timeline_get_seqno(struct intel_timeline *tl, vaddr = page_mask_bits(cl->vaddr); tl->hwsp_offset = cacheline * CACHELINE_BYTES; - tl->hwsp_seqno = - memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES); + tl->hwsp_seqno = memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES); - tl->hwsp_offset += i915_ggtt_offset(vma); + tl->ggtt_offset = i915_ggtt_offset(vma) + tl->hwsp_offset; GT_TRACE(tl->gt, "timeline:%llx using HWSP offset:%x\n", - tl->fence_context, tl->hwsp_offset); + tl->fence_context, tl->ggtt_offset); - cacheline_acquire(cl, tl->hwsp_offset); + cacheline_acquire(cl, tl->ggtt_offset); tl->hwsp_cacheline = cl; *seqno = timeline_advance(tl); diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h index 9f677c9b7d06..c5995cc290a0 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h @@ -47,6 +47,7 @@ struct intel_timeline { const u32 
*hwsp_seqno; struct i915_vma *hwsp_ggtt; u32 hwsp_offset; + u32 ggtt_offset; struct intel_timeline_cacheline *hwsp_cacheline; diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c index 84d883de30ee..e33ec4e3b35d 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c @@ -53,7 +53,7 @@ static int write_timestamp(struct i915_request *rq, int slot) cmd++; *cs++ = cmd; *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(rq->engine->mmio_base)); - *cs++ = i915_request_timeline(rq)->hwsp_offset + slot * sizeof(u32); + *cs++ = i915_request_timeline(rq)->ggtt_offset + slot * sizeof(u32); *cs++ = 0; intel_ring_advance(rq, cs); diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c index f097e420ac45..285cead849dd 100644 --- a/drivers/gpu/drm/i915/gt/selftest_rc6.c +++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c @@ -137,7 +137,7 @@ static const u32 *__live_rc6_ctx(struct intel_context *ce) *cs++ = cmd; *cs++ = i915_mmio_reg_offset(GEN8_RC6_CTX_INFO); - *cs++ = ce->timeline->hwsp_offset + 8; + *cs++ = ce->timeline->ggtt_offset + 8; *cs++ = 0; intel_ring_advance(rq, cs); diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index 562a450d2832..6b412228a6fd 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -468,7 +468,7 @@ tl_write(struct intel_timeline *tl, struct intel_engine_cs *engine, u32 value) i915_request_get(rq); - err = emit_ggtt_store_dw(rq, tl->hwsp_offset, value); + err = emit_ggtt_store_dw(rq, tl->ggtt_offset, value); i915_request_add(rq); if (err) { i915_request_put(rq); @@ -564,7 +564,7 @@ static int live_hwsp_engine(void *arg) if (!err && READ_ONCE(*tl->hwsp_seqno) != n) { GEM_TRACE_ERR("Invalid seqno:%lu stored in timeline %llu @ %x, found 0x%x\n", - n, tl->fence_context, tl->hwsp_offset, *tl->hwsp_seqno); + n, 
tl->fence_context, tl->ggtt_offset, *tl->hwsp_seqno); GEM_TRACE_DUMP(); err = -EINVAL; } @@ -636,7 +636,7 @@ static int live_hwsp_alternate(void *arg) if (!err && READ_ONCE(*tl->hwsp_seqno) != n) { GEM_TRACE_ERR("Invalid seqno:%lu stored in timeline %llu @ %x, found 0x%x\n", - n, tl->fence_context, tl->hwsp_offset, *tl->hwsp_seqno); + n, tl->fence_context, tl->ggtt_offset, *tl->hwsp_seqno); GEM_TRACE_DUMP(); err = -EINVAL; } @@ -696,9 +696,9 @@ static int live_hwsp_wrap(void *arg) goto out; } pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n", - seqno[0], tl->hwsp_offset); + seqno[0], tl->ggtt_offset); - err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]); + err = emit_ggtt_store_dw(rq, tl->ggtt_offset, seqno[0]); if (err) { i915_request_add(rq); goto out; @@ -713,9 +713,9 @@ static int live_hwsp_wrap(void *arg) goto out; } pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n", - seqno[1], tl->hwsp_offset); + seqno[1], tl->ggtt_offset); - err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]); + err = emit_ggtt_store_dw(rq, tl->ggtt_offset, seqno[1]); if (err) { i915_request_add(rq); goto out; @@ -1343,7 +1343,7 @@ static int live_hwsp_recycle(void *arg) if (READ_ONCE(*tl->hwsp_seqno) != count) { GEM_TRACE_ERR("Invalid seqno:%lu stored in timeline %llu @ %x found 0x%x\n", count, tl->fence_context, - tl->hwsp_offset, *tl->hwsp_seqno); + tl->ggtt_offset, *tl->hwsp_seqno); GEM_TRACE_DUMP(); err = -EINVAL; } diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index de65e747e809..838fd26c5ac6 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -1665,7 +1665,7 @@ void i915_sched_show(struct drm_printer *m, drm_printf(m, "\t\tring->space: 0x%08x\n", rq->ring->space); drm_printf(m, "\t\tring->hwsp: 0x%08x\n", - i915_request_active_timeline(rq)->hwsp_offset); + i915_request_active_timeline(rq)->ggtt_offset); print_request_ring(m, rq); From patchwork Mon Feb 1 08:57:04 2021 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78DDEC433E0 for ; Mon, 1 Feb 2021 08:57:53 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0B94364E33 for ; Mon, 1 Feb 2021 08:57:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0B94364E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9AFDB6E4C4; Mon, 1 Feb 2021 08:57:35 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5790D6E45D for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757772-1500050 for multiple; Mon, 01 Feb 2021 08:57:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:04 +0000 Message-Id: <20210201085715.27435-46-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 46/57] drm/i915/gt: Add timeline "mode" X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Explicitly differentiate between the absolute and relative timelines, and the global HWSP and ppHWSP relative offsets. When using a timeline that is relative to a known status page, we can replace the absolute addressing in the commands with indexed variants. Signed-off-by: Chris Wilson Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_timeline.c | 21 ++++++++++++++++--- drivers/gpu/drm/i915/gt/intel_timeline.h | 2 +- .../gpu/drm/i915/gt/intel_timeline_types.h | 10 +++++++-- 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index b684322c879c..69052495c64a 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -226,7 +226,6 @@ static int intel_timeline_init(struct intel_timeline *timeline, timeline->gt = gt; - timeline->has_initial_breadcrumb = !hwsp; timeline->hwsp_cacheline = NULL; if (!hwsp) { @@ -243,13 +242,29 @@ static int intel_timeline_init(struct intel_timeline *timeline, return PTR_ERR(cl); } + timeline->mode = INTEL_TIMELINE_ABSOLUTE; timeline->hwsp_cacheline = cl; timeline->hwsp_offset = cacheline * CACHELINE_BYTES; vaddr = page_mask_bits(cl->vaddr); } else { - timeline->hwsp_offset = offset; - vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB); + int preferred; + + if (offset & INTEL_TIMELINE_RELATIVE_CONTEXT) { + timeline->mode = INTEL_TIMELINE_RELATIVE_CONTEXT; + 
timeline->hwsp_offset = + offset & ~INTEL_TIMELINE_RELATIVE_CONTEXT; + preferred = i915_coherent_map_type(gt->i915); + } else { + timeline->mode = INTEL_TIMELINE_RELATIVE_ENGINE; + timeline->hwsp_offset = offset; + preferred = I915_MAP_WB; + } + + vaddr = i915_gem_object_pin_map(hwsp->obj, + preferred | I915_MAP_OVERRIDE); + if (IS_ERR(vaddr)) + vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WC); if (IS_ERR(vaddr)) return PTR_ERR(vaddr); } diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h index 7d6218b55df6..e1d522329757 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline.h @@ -45,7 +45,7 @@ static inline void intel_timeline_put(struct intel_timeline *timeline) static inline bool intel_timeline_has_initial_breadcrumb(const struct intel_timeline *tl) { - return tl->has_initial_breadcrumb; + return tl->mode == INTEL_TIMELINE_ABSOLUTE; } static inline int __intel_timeline_sync_set(struct intel_timeline *tl, diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h index c5995cc290a0..61938d103a13 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h @@ -19,6 +19,12 @@ struct i915_syncmap; struct intel_gt; struct intel_timeline_hwsp; +enum intel_timeline_mode { + INTEL_TIMELINE_ABSOLUTE = 0, + INTEL_TIMELINE_RELATIVE_CONTEXT = BIT(0), + INTEL_TIMELINE_RELATIVE_ENGINE = BIT(1), +}; + struct intel_timeline { u64 fence_context; u32 seqno; @@ -44,6 +50,8 @@ struct intel_timeline { atomic_t pin_count; atomic_t active_count; + enum intel_timeline_mode mode; + const u32 *hwsp_seqno; struct i915_vma *hwsp_ggtt; u32 hwsp_offset; @@ -51,8 +59,6 @@ struct intel_timeline { struct intel_timeline_cacheline *hwsp_cacheline; - bool has_initial_breadcrumb; - /** * List of breadcrumbs associated with GPU requests currently * outstanding. 
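For readers following the mode selection in intel_timeline_init() above, the decision can be paraphrased as the standalone sketch below. All toy_* names are hypothetical, and toy_context_offset() is an invented helper showing how a caller might tag an offset; the only parts taken from the patch are the three modes, the use of bit 0 of the offset as the "relative to context" flag, and the rule that only absolute timelines have an initial breadcrumb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the three values of enum intel_timeline_mode in the patch. */
enum toy_mode {
	TOY_ABSOLUTE         = 0,
	TOY_RELATIVE_CONTEXT = 1u << 0,
	TOY_RELATIVE_ENGINE  = 1u << 1,
};

struct toy_timeline {
	enum toy_mode mode;
	uint32_t hwsp_offset;
};

/* Hypothetical helper: tag an offset as relative to the context image. */
static uint32_t toy_context_offset(uint32_t offset)
{
	assert((offset & 1u) == 0); /* bit 0 must be free to carry the flag */
	return offset | (uint32_t)TOY_RELATIVE_CONTEXT;
}

static void toy_timeline_init(struct toy_timeline *tl,
			      const void *hwsp, uint32_t offset)
{
	if (!hwsp) {
		/* No status page supplied: the timeline gets its own slot. */
		tl->mode = TOY_ABSOLUTE;
		tl->hwsp_offset = 0; /* a real allocator would pick a cacheline */
	} else if (offset & (uint32_t)TOY_RELATIVE_CONTEXT) {
		tl->mode = TOY_RELATIVE_CONTEXT;
		tl->hwsp_offset = offset & ~(uint32_t)TOY_RELATIVE_CONTEXT;
	} else {
		tl->mode = TOY_RELATIVE_ENGINE;
		tl->hwsp_offset = offset;
	}
}

/* Same test as the reworked intel_timeline_has_initial_breadcrumb(). */
static bool toy_has_initial_breadcrumb(const struct toy_timeline *tl)
{
	return tl->mode == TOY_ABSOLUTE;
}
```

Reusing bit 0 of the offset only works because the slot offsets involved are assumed to be at least 2-byte aligned; the sketch asserts that in the tagging helper rather than relying on it silently.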
From patchwork Mon Feb 1 08:57:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CF1BC433DB for ; Mon, 1 Feb 2021 08:57:48 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 78E7764E33 for ; Mon, 1 Feb 2021 08:57:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 78E7764E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9F0256E4D7; Mon, 1 Feb 2021 08:57:34 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5F1EC6E491 for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757774-1500050 for multiple; Mon, 01 Feb 2021 08:57:24 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:05 +0000 Message-Id: 
<20210201085715.27435-47-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 47/57] drm/i915/gt: Use indices for writing into relative timelines X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Relative timelines are relative to either the global or per-process HWSP, and so we can replace the absolute addressing with store-index variants for position invariance. Signed-off-by: Chris Wilson Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 98 +++++++++++++++++------- drivers/gpu/drm/i915/gt/intel_timeline.h | 12 +++ 2 files changed, 82 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 187f1dad1054..7fd843369b41 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -518,7 +518,19 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) static u32 *emit_xcs_breadcrumb(struct i915_request *rq, u32 *cs) { - return gen8_emit_ggtt_write(cs, rq->fence.seqno, hwsp_offset(rq), 0); + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + unsigned int flags = MI_FLUSH_DW_OP_STOREDW; + u32 offset = hwsp_offset(rq); + + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= MI_FLUSH_DW_STORE_INDEX; + } + GEM_BUG_ON(offset & 7); + if (!intel_timeline_in_context(tl)) + offset |= MI_FLUSH_DW_USE_GTT; + + return __gen8_emit_flush_dw(cs, rq->fence.seqno, offset, flags); } u32 *gen8_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 
*cs) @@ -528,6 +540,18 @@ u32 *gen8_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs) u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + unsigned int flags = PIPE_CONTROL_FLUSH_ENABLE | PIPE_CONTROL_CS_STALL; + u32 offset = hwsp_offset(rq); + + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= PIPE_CONTROL_STORE_DATA_INDEX; + } + GEM_BUG_ON(offset & 7); + if (!intel_timeline_in_context(tl)) + flags |= PIPE_CONTROL_GLOBAL_GTT_IVB; + cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | PIPE_CONTROL_DEPTH_CACHE_FLUSH | @@ -535,26 +559,33 @@ u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) 0); /* XXX flush+write+CS_STALL all in one upsets gem_concurrent_blt:kbl */ - cs = gen8_emit_ggtt_write_rcs(cs, - rq->fence.seqno, - hwsp_offset(rq), - PIPE_CONTROL_FLUSH_ENABLE | - PIPE_CONTROL_CS_STALL); + cs = __gen8_emit_write_rcs(cs, rq->fence.seqno, offset, 0, flags); return gen8_emit_fini_breadcrumb_tail(rq, cs); } u32 *gen11_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { - cs = gen8_emit_ggtt_write_rcs(cs, - rq->fence.seqno, - hwsp_offset(rq), - PIPE_CONTROL_CS_STALL | - PIPE_CONTROL_TILE_CACHE_FLUSH | - PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | - PIPE_CONTROL_DEPTH_CACHE_FLUSH | - PIPE_CONTROL_DC_FLUSH_ENABLE | - PIPE_CONTROL_FLUSH_ENABLE); + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = hwsp_offset(rq); + unsigned int flags; + + flags = (PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_TILE_CACHE_FLUSH | + PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | + PIPE_CONTROL_DEPTH_CACHE_FLUSH | + PIPE_CONTROL_DC_FLUSH_ENABLE | + PIPE_CONTROL_FLUSH_ENABLE); + + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= PIPE_CONTROL_STORE_DATA_INDEX; + } + GEM_BUG_ON(offset & 7); + if (!intel_timeline_in_context(tl)) + flags |= 
PIPE_CONTROL_GLOBAL_GTT_IVB; + + cs = __gen8_emit_write_rcs(cs, rq->fence.seqno, offset, 0, flags); return gen8_emit_fini_breadcrumb_tail(rq, cs); } @@ -617,19 +648,30 @@ u32 *gen12_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs) u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { - cs = gen12_emit_ggtt_write_rcs(cs, - rq->fence.seqno, - hwsp_offset(rq), - PIPE_CONTROL0_HDC_PIPELINE_FLUSH, - PIPE_CONTROL_CS_STALL | - PIPE_CONTROL_TILE_CACHE_FLUSH | - PIPE_CONTROL_FLUSH_L3 | - PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | - PIPE_CONTROL_DEPTH_CACHE_FLUSH | - /* Wa_1409600907:tgl */ - PIPE_CONTROL_DEPTH_STALL | - PIPE_CONTROL_DC_FLUSH_ENABLE | - PIPE_CONTROL_FLUSH_ENABLE); + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = hwsp_offset(rq); + unsigned int flags; + + flags = (PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_TILE_CACHE_FLUSH | + PIPE_CONTROL_FLUSH_L3 | + PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | + PIPE_CONTROL_DEPTH_CACHE_FLUSH | + /* Wa_1409600907:tgl */ + PIPE_CONTROL_DEPTH_STALL | + PIPE_CONTROL_DC_FLUSH_ENABLE | + PIPE_CONTROL_FLUSH_ENABLE); + + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= PIPE_CONTROL_STORE_DATA_INDEX; + } + GEM_BUG_ON(offset & 7); + if (!intel_timeline_in_context(tl)) + flags |= PIPE_CONTROL_GLOBAL_GTT_IVB; + + cs = __gen8_emit_write_rcs(cs, rq->fence.seqno, offset, + PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags); return gen12_emit_fini_breadcrumb_tail(rq, cs); } diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h index e1d522329757..9859a77a6f54 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline.h @@ -48,6 +48,18 @@ intel_timeline_has_initial_breadcrumb(const struct intel_timeline *tl) return tl->mode == INTEL_TIMELINE_ABSOLUTE; } +static inline bool +intel_timeline_is_relative(const struct intel_timeline *tl) +{ + return tl->mode != 
INTEL_TIMELINE_ABSOLUTE; +} + +static inline bool +intel_timeline_in_context(const struct intel_timeline *tl) +{ + return tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT; +} + static inline int __intel_timeline_sync_set(struct intel_timeline *tl, u64 context, u32 seqno) { From patchwork Mon Feb 1 08:57:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058505 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A60D3C433E6 for ; Mon, 1 Feb 2021 08:58:20 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4670864E3F for ; Mon, 1 Feb 2021 08:58:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4670864E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 344236E527; Mon, 1 Feb 2021 08:57:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 416046E48C for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com 
(unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757775-1500050 for multiple; Mon, 01 Feb 2021 08:57:24 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:06 +0000 Message-Id: <20210201085715.27435-48-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 48/57] drm/i915/selftests: Exercise relative timeline modes X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" A quick test to verify that the backend accepts each type of timeline and can use them to track and control request emission. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/selftest_timeline.c | 105 ++++++++++++++++++++ 1 file changed, 105 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index 6b412228a6fd..dcc03522b277 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -1364,9 +1364,114 @@ static int live_hwsp_recycle(void *arg) return err; } +static int live_hwsp_relative(void *arg) +{ + struct intel_gt *gt = arg; + struct intel_engine_cs *engine; + enum intel_engine_id id; + + /* + * Check backend support for different timeline modes. 
+ */ + + for_each_engine(engine, gt, id) { + enum intel_timeline_mode mode; + + if (!intel_engine_has_scheduler(engine)) + continue; + + for (mode = INTEL_TIMELINE_ABSOLUTE; + mode <= INTEL_TIMELINE_RELATIVE_ENGINE; + mode++) { + struct intel_timeline *tl; + struct i915_request *rq; + struct intel_context *ce; + const char *msg; + int err; + + if (mode == INTEL_TIMELINE_RELATIVE_CONTEXT && + !HAS_EXECLISTS(gt->i915)) + continue; + + ce = intel_context_create(engine); + if (IS_ERR(ce)) + return PTR_ERR(ce); + + err = intel_context_alloc_state(ce); + if (err) { + intel_context_put(ce); + return err; + } + + switch (mode) { + case INTEL_TIMELINE_ABSOLUTE: + tl = intel_timeline_create(gt); + msg = "local"; + break; + + case INTEL_TIMELINE_RELATIVE_CONTEXT: + tl = __intel_timeline_create(gt, + ce->state, + INTEL_TIMELINE_RELATIVE_CONTEXT | + 0x400); + msg = "ppHWSP"; + break; + + case INTEL_TIMELINE_RELATIVE_ENGINE: + tl = __intel_timeline_create(gt, + engine->status_page.vma, + 0x400); + msg = "HWSP"; + break; + default: + continue; + } + if (IS_ERR(tl)) { + intel_context_put(ce); + return PTR_ERR(tl); + } + + pr_info("Testing %s timeline on %s\n", + msg, engine->name); + + intel_timeline_put(ce->timeline); + ce->timeline = tl; + + err = intel_timeline_pin(tl, NULL); + if (err) { + intel_context_put(ce); + return err; + } + tl->seqno = 0xc0000000; + WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno); + intel_timeline_unpin(tl); + + rq = intel_context_create_request(ce); + intel_context_put(ce); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + GEM_BUG_ON(rcu_access_pointer(rq->timeline) != tl); + + i915_request_get(rq); + i915_request_add(rq); + + if (i915_request_wait(rq, 0, HZ / 5) < 0) { + i915_request_put(rq); + return -EIO; + } + + i915_request_put(rq); + } + } + + return 0; +} + int intel_timeline_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { + SUBTEST(live_hwsp_relative), SUBTEST(live_hwsp_recycle), SUBTEST(live_hwsp_engine), 
SUBTEST(live_hwsp_alternate), From patchwork Mon Feb 1 08:57:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE82CC4332B for ; Mon, 1 Feb 2021 08:58:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 678BC64E33 for ; Mon, 1 Feb 2021 08:58:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 678BC64E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 62BC16E4B5; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 70ED66E4AF for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757778-1500050 for multiple; Mon, 01 Feb 2021 08:57:24 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:07 +0000 
Message-Id: <20210201085715.27435-49-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 49/57] drm/i915/gt: Use ppHWSP for unshared non-semaphore related timelines X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When we are not using semaphores with a context/engine, we can simply reuse the same seqno location across wraps, but we still require each timeline to have its own address. For LRC submission, each context is prefixed by a per-process HWSP, which provides us with a unique location for each context-local timeline. A shared timeline that is common to multiple contexts will continue to use a separate page. This enables us to create position invariant contexts should we feel the need to relocate them. Initially they are automatically used by Broadwell/Braswell as they do not require independent timelines. 
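As a rough illustration of the selection described above, the following standalone C sketch models the order of the checks that this patch leaves in lrc_alloc(). The names timeline_kind, ctx_model and choose_timeline are hypothetical stand-ins for illustration only. They are not part of the driver, which operates on struct intel_context and struct intel_engine_cs.

```c
#include <stdbool.h>

/* Hypothetical model of the three outcomes; not the driver's types. */
enum timeline_kind {
	TIMELINE_PINNED,   /* ce->timeline was preset, handled by pinned_timeline() */
	TIMELINE_ABSOLUTE, /* separate page from intel_timeline_create() */
	TIMELINE_PPHWSP,   /* context-relative slot in the per-process HWSP */
};

/* Hypothetical inputs standing in for ce->timeline and the engine flags. */
struct ctx_model {
	bool has_preset_timeline;
	bool engine_has_semaphores;
};

/* Mirrors the order of the checks in the patched lrc_alloc(). */
static enum timeline_kind choose_timeline(const struct ctx_model *ctx)
{
	if (ctx->has_preset_timeline)
		return TIMELINE_PINNED;
	if (ctx->engine_has_semaphores)
		return TIMELINE_ABSOLUTE;
	return TIMELINE_PPHWSP;
}
```

Only a context without a preset timeline on an engine without semaphores ends up in the ppHWSP, which matches the claim that the change initially affects platforms that do not use semaphores.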
Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_lrc.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 8508b8d701c1..f9acd9e63066 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -833,6 +833,14 @@ pinned_timeline(struct intel_context *ce, struct intel_engine_cs *engine) return intel_timeline_create_from_engine(engine, page_unmask_bits(tl)); } +static struct intel_timeline * +pphwsp_timeline(struct intel_context *ce, struct i915_vma *state) +{ + return __intel_timeline_create(ce->engine->gt, state, + I915_GEM_HWS_SEQNO_ADDR | + INTEL_TIMELINE_RELATIVE_CONTEXT); +} + int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine) { struct intel_ring *ring; @@ -860,8 +868,10 @@ int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine) */ if (unlikely(ce->timeline)) tl = pinned_timeline(ce, engine); - else + else if (intel_engine_has_semaphores(engine)) tl = intel_timeline_create(engine->gt); + else + tl = pphwsp_timeline(ce, vma); if (IS_ERR(tl)) { err = PTR_ERR(tl); goto err_ring; From patchwork Mon Feb 1 08:57:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058459 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18CAEC433E6 for ; Mon, 1 Feb 2021 08:58:07 +0000 (UTC) Received: from 
gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A27C664E3F for ; Mon, 1 Feb 2021 08:58:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A27C664E3F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1556C6E499; Mon, 1 Feb 2021 08:57:42 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 89B7C6E4A2 for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757780-1500050 for multiple; Mon, 01 Feb 2021 08:57:24 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:08 +0000 Message-Id: <20210201085715.27435-50-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 50/57] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq" X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" This was removed in commit 478ffad6d690 ("drm/i915: drop engine_pin/unpin_breadcrumbs_irq") as the last user had been removed, but now 
there is a promise of a new user in the next patch. Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 24 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_breadcrumbs.h | 3 +++ 2 files changed, 27 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 38cc42783dfb..9e67810c7767 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -310,6 +310,30 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b) spin_unlock_irqrestore(&b->irq_lock, flags); } +void intel_breadcrumbs_pin_irq(struct intel_breadcrumbs *b) +{ + if (GEM_DEBUG_WARN_ON(!b->irq_engine)) + return; + + spin_lock_irq(&b->irq_lock); + if (!b->irq_enabled++) + irq_enable(b->irq_engine); + GEM_BUG_ON(!b->irq_enabled); /* no overflow! */ + spin_unlock_irq(&b->irq_lock); +} + +void intel_breadcrumbs_unpin_irq(struct intel_breadcrumbs *b) +{ + if (GEM_DEBUG_WARN_ON(!b->irq_engine)) + return; + + spin_lock_irq(&b->irq_lock); + GEM_BUG_ON(!b->irq_enabled); /* no underflow! 
*/ + if (!--b->irq_enabled) + irq_disable(b->irq_engine); + spin_unlock_irq(&b->irq_lock); +} + void __intel_breadcrumbs_park(struct intel_breadcrumbs *b) { if (!READ_ONCE(b->irq_armed)) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h index 3ce5ce270b04..c2bb3a79ca9f 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h @@ -19,6 +19,9 @@ struct intel_breadcrumbs * intel_breadcrumbs_create(struct intel_engine_cs *irq_engine); void intel_breadcrumbs_free(struct intel_breadcrumbs *b); +void intel_breadcrumbs_pin_irq(struct intel_breadcrumbs *b); +void intel_breadcrumbs_unpin_irq(struct intel_breadcrumbs *b); + void intel_breadcrumbs_reset(struct intel_breadcrumbs *b); void __intel_breadcrumbs_park(struct intel_breadcrumbs *b); From patchwork Mon Feb 1 08:57:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78A49C433E0 for ; Mon, 1 Feb 2021 08:57:49 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0CD5264E41 for ; Mon, 1 Feb 2021 08:57:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0CD5264E41 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2A36D6E4A7; Mon, 1 Feb 2021 08:57:35 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7B6876E49B for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757782-1500050 for multiple; Mon, 01 Feb 2021 08:57:25 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:09 +0000 Message-Id: <20210201085715.27435-51-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 51/57] drm/i915/gt: Couple tasklet scheduling for all CS interrupts X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If any engine asks for the tasklet to be kicked from the CS interrupt, do so. Currently, this is used by the execlists scheduler backends to feed in the next request to the HW, and similarly could be used by a ring scheduler, as will be seen in the next patch. 
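A minimal standalone model of the coupling, assuming a hypothetical struct engine_model in place of struct intel_engine_cs and plain counters in place of the real breadcrumb and tasklet calls, might look like this:

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_engine_cs; counters record calls. */
struct engine_model {
	bool needs_breadcrumb_tasklet;
	unsigned int breadcrumbs_signalled;
	unsigned int scheduler_kicks;
};

/* Same shape as gen2_engine_cs_irq(): always signal, kick only on request. */
static void engine_cs_irq(struct engine_model *engine)
{
	engine->breadcrumbs_signalled++;
	if (engine->needs_breadcrumb_tasklet)
		engine->scheduler_kicks++;
}
```

Engines that do not ask for the tasklet see the same behaviour as before, since the breadcrumb signal is unconditional and the kick is skipped.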
Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_gt_irq.c | 17 ++++++++++++----- drivers/gpu/drm/i915/gt/intel_gt_irq.h | 3 +++ drivers/gpu/drm/i915/gt/intel_rps.c | 2 +- drivers/gpu/drm/i915/i915_irq.c | 8 ++++---- 4 files changed, 20 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c index 6ce5bd28a23d..270dbebc4c18 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c @@ -62,6 +62,13 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir) intel_engine_kick_scheduler(engine); } +void gen2_engine_cs_irq(struct intel_engine_cs *engine) +{ + intel_engine_signal_breadcrumbs(engine); + if (intel_engine_needs_breadcrumb_tasklet(engine)) + intel_engine_kick_scheduler(engine); +} + static u32 gen11_gt_engine_identity(struct intel_gt *gt, const unsigned int bank, const unsigned int bit) @@ -275,9 +282,9 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) void gen5_gt_irq_handler(struct intel_gt *gt, u32 gt_iir) { if (gt_iir & GT_RENDER_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]); + gen2_engine_cs_irq(gt->engine_class[RENDER_CLASS][0]); if (gt_iir & ILK_BSD_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]); + gen2_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0]); } static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir) @@ -301,11 +308,11 @@ static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir) void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir) { if (gt_iir & GT_RENDER_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]); + gen2_engine_cs_irq(gt->engine_class[RENDER_CLASS][0]); if (gt_iir & GT_BSD_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]); + gen2_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0]); if 
(gt_iir & GT_BLT_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(gt->engine_class[COPY_ENGINE_CLASS][0]); + gen2_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0]); if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT | GT_BSD_CS_ERROR_INTERRUPT | diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h b/drivers/gpu/drm/i915/gt/intel_gt_irq.h index f667e976fb2b..26c2a5ea3b23 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h @@ -8,6 +8,7 @@ #include +struct intel_engine_cs; struct intel_gt; #define GEN8_GT_IRQS (GEN8_GT_RCS_IRQ | \ @@ -18,6 +19,8 @@ struct intel_gt; GEN8_GT_PM_IRQ | \ GEN8_GT_GUC_IRQ) +void gen2_engine_cs_irq(struct intel_engine_cs *engine); + void gen11_gt_irq_reset(struct intel_gt *gt); void gen11_gt_irq_postinstall(struct intel_gt *gt); void gen11_gt_irq_handler(struct intel_gt *gt, const u32 master_ctl); diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c index 405d814e9040..900c20a6d073 100644 --- a/drivers/gpu/drm/i915/gt/intel_rps.c +++ b/drivers/gpu/drm/i915/gt/intel_rps.c @@ -1774,7 +1774,7 @@ void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir) return; if (pm_iir & PM_VEBOX_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(gt->engine[VECS0]); + gen2_engine_cs_irq(gt->engine[VECS0]); if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT) DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 9665cd9742a6..c244ba2c8cee 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3954,7 +3954,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg) intel_uncore_write16(&dev_priv->uncore, GEN2_IIR, iir); if (iir & I915_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]); + gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]); if (iir & I915_MASTER_ERROR_INTERRUPT) i8xx_error_irq_handler(dev_priv, eir, eir_stuck); @@ -4062,7 +4062,7 @@ static 
irqreturn_t i915_irq_handler(int irq, void *arg) intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir); if (iir & I915_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]); + gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]); if (iir & I915_MASTER_ERROR_INTERRUPT) i9xx_error_irq_handler(dev_priv, eir, eir_stuck); @@ -4207,10 +4207,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg) intel_uncore_write(&dev_priv->uncore, GEN2_IIR, iir); if (iir & I915_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]); + gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]); if (iir & I915_BSD_USER_INTERRUPT) - intel_engine_signal_breadcrumbs(dev_priv->gt.engine[VCS0]); + gen2_engine_cs_irq(dev_priv->gt.engine[VCS0]); if (iir & I915_MASTER_ERROR_INTERRUPT) i9xx_error_irq_handler(dev_priv, eir, eir_stuck); From patchwork Mon Feb 1 08:57:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D0BAC433E0 for ; Mon, 1 Feb 2021 08:58:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D21D464E33 for ; Mon, 1 Feb 2021 08:58:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D21D464E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) 
header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D593B6E49A; Mon, 1 Feb 2021 08:57:35 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3E4636E48B for ; Mon, 1 Feb 2021 08:57:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757783-1500050 for multiple; Mon, 01 Feb 2021 08:57:25 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:10 +0000 Message-Id: <20210201085715.27435-52-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 52/57] drm/i915/gt: Support creation of 'internal' rings X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To support legacy ring buffer scheduling, we want a virtual ringbuffer for each client. These rings are purely for holding the requests as they are being constructed on the CPU and never accessed by the GPU, so they should not be bound into the GGTT, and we can use plain old WB mapped pages. As they are not bound, we need to nerf a few assumptions that a rq->ring is in the GGTT. 
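The intel_ring.c changes in this patch let the caller request an internal ring by folding creation flags into the low 12 bits of the size argument, which are otherwise unused because the ring size must be a power of two. A standalone sketch of that packing follows. The names ring_model, unpack_ring_request and ring_is_internal are hypothetical, and the value chosen for RING_CREATE_INTERNAL is an assumption, since the definition of INTEL_RING_CREATE_INTERNAL is not shown in this part of the series.

```c
#include <stdbool.h>

#define RING_FLAG_MASK       0xfffu /* low 12 bits, as GENMASK(11, 0) */
#define RING_CREATE_INTERNAL 0x1u   /* assumed bit; real value not shown here */

/* Hypothetical stand-in for the size and flags kept in struct intel_ring. */
struct ring_model {
	unsigned int size;
	unsigned int flags;
};

/* Split a packed argument the same way intel_engine_create_ring() does. */
static struct ring_model unpack_ring_request(unsigned int packed)
{
	struct ring_model ring;

	ring.flags = packed & RING_FLAG_MASK;
	ring.size = packed ^ ring.flags; /* clears exactly the flag bits */
	return ring;
}

static bool ring_is_internal(const struct ring_model *ring)
{
	return (ring->flags & RING_CREATE_INTERNAL) != 0;
}
```

A caller in this model would pass something like 16384 | RING_CREATE_INTERNAL to ask for a 16 KiB ring that stays out of the GGTT.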
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context.c | 2 +- .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_ring.c | 69 ++++++++++++------- drivers/gpu/drm/i915/gt/intel_ring.h | 17 ++++- drivers/gpu/drm/i915/gt/intel_ring_types.h | 2 + drivers/gpu/drm/i915/i915_scheduler.c | 7 +- 6 files changed, 69 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 57b6bde2b736..c7ab4ed92da4 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -258,7 +258,7 @@ int __intel_context_do_pin_ww(struct intel_context *ce, } CE_TRACE(ce, "pin ring:{start:%08x, head:%04x, tail:%04x}\n", - i915_ggtt_offset(ce->ring->vma), + intel_ring_address(ce->ring), ce->ring->head, ce->ring->tail); handoff = true; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index d9b5b6c9eb5d..ac288b180574 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3590,7 +3590,7 @@ static int print_ring(char *buf, int sz, struct i915_request *rq) len = scnprintf(buf, sz, "ring:{start:%08x, hwsp:%08x, seqno:%08x, runtime:%llums}, ", - i915_ggtt_offset(rq->ring->vma), + intel_ring_address(rq->ring), tl ? 
tl->ggtt_offset : 0, hwsp_seqno(rq), DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context), diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..521972c297a9 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -32,33 +32,42 @@ void __intel_ring_pin(struct intel_ring *ring) int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww) { struct i915_vma *vma = ring->vma; - unsigned int flags; void *addr; int ret; if (atomic_fetch_inc(&ring->pin_count)) return 0; - /* Ring wraparound at offset 0 sometimes hangs. No idea why. */ - flags = PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma); + if (!intel_ring_is_internal(ring)) { + int type = i915_coherent_map_type(vma->vm->i915); + unsigned int pin; - if (i915_gem_object_is_stolen(vma->obj)) - flags |= PIN_MAPPABLE; - else - flags |= PIN_HIGH; + /* Ring wraparound at offset 0 sometimes hangs. No idea why. */ + pin = PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma); - ret = i915_ggtt_pin(vma, ww, 0, flags); - if (unlikely(ret)) - goto err_unpin; + if (i915_gem_object_is_stolen(vma->obj)) + pin |= PIN_MAPPABLE; + else + pin |= PIN_HIGH; - if (i915_vma_is_map_and_fenceable(vma)) - addr = (void __force *)i915_vma_pin_iomap(vma); - else - addr = i915_gem_object_pin_map(vma->obj, - i915_coherent_map_type(vma->vm->i915)); - if (IS_ERR(addr)) { - ret = PTR_ERR(addr); - goto err_ring; + ret = i915_ggtt_pin(vma, ww, 0, pin); + if (unlikely(ret)) + goto err_unpin; + + if (i915_vma_is_map_and_fenceable(vma)) + addr = (void __force *)i915_vma_pin_iomap(vma); + else + addr = i915_gem_object_pin_map(vma->obj, type); + if (IS_ERR(addr)) { + ret = PTR_ERR(addr); + goto err_ring; + } + } else { + addr = i915_gem_object_pin_map(vma->obj, I915_MAP_WB); + if (IS_ERR(addr)) { + ret = PTR_ERR(addr); + goto err_ring; + } } i915_vma_make_unshrinkable(vma); @@ -99,19 +108,24 @@ void intel_ring_unpin(struct intel_ring *ring) 
i915_gem_object_unpin_map(vma->obj); i915_vma_make_purgeable(vma); - i915_vma_unpin(vma); + if (!intel_ring_is_internal(ring)) + i915_vma_unpin(vma); } -static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size) +static struct i915_vma * +create_ring_vma(struct i915_ggtt *ggtt, int size, unsigned int flags) { struct i915_address_space *vm = &ggtt->vm; struct drm_i915_private *i915 = vm->i915; struct drm_i915_gem_object *obj; struct i915_vma *vma; - obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE); - if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt)) - obj = i915_gem_object_create_stolen(i915, size); + obj = ERR_PTR(-ENODEV); + if (!(flags & INTEL_RING_CREATE_INTERNAL)) { + obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE); + if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt)) + obj = i915_gem_object_create_stolen(i915, size); + } if (IS_ERR(obj)) obj = i915_gem_object_create_internal(i915, size); if (IS_ERR(obj)) @@ -136,12 +150,14 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size) } struct intel_ring * -intel_engine_create_ring(struct intel_engine_cs *engine, int size) +intel_engine_create_ring(struct intel_engine_cs *engine, unsigned int size) { struct drm_i915_private *i915 = engine->i915; + unsigned int flags = size & GENMASK(11, 0); struct intel_ring *ring; struct i915_vma *vma; + size ^= flags; GEM_BUG_ON(!is_power_of_2(size)); GEM_BUG_ON(RING_CTL_SIZE(size) & ~RING_NR_PAGES); @@ -150,8 +166,10 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size) return ERR_PTR(-ENOMEM); kref_init(&ring->ref); + ring->size = size; ring->wrap = BITS_PER_TYPE(ring->size) - ilog2(size); + ring->flags = flags; /* * Workaround an erratum on the i830 which causes a hang if @@ -164,11 +182,12 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size) intel_ring_update_space(ring); - vma = create_ring_vma(engine->gt->ggtt, size); + vma = create_ring_vma(engine->gt->ggtt, size, 
flags); if (IS_ERR(vma)) { kfree(ring); return ERR_CAST(vma); } + ring->vma = vma; return ring; diff --git a/drivers/gpu/drm/i915/gt/intel_ring.h b/drivers/gpu/drm/i915/gt/intel_ring.h index dbf5f14a136f..89d79c22fe9e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.h +++ b/drivers/gpu/drm/i915/gt/intel_ring.h @@ -8,12 +8,14 @@ #include "i915_gem.h" /* GEM_BUG_ON */ #include "i915_request.h" +#include "i915_vma.h" #include "intel_ring_types.h" struct intel_engine_cs; struct intel_ring * -intel_engine_create_ring(struct intel_engine_cs *engine, int size); +intel_engine_create_ring(struct intel_engine_cs *engine, unsigned int size); +#define INTEL_RING_CREATE_INTERNAL BIT(0) u32 *intel_ring_begin(struct i915_request *rq, unsigned int num_dwords); int intel_ring_cacheline_align(struct i915_request *rq); @@ -138,4 +140,17 @@ __intel_ring_space(unsigned int head, unsigned int tail, unsigned int size) return (head - tail - CACHELINE_BYTES) & (size - 1); } +static inline u32 intel_ring_address(const struct intel_ring *ring) +{ + if (ring->flags & INTEL_RING_CREATE_INTERNAL) + return -1; + + return i915_ggtt_offset(ring->vma); +} + +static inline bool intel_ring_is_internal(const struct intel_ring *ring) +{ + return ring->flags & INTEL_RING_CREATE_INTERNAL; +} + #endif /* INTEL_RING_H */ diff --git a/drivers/gpu/drm/i915/gt/intel_ring_types.h b/drivers/gpu/drm/i915/gt/intel_ring_types.h index 49ccb76dda3b..3d091c699110 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_types.h +++ b/drivers/gpu/drm/i915/gt/intel_ring_types.h @@ -46,6 +46,8 @@ struct intel_ring { u32 size; u32 wrap; u32 effective_size; + + unsigned long flags; }; #endif /* INTEL_RING_TYPES_H */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 838fd26c5ac6..de9c187290cd 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -109,11 +109,14 @@ static void i915_sched_init_ipi(struct i915_sched_ipi *ipi) ipi->list = NULL; } 
-static bool match_ring(struct i915_request *rq) +static bool match_ring(const struct i915_request *rq) { const struct intel_engine_cs *engine = rq->engine; const struct intel_ring *ring = rq->ring; + if (intel_ring_is_internal(ring)) + return true; + return ENGINE_READ(engine, RING_START) == i915_ggtt_offset(ring->vma); } @@ -1655,7 +1658,7 @@ void i915_sched_show(struct drm_printer *m, i915_request_show(m, rq, "\t\tactive ", 0); drm_printf(m, "\t\tring->start: 0x%08x\n", - i915_ggtt_offset(rq->ring->vma)); + intel_ring_address(rq->ring)); drm_printf(m, "\t\tring->head: 0x%08x\n", rq->ring->head); drm_printf(m, "\t\tring->tail: 0x%08x\n", From patchwork Mon Feb 1 08:57:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058427 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93405C433E6 for ; Mon, 1 Feb 2021 08:57:56 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 359ED64E33 for ; Mon, 1 Feb 2021 08:57:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 359ED64E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by 
gabe.freedesktop.org (Postfix) with ESMTP id 321606E505; Mon, 1 Feb 2021 08:57:37 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id A55636E4A5 for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757785-1500050 for multiple; Mon, 01 Feb 2021 08:57:25 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:11 +0000 Message-Id: <20210201085715.27435-53-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 53/57] drm/i915/gt: Use client timeline address for seqno writes X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If we allow for per-client timelines, even with legacy ring submission, we open the door to a world full of possibilities [scheduling and semaphores].
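For illustration only (not part of the patch): a small userspace sketch of the address selection that the breadcrumb emitters in the diff below share. A relative timeline stores through an index into the status page, so only the in-page offset is kept; a timeline that does not live in the context image is addressed through the global GTT. The model_* names are hypothetical, and the MODEL_* bit values are placeholders chosen for the example, not hardware command encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for the sketch; not the real command bits. */
#define MODEL_PAGE_SIZE   4096u
#define MODEL_STORE_INDEX (1u << 21) /* "offset is an index into the HWSP" */
#define MODEL_USE_GTT     (1u << 2)  /* "address is a global GTT address" */

struct model_store {
	uint32_t cmd_flags; /* extra bits to OR into the command dword */
	uint32_t address;   /* value for the address dword */
};

/*
 * Mirror the pattern repeated in the gen6/gen7 emitters:
 *   if (relative)    offset = offset_in_page(offset), flags |= STORE_INDEX
 *   if (!in_context) offset |= USE_GTT
 */
static struct model_store model_seqno_store(bool relative, bool in_context,
					    uint32_t hwsp_offset)
{
	struct model_store st = { 0, hwsp_offset };

	if (relative) {
		st.address &= MODEL_PAGE_SIZE - 1; /* like offset_in_page() */
		st.cmd_flags |= MODEL_STORE_INDEX;
	}
	if (!in_context)
		st.address |= MODEL_USE_GTT;

	return st;
}
```

The sketch assumes offsets are at least 8-byte aligned so that the placeholder MODEL_USE_GTT bit does not collide with the address itself.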
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/gen2_engine_cs.c | 72 ++++++++++++++- drivers/gpu/drm/i915/gt/gen2_engine_cs.h | 5 +- drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 89 +++++++++++++------ drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 23 ++--- .../gpu/drm/i915/gt/intel_ring_submission.c | 30 +++---- drivers/gpu/drm/i915/i915_request.h | 13 +++ 6 files changed, 169 insertions(+), 63 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c index b491a64919c8..b3fff7a955f2 100644 --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c @@ -172,9 +172,77 @@ u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs) return __gen2_emit_breadcrumb(rq, cs, 16, 8); } -u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs) +static u32 *__gen4_emit_breadcrumb(struct i915_request *rq, u32 *cs, + int flush, int post) { - return __gen2_emit_breadcrumb(rq, cs, 8, 8); + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = __i915_request_hwsp_offset(rq); + + GEM_BUG_ON(tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT); + + *cs++ = MI_FLUSH; + + while (flush--) { + *cs++ = MI_STORE_DWORD_INDEX; + *cs++ = I915_GEM_HWS_SCRATCH * sizeof(u32); + *cs++ = rq->fence.seqno; + } + + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + while (post--) { + *cs++ = MI_STORE_DWORD_INDEX; + *cs++ = offset; + *cs++ = rq->fence.seqno; + *cs++ = MI_NOOP; + } + } else { + while (post--) { + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; + *cs++ = 0; + *cs++ = offset; + *cs++ = rq->fence.seqno; + } + } + + *cs++ = MI_USER_INTERRUPT; + + rq->tail = intel_ring_offset(rq, cs); + assert_ring_tail_valid(rq->ring, rq->tail); + + return cs; +} + +u32 *gen4_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs) +{ + return __gen4_emit_breadcrumb(rq, cs, 8, 8); +} + +int gen4_emit_init_breadcrumb_xcs(struct i915_request *rq) +{ + struct intel_timeline 
*tl = i915_request_timeline(rq); + u32 *cs; + + GEM_BUG_ON(i915_request_has_initial_breadcrumb(rq)); + if (!intel_timeline_has_initial_breadcrumb(tl)) + return 0; + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; + *cs++ = 0; + *cs++ = __i915_request_hwsp_offset(rq); + *cs++ = rq->fence.seqno - 1; + + intel_ring_advance(rq, cs); + + /* Record the updated position of the request's payload */ + rq->infix = intel_ring_offset(rq, cs); + + __set_bit(I915_FENCE_FLAG_INITIAL_BREADCRUMB, &rq->fence.flags); + return 0; } /* Just userspace ABI convention to limit the wa batch bo to a resonable size */ diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.h b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h index a5cd64a65c9e..ba7567b15229 100644 --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.h +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h @@ -16,7 +16,10 @@ int gen4_emit_flush_rcs(struct i915_request *rq, u32 mode); int gen4_emit_flush_vcs(struct i915_request *rq, u32 mode); u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs); -u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs); +u32 *gen4_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs); + +u32 *gen4_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs); +int gen4_emit_init_breadcrumb_xcs(struct i915_request *rq); int i830_emit_bb_start(struct i915_request *rq, u64 offset, u32 len, diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c index 2f59dd3bdc18..14cab4c726ce 100644 --- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c @@ -141,6 +141,12 @@ int gen6_emit_flush_rcs(struct i915_request *rq, u32 mode) u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = __i915_request_hwsp_offset(rq); + unsigned int flags; + + GEM_BUG_ON(tl->mode == 
INTEL_TIMELINE_RELATIVE_CONTEXT); + /* First we do the gen6_emit_post_sync_nonzero_flush w/a */ *cs++ = GFX_OP_PIPE_CONTROL(4); *cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD; @@ -154,15 +160,22 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) PIPE_CONTROL_GLOBAL_GTT; *cs++ = 0; - /* Finally we can flush and with it emit the breadcrumb */ - *cs++ = GFX_OP_PIPE_CONTROL(4); - *cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | + flags = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DC_FLUSH_ENABLE | PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL); - *cs++ = i915_request_active_timeline(rq)->ggtt_offset | - PIPE_CONTROL_GLOBAL_GTT; + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= PIPE_CONTROL_STORE_DATA_INDEX; + } + if (!intel_timeline_in_context(tl)) + offset |= PIPE_CONTROL_GLOBAL_GTT; + + /* Finally we can flush and with it emit the breadcrumb */ + *cs++ = GFX_OP_PIPE_CONTROL(4); + *cs++ = flags; + *cs++ = offset; *cs++ = rq->fence.seqno; *cs++ = MI_USER_INTERRUPT; @@ -351,15 +364,28 @@ int gen7_emit_flush_rcs(struct i915_request *rq, u32 mode) u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { - *cs++ = GFX_OP_PIPE_CONTROL(4); - *cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = __i915_request_hwsp_offset(rq); + unsigned int flags; + + GEM_BUG_ON(tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT); + + flags = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DC_FLUSH_ENABLE | PIPE_CONTROL_FLUSH_ENABLE | PIPE_CONTROL_QW_WRITE | - PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_CS_STALL); - *cs++ = i915_request_active_timeline(rq)->ggtt_offset; + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= PIPE_CONTROL_STORE_DATA_INDEX; + } + if (!intel_timeline_in_context(tl)) + flags |= 
PIPE_CONTROL_GLOBAL_GTT_IVB; + + *cs++ = GFX_OP_PIPE_CONTROL(4); + *cs++ = flags; + *cs++ = offset; *cs++ = rq->fence.seqno; *cs++ = MI_USER_INTERRUPT; @@ -373,11 +399,21 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs) { - GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma); - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR); + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = __i915_request_hwsp_offset(rq); + unsigned int flags = 0; - *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX; - *cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT; + GEM_BUG_ON(tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT); + + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + flags |= MI_FLUSH_DW_STORE_INDEX; + } + if (!intel_timeline_in_context(tl)) + offset |= MI_FLUSH_DW_USE_GTT; + + *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | flags; + *cs++ = offset; *cs++ = rq->fence.seqno; *cs++ = MI_USER_INTERRUPT; @@ -391,28 +427,31 @@ u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs) #define GEN7_XCS_WA 32 u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs) { + struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); + u32 offset = __i915_request_hwsp_offset(rq); + u32 cmd = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW; int i; - GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma); - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR); + GEM_BUG_ON(tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT); - *cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB | - MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX; - *cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT; + if (intel_timeline_is_relative(tl)) { + offset = offset_in_page(offset); + cmd |= 
MI_FLUSH_DW_STORE_INDEX; + } + if (!intel_timeline_in_context(tl)) + offset |= MI_FLUSH_DW_USE_GTT; + + *cs++ = cmd; + *cs++ = offset; *cs++ = rq->fence.seqno; for (i = 0; i < GEN7_XCS_WA; i++) { - *cs++ = MI_STORE_DWORD_INDEX; - *cs++ = I915_GEM_HWS_SEQNO_ADDR; + *cs++ = cmd; + *cs++ = offset; *cs++ = rq->fence.seqno; } - *cs++ = MI_FLUSH_DW; - *cs++ = 0; - *cs++ = 0; - *cs++ = MI_USER_INTERRUPT; - *cs++ = MI_NOOP; rq->tail = intel_ring_offset(rq, cs); assert_ring_tail_valid(rq->ring, rq->tail); diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 7fd843369b41..4a0d32584ef0 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -336,19 +336,6 @@ static u32 preempt_address(struct intel_engine_cs *engine) I915_GEM_HWS_PREEMPT_ADDR); } -static u32 hwsp_offset(const struct i915_request *rq) -{ - const struct intel_timeline_cacheline *cl; - - /* Before the request is executed, the timeline/cachline is fixed */ - - cl = rcu_dereference_protected(rq->hwsp_cacheline, 1); - if (cl) - return cl->ggtt_offset; - - return rcu_dereference_protected(rq->timeline, 1)->ggtt_offset; -} - int gen8_emit_init_breadcrumb(struct i915_request *rq) { u32 *cs; @@ -362,7 +349,7 @@ int gen8_emit_init_breadcrumb(struct i915_request *rq) return PTR_ERR(cs); *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; - *cs++ = hwsp_offset(rq); + *cs++ = __i915_request_hwsp_offset(rq); *cs++ = 0; *cs++ = rq->fence.seqno - 1; @@ -520,7 +507,7 @@ static u32 *emit_xcs_breadcrumb(struct i915_request *rq, u32 *cs) { struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); unsigned int flags = MI_FLUSH_DW_OP_STOREDW; - u32 offset = hwsp_offset(rq); + u32 offset = __i915_request_hwsp_offset(rq); if (intel_timeline_is_relative(tl)) { offset = offset_in_page(offset); @@ -542,7 +529,7 @@ u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { struct intel_timeline *tl = 
rcu_dereference_protected(rq->timeline, 1); unsigned int flags = PIPE_CONTROL_FLUSH_ENABLE | PIPE_CONTROL_CS_STALL; - u32 offset = hwsp_offset(rq); + u32 offset = __i915_request_hwsp_offset(rq); if (intel_timeline_is_relative(tl)) { offset = offset_in_page(offset); @@ -567,7 +554,7 @@ u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) u32 *gen11_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); - u32 offset = hwsp_offset(rq); + u32 offset = __i915_request_hwsp_offset(rq); unsigned int flags; flags = (PIPE_CONTROL_CS_STALL | @@ -649,7 +636,7 @@ u32 *gen12_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs) u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1); - u32 offset = hwsp_offset(rq); + u32 offset = __i915_request_hwsp_offset(rq); unsigned int flags; flags = (PIPE_CONTROL_CS_STALL | diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 9d193acd260b..ef80f47f468a 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -1043,11 +1043,6 @@ static void setup_common(struct intel_engine_cs *engine) * equivalent to our next initial bread so we can elide * engine->emit_init_breadcrumb(). 
*/ - engine->emit_fini_breadcrumb = gen3_emit_breadcrumb; - if (IS_GEN(i915, 5)) - engine->emit_fini_breadcrumb = gen5_emit_breadcrumb; - - engine->set_default_submission = i9xx_set_default_submission; if (INTEL_GEN(i915) >= 6) engine->emit_bb_start = gen6_emit_bb_start; @@ -1057,6 +1052,17 @@ static void setup_common(struct intel_engine_cs *engine) engine->emit_bb_start = i830_emit_bb_start; else engine->emit_bb_start = gen3_emit_bb_start; + + if (INTEL_GEN(i915) >= 7) + engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs; + else if (INTEL_GEN(i915) >= 6) + engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs; + else if (INTEL_GEN(i915) >= 4) + engine->emit_fini_breadcrumb = gen4_emit_breadcrumb_xcs; + else + engine->emit_fini_breadcrumb = gen3_emit_breadcrumb; + + engine->set_default_submission = i9xx_set_default_submission; } static void setup_rcs(struct intel_engine_cs *engine) @@ -1098,11 +1104,6 @@ static void setup_vcs(struct intel_engine_cs *engine) engine->set_default_submission = gen6_bsd_set_default_submission; engine->emit_flush = gen6_emit_flush_vcs; engine->irq_enable_mask = GT_BSD_USER_INTERRUPT; - - if (IS_GEN(i915, 6)) - engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs; - else - engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs; } else { engine->emit_flush = gen4_emit_flush_vcs; if (IS_GEN(i915, 5)) @@ -1116,13 +1117,10 @@ static void setup_bcs(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915; + GEM_BUG_ON(INTEL_GEN(i915) < 6); + engine->emit_flush = gen6_emit_flush_xcs; engine->irq_enable_mask = GT_BLT_USER_INTERRUPT; - - if (IS_GEN(i915, 6)) - engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs; - else - engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs; } static void setup_vecs(struct intel_engine_cs *engine) @@ -1135,8 +1133,6 @@ static void setup_vecs(struct intel_engine_cs *engine) engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT; engine->irq_enable = hsw_irq_enable_vecs; 
engine->irq_disable = hsw_irq_disable_vecs; - - engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs; } static int gen7_ctx_switch_bb_setup(struct intel_engine_cs * const engine, diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 7c29d33e7d51..62840206e3dd 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -624,6 +624,19 @@ i915_request_active_timeline(const struct i915_request *rq) lockdep_is_held(&i915_request_get_scheduler(rq)->lock)); } +static inline u32 __i915_request_hwsp_offset(const struct i915_request *rq) +{ + const struct intel_timeline_cacheline *cl; + + /* Before the request is executed, the timeline/cachline is fixed */ + + cl = rcu_dereference_protected(rq->hwsp_cacheline, 1); + if (cl) + return cl->ggtt_offset; + + return rcu_dereference_protected(rq->timeline, 1)->ggtt_offset; +} + static inline bool i915_request_use_scheduler(const struct i915_request *rq) { return i915_sched_is_active(i915_request_get_scheduler(rq)); From patchwork Mon Feb 1 08:57:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80DF3C43381 for ; Mon, 1 Feb 2021 08:57:59 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 
2596264E33 for ; Mon, 1 Feb 2021 08:57:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2596264E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B576C6E514; Mon, 1 Feb 2021 08:57:38 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 826F76E4A1 for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757787-1500050 for multiple; Mon, 01 Feb 2021 08:57:25 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:12 +0000 Message-Id: <20210201085715.27435-54-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 54/57] drm/i915/gt: Infrastructure for ring scheduling X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Build a bare-bones scheduler to sit on top of the global legacy ringbuffer submission. This virtual execlists scheme should be applicable to all older platforms. A key problem we have with the legacy ring buffer submission is that it only allows for FIFO queuing. All clients share the global request queue and must contend for its lock when submitting.
As any client may need to wait for external events, all clients must then wait. However, if we stage each client into their own virtual ringbuffer with their own timelines, we can copy the client requests into the global ringbuffer only when they are ready, reordering the submission around stalls. Furthermore, the ability to reorder gives us rudimentary priority sorting -- although without preemption support, once something is on the GPU it stays on the GPU, and so it is still possible for a hog to delay a high priority request (such as updating the display). However, it does mean that by keeping a short submission queue, the high priority request will be next. This design resembles the old GuC submission scheduler, for reordering requests onto a global workqueue. The implementation uses the MI_USER_INTERRUPT at the end of every request to track completion, so is more interrupt happy than execlists [which has an interrupt for each context event, albeit two]. Our interrupts on these systems are relatively heavy, and in the past we have been able to completely starve Sandybridge by the interrupt traffic. Our interrupt handlers are now much better (in part offloading the work to bottom halves, leaving the interrupt itself only dealing with acking the registers) but we can still see the impact of starvation in the uneven submission latency on a saturated system. Overall though, the short submission queues and extra interrupts do not appear to be affecting throughput (+-10%, some tasks even improve due to the reduced request overheads) and improve latency. [Which is a massive improvement since the introduction of Sandybridge!]
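For illustration only (not part of the patch): a toy userspace model of the reordering described above. Each client keeps its own in-order queue, and the submitter only copies a request to the shared ring once the request at the head of that client's queue is ready, so a stalled client no longer blocks the others. Every toy_* name is hypothetical, and the round-robin pick is an assumption made for the example; the real scheduler sorts by priority and deadline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_request {
	int id;
	bool ready; /* false while waiting on an external event */
};

struct toy_client {
	const struct toy_request *queue; /* per-client, strictly in order */
	size_t count;
	size_t head; /* next request not yet copied to the global ring */
};

/*
 * Copy ready requests into the global submission order. A client whose
 * head request is not ready is skipped, but its own ordering is kept.
 * Returns the number of request ids written to out[].
 */
static size_t toy_submit_ready(struct toy_client *clients, size_t nclients,
			       int *out, size_t max)
{
	size_t n = 0;
	bool progress = true;

	while (progress && n < max) {
		progress = false;
		for (size_t i = 0; i < nclients && n < max; i++) {
			struct toy_client *c = &clients[i];

			if (c->head < c->count && c->queue[c->head].ready) {
				out[n++] = c->queue[c->head++].id;
				progress = true;
			}
		}
	}

	return n;
}
```

With a single global FIFO, a stalled request at the front would hold back everything queued behind it; here only the owning client waits, and its later requests stay queued until the stall clears.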
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine.h | 1 + drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + .../gpu/drm/i915/gt/intel_ring_scheduler.c | 783 ++++++++++++++++++ .../gpu/drm/i915/gt/intel_ring_submission.c | 17 +- .../gpu/drm/i915/gt/intel_ring_submission.h | 17 + 6 files changed, 812 insertions(+), 8 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_ring_scheduler.c create mode 100644 drivers/gpu/drm/i915/gt/intel_ring_submission.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index ce01634d4ea7..1f9c98eae605 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -115,6 +115,7 @@ gt-y += \ gt/intel_renderstate.o \ gt/intel_reset.o \ gt/intel_ring.o \ + gt/intel_ring_scheduler.o \ gt/intel_ring_submission.o \ gt/intel_rps.o \ gt/intel_sseu.o \ diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 33a29623571d..bc07c96ab48c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -193,6 +193,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine); int intel_engine_resume(struct intel_engine_cs *engine); int intel_ring_submission_setup(struct intel_engine_cs *engine); +int intel_ring_scheduler_setup(struct intel_engine_cs *engine); int intel_engine_stop_cs(struct intel_engine_cs *engine); void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e94c99dee5cb..9f14cc631287 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -318,6 +318,7 @@ struct intel_engine_cs { struct { struct intel_ring *ring; struct intel_timeline *timeline; + struct intel_context *context; } legacy; /* diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c 
b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c new file mode 100644 index 000000000000..b6fcb18ef0a6 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c @@ -0,0 +1,783 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +#include + +#include + +#include "i915_drv.h" +#include "intel_breadcrumbs.h" +#include "intel_context.h" +#include "intel_engine_pm.h" +#include "intel_engine_stats.h" +#include "intel_gt.h" +#include "intel_gt_pm.h" +#include "intel_gt_requests.h" +#include "intel_reset.h" +#include "intel_ring.h" +#include "intel_ring_submission.h" +#include "shmem_utils.h" + +/* + * Rough estimate of the typical request size, performing a flush, + * set-context and then emitting the batch. + */ +#define LEGACY_REQUEST_SIZE 200 + +static void +set_current_context(struct intel_context **ptr, struct intel_context *ce) +{ + if (ce) + intel_context_get(ce); + + ce = xchg(ptr, ce); + + if (ce) + intel_context_put(ce); +} + +static inline void runtime_start(struct intel_context *ce) +{ + struct intel_context_stats *stats = &ce->stats; + + if (intel_context_is_barrier(ce)) + return; + + if (stats->active) + return; + + WRITE_ONCE(stats->active, intel_context_clock()); +} + +static inline void runtime_stop(struct intel_context *ce) +{ + struct intel_context_stats *stats = &ce->stats; + ktime_t dt; + + if (!stats->active) + return; + + dt = ktime_sub(intel_context_clock(), stats->active); + ewma_runtime_add(&stats->runtime.avg, dt); + stats->runtime.total += dt; + + WRITE_ONCE(stats->active, 0); +} + +static struct intel_engine_cs *__schedule_in(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + struct intel_engine_cs *engine = rq->engine; + + intel_context_get(ce); + + __intel_gt_pm_get(engine->gt); + if (engine->fw_domain && !engine->fw_active++) + intel_uncore_forcewake_get(engine->uncore, engine->fw_domain); + + intel_engine_context_in(engine); + + CE_TRACE(ce, "schedule-in\n"); + + return 
engine; +} + +static void schedule_in(struct i915_request *rq) +{ + struct intel_context * const ce = rq->context; + struct intel_engine_cs *old; + + GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine)); + + old = ce->inflight; + if (!old) + old = __schedule_in(rq); + WRITE_ONCE(ce->inflight, ptr_inc(old)); + + GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); + GEM_BUG_ON(!intel_context_inflight_count(ce)); +} + +static void __schedule_out(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + struct intel_engine_cs *engine = rq->engine; + + CE_TRACE(ce, "schedule-out\n"); + + if (intel_timeline_is_last(ce->timeline, rq)) + intel_engine_add_retire(engine, ce->timeline); + else + i915_request_update_deadline(list_next_entry(rq, link)); + + intel_engine_context_out(engine); + + if (engine->fw_domain && !--engine->fw_active) + intel_uncore_forcewake_put(engine->uncore, engine->fw_domain); + intel_gt_pm_put_async(engine->gt); +} + +static void schedule_out(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + + GEM_BUG_ON(!ce->inflight); + ce->inflight = ptr_dec(ce->inflight); + if (!intel_context_inflight_count(ce)) { + GEM_BUG_ON(ce->inflight != rq->engine); + __schedule_out(rq); + WRITE_ONCE(ce->inflight, NULL); + intel_context_put(ce); + } + + i915_request_put(rq); +} + +static u32 *ring_map(struct intel_ring *ring, u32 len) +{ + u32 *va; + + if (unlikely(ring->tail + len > ring->effective_size)) { + memset(ring->vaddr + ring->tail, 0, ring->size - ring->tail); + ring->tail = 0; + } + + va = ring->vaddr + ring->tail; + ring->tail = intel_ring_wrap(ring, ring->tail + len); + + return va; +} + +static inline u32 *ring_map_dw(struct intel_ring *ring, u32 len) +{ + return ring_map(ring, len * sizeof(u32)); +} + +static inline void ring_advance(struct intel_ring *ring, void *map) +{ + GEM_BUG_ON(intel_ring_wrap(ring, map - ring->vaddr) != ring->tail); +} + +static void ring_copy(struct intel_ring *dst, + const struct intel_ring *src, 
+ u32 start, u32 end) +{ + unsigned int len; + void *out; + + len = end - start; + if (end < start) + len += src->size; + out = ring_map(dst, len); + + if (end < start) { + len = src->size - start; + memcpy(out, src->vaddr + start, len); + out += len; + start = 0; + } + + memcpy(out, src->vaddr + start, end - start); +} + +static void switch_context(struct intel_ring *ring, struct i915_request *rq) +{ +} + +static struct i915_request *ring_submit(struct i915_request *rq) +{ + struct intel_ring *ring = rq->engine->legacy.ring; + + __i915_request_submit(rq); + + if (rq->engine->legacy.context != rq->context) { + switch_context(ring, rq); + set_current_context(&rq->engine->legacy.context, rq->context); + } + + ring_copy(ring, rq->ring, rq->head, rq->tail); + return rq; +} + +static struct i915_request ** +copy_active(struct i915_request **port, struct i915_request * const *active) +{ + while (*active) + *port++ = *active++; + + return port; +} + +static inline void +copy_ports(struct i915_request **dst, struct i915_request **src, int count) +{ + /* A memcpy_p() would be very useful here! 
*/ + while (count--) + WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ +} + +static inline void write_tail(const struct intel_engine_cs *engine) +{ + wmb(); /* paranoid flush of WCB before RING_TAIL write */ + ENGINE_WRITE(engine, RING_TAIL, engine->legacy.ring->tail); +} + +static void dequeue(struct i915_sched *se, struct intel_engine_cs *engine) +{ + struct intel_engine_execlists * const el = &engine->execlists; + struct i915_request ** const last_port = el->pending + el->port_mask; + struct i915_request **port, **first, *last; + struct i915_priolist *p; + + first = copy_active(el->pending, el->active); + if (first > last_port) + return; + + local_irq_disable(); + + last = NULL; + port = first; + spin_lock(&se->lock); + for_each_priolist(p, &se->queue) { + struct i915_request *rq, *rn; + + priolist_for_each_request_safe(rq, rn, p) { + GEM_BUG_ON(rq == last); + if (last && rq->context != last->context) { + if (port == last_port) + goto done; + + *port++ = i915_request_get(last); + } + + last = ring_submit(rq); + } + + i915_priolist_advance(&se->queue, p); + } +done: + spin_unlock(&se->lock); + + if (last) { + *port++ = i915_request_get(last); + *port = NULL; + + if (!*el->active) + runtime_start((*el->pending)->context); + WRITE_ONCE(el->active, el->pending); + + copy_ports(el->inflight, el->pending, port - el->pending + 1); + while (port-- != first) + schedule_in(*port); + + write_tail(engine); + + WRITE_ONCE(el->active, el->inflight); + GEM_BUG_ON(!*el->active); + } + + WRITE_ONCE(el->pending[0], NULL); + + local_irq_enable(); /* flush irq_work *after* RING_TAIL write */ +} + +static void post_process_csb(struct i915_request **port, + struct i915_request **last) +{ + while (port != last) + schedule_out(*port++); +} + +static struct i915_request ** +process_csb(struct intel_engine_execlists *el, struct i915_request **inactive) +{ + struct i915_request *rq; + + while ((rq = *el->active)) { + if (!__i915_request_is_complete(rq)) { + 
runtime_start(rq->context); + break; + } + + *inactive++ = rq; + el->active++; + + runtime_stop(rq->context); + } + + return inactive; +} + +static void submission_tasklet(struct tasklet_struct *t) +{ + struct i915_sched *se = from_tasklet(se, t, tasklet); + struct intel_engine_cs * const engine = + container_of(se, typeof(*engine), sched); + struct i915_request *post[2 * EXECLIST_MAX_PORTS]; + struct i915_request **inactive; + + rcu_read_lock(); + inactive = process_csb(&engine->execlists, post); + GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); + + if (!i915_sched_is_idle(se)) + dequeue(se, engine); + + post_process_csb(post, inactive); + rcu_read_unlock(); +} + +static void reset_prepare(struct intel_engine_cs *engine) +{ + GEM_TRACE("%s\n", engine->name); + + i915_sched_disable(intel_engine_get_scheduler(engine)); + + intel_ring_submission_reset_prepare(engine); +} + +static inline void clear_ports(struct i915_request **ports, int count) +{ + memset_p((void **)ports, NULL, count); +} + +static struct i915_request ** +cancel_port_requests(struct intel_engine_execlists * const el, + struct i915_request **inactive) +{ + struct i915_request * const *port; + + clear_ports(el->pending, ARRAY_SIZE(el->pending)); + + /* Mark the end of active before we overwrite *active */ + for (port = xchg(&el->active, el->pending); *port; port++) + *inactive++ = *port; + clear_ports(el->inflight, ARRAY_SIZE(el->inflight)); + + smp_wmb(); /* complete the seqlock for execlists_active() */ + WRITE_ONCE(el->active, el->inflight); + + return inactive; +} + +static void __ring_rewind(struct intel_engine_cs *engine, bool stalled) +{ + struct i915_sched *se = intel_engine_get_scheduler(engine); + struct i915_request *rq; + unsigned long flags; + + rcu_read_lock(); + spin_lock_irqsave(&se->lock, flags); + rq = __i915_sched_rewind_requests(engine); + spin_unlock_irqrestore(&se->lock, flags); + if (rq && __i915_request_has_started(rq)) + __i915_request_reset(rq, stalled); + rcu_read_unlock(); 
+} + +static void ring_reset_csb(struct intel_engine_cs *engine) +{ + struct intel_engine_execlists * const el = &engine->execlists; + struct i915_request *post[2 * EXECLIST_MAX_PORTS]; + struct i915_request **inactive; + + rcu_read_lock(); + inactive = cancel_port_requests(el, post); + + /* Clear the global submission state, we will submit from scratch */ + intel_ring_reset(engine->legacy.ring, 0); + set_current_context(&engine->legacy.context, NULL); + + post_process_csb(post, inactive); + rcu_read_unlock(); +} + +static void ring_reset_rewind(struct intel_engine_cs *engine, bool stalled) +{ + ring_reset_csb(engine); + __ring_rewind(engine, stalled); +} + +static void ring_reset_cancel(struct intel_engine_cs *engine) +{ + struct i915_sched *se = intel_engine_get_scheduler(engine); + struct i915_request *rq, *rn; + struct i915_priolist *p; + unsigned long flags; + + ring_reset_csb(engine); + + rcu_read_lock(); + spin_lock_irqsave(&se->lock, flags); + + /* Mark all submitted requests as skipped. */ + list_for_each_entry(rq, &se->requests, sched.link) + i915_request_mark_eio(rq); + intel_engine_signal_breadcrumbs(engine); + + /* Flush the queued requests to the timeline list (for retiring). 
*/ + for_each_priolist(p, &se->queue) { + priolist_for_each_request_safe(rq, rn, p) { + i915_request_mark_eio(rq); + __i915_request_submit(rq); + } + i915_priolist_advance(&se->queue, p); + } + GEM_BUG_ON(!i915_sched_is_idle(se)); + + /* Remaining _unready_ requests will be nop'ed when submitted */ + + spin_unlock_irqrestore(&se->lock, flags); + rcu_read_unlock(); +} + +static void reset_finish(struct intel_engine_cs *engine) +{ + intel_ring_submission_reset_finish(engine); + i915_sched_enable(intel_engine_get_scheduler(engine)); +} + +static void submission_park(struct intel_engine_cs *engine) +{ + /* drain the submit queue */ + intel_breadcrumbs_unpin_irq(engine->breadcrumbs); + intel_engine_kick_scheduler(engine); +} + +static void submission_unpark(struct intel_engine_cs *engine) +{ + intel_breadcrumbs_pin_irq(engine->breadcrumbs); +} + +static void ring_context_destroy(struct kref *ref) +{ + struct intel_context *ce = container_of(ref, typeof(*ce), ref); + + GEM_BUG_ON(intel_context_is_pinned(ce)); + + if (ce->state) + i915_vma_put(ce->state); + if (test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) + intel_ring_put(ce->ring); + + intel_context_fini(ce); + intel_context_free(ce); +} + +static int alloc_context_vma(struct intel_context *ce) + +{ + struct intel_engine_cs *engine = ce->engine; + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err; + + obj = i915_gem_object_create_shmem(engine->i915, engine->context_size); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + /* + * Try to make the context utilize L3 as well as LLC. + * + * On VLV we don't have L3 controls in the PTEs so we + * shouldn't touch the cache level, especially as that + * would make the object snooped which might have a + * negative performance impact. + * + * Snooping is required on non-llc platforms in execlist + * mode, but since all GGTT accesses use PAT entry 0 we + * get snooping anyway regardless of cache_level. 
+ * + * This is only applicable for Ivy Bridge devices since + * later platforms don't have L3 control bits in the PTE. + */ + if (IS_IVYBRIDGE(engine->i915)) + i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC); + + if (engine->default_state) { + void *vaddr; + + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + goto err_obj; + } + + shmem_read(engine->default_state, 0, + vaddr, engine->context_size); + __set_bit(CONTEXT_VALID_BIT, &ce->flags); + + i915_gem_object_flush_map(obj); + i915_gem_object_unpin_map(obj); + } + + vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_obj; + } + + ce->state = vma; + return 0; + +err_obj: + i915_gem_object_put(obj); + return err; +} + +static struct intel_timeline *pinned_timeline(struct intel_context *ce) +{ + struct intel_timeline *tl = fetch_and_zero(&ce->timeline); + + return intel_timeline_create_from_engine(ce->engine, + page_unmask_bits(tl)); +} + +static int alloc_timeline(struct intel_context *ce) +{ + struct intel_engine_cs *engine = ce->engine; + struct intel_timeline *tl; + + if (unlikely(ce->timeline)) + tl = pinned_timeline(ce); + else + tl = intel_timeline_create(engine->gt); + if (IS_ERR(tl)) + return PTR_ERR(tl); + + ce->timeline = tl; + return 0; +} + +static int ring_context_alloc(struct intel_context *ce) +{ + struct intel_engine_cs *engine = ce->engine; + struct intel_ring *ring; + int err; + + GEM_BUG_ON(ce->state); + if (engine->context_size) { + err = alloc_context_vma(ce); + if (err) + return err; + } + + if (!page_mask_bits(ce->timeline)) { + err = alloc_timeline(ce); + if (err) + goto err_vma; + } + + ring = intel_engine_create_ring(engine, + (unsigned long)ce->ring | + INTEL_RING_CREATE_INTERNAL); + if (IS_ERR(ring)) { + err = PTR_ERR(ring); + goto err_timeline; + } + ce->ring = ring; + + return 0; + +err_timeline: + intel_timeline_put(ce->timeline); +err_vma: + if (ce->state) { + 
i915_vma_put(ce->state); + ce->state = NULL; + } + return err; +} + +static int ring_context_pre_pin(struct intel_context *ce, + struct i915_gem_ww_ctx *ww, + void **unused) +{ + return 0; +} + +static int ring_context_pin(struct intel_context *ce, void *unused) +{ + return 0; +} + +static void ring_context_unpin(struct intel_context *ce) +{ +} + +static void ring_context_post_unpin(struct intel_context *ce) +{ +} + +static void ring_context_reset(struct intel_context *ce) +{ + intel_ring_reset(ce->ring, 0); + clear_bit(CONTEXT_VALID_BIT, &ce->flags); +} + +static const struct intel_context_ops ring_context_ops = { + .flags = COPS_HAS_INFLIGHT, + + .alloc = ring_context_alloc, + + .pre_pin = ring_context_pre_pin, + .pin = ring_context_pin, + .unpin = ring_context_unpin, + .post_unpin = ring_context_post_unpin, + + .enter = intel_context_enter_engine, + .exit = intel_context_exit_engine, + + .reset = ring_context_reset, + .destroy = ring_context_destroy, +}; + +static int ring_request_alloc(struct i915_request *rq) +{ + int ret; + + GEM_BUG_ON(!intel_context_is_pinned(rq->context)); + + /* + * Flush enough space to reduce the likelihood of waiting after + * we start building the request - in which case we will just + * have to repeat work. + */ + rq->reserved_space += LEGACY_REQUEST_SIZE; + + /* Unconditionally invalidate GPU caches and TLBs. 
*/ + ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE); + if (ret) + return ret; + + rq->reserved_space -= LEGACY_REQUEST_SIZE; + return 0; +} + +static void set_default_submission(struct intel_engine_cs *engine) +{ + engine->sched.submit_request = i915_request_enqueue; +} + +static void ring_release(struct intel_engine_cs *engine) +{ + intel_engine_cleanup_common(engine); + + set_current_context(&engine->legacy.context, NULL); + + intel_ring_unpin(engine->legacy.ring); + intel_ring_put(engine->legacy.ring); +} + +static void setup_irq(struct intel_engine_cs *engine) +{ +} + +static void setup_common(struct intel_engine_cs *engine) +{ + struct drm_i915_private *i915 = engine->i915; + + /* gen8+ are only supported with execlists */ + GEM_BUG_ON(INTEL_GEN(i915) >= 8); + GEM_BUG_ON(INTEL_GEN(i915) < 8); + + setup_irq(engine); + + engine->park = submission_park; + engine->unpark = submission_unpark; + + engine->resume = intel_ring_submission_resume; + engine->sanitize = intel_ring_submission_sanitize; + + engine->reset.prepare = reset_prepare; + engine->reset.rewind = ring_reset_rewind; + engine->reset.cancel = ring_reset_cancel; + engine->reset.finish = reset_finish; + + engine->cops = &ring_context_ops; + engine->request_alloc = ring_request_alloc; + + engine->set_default_submission = set_default_submission; +} + +static void setup_rcs(struct intel_engine_cs *engine) +{ +} + +static void setup_vcs(struct intel_engine_cs *engine) +{ +} + +static void setup_bcs(struct intel_engine_cs *engine) +{ +} + +static void setup_vecs(struct intel_engine_cs *engine) +{ + GEM_BUG_ON(!IS_HASWELL(engine->i915)); +} + +static unsigned int global_ring_size(void) +{ + /* Enough space to hold 2 clients and the context switch */ + return roundup_pow_of_two(EXECLIST_MAX_PORTS * SZ_16K + SZ_4K); +} + +int intel_ring_scheduler_setup(struct intel_engine_cs *engine) +{ + struct intel_ring *ring; + int err; + + GEM_BUG_ON(HAS_EXECLISTS(engine->i915)); + + tasklet_setup(&engine->sched.tasklet, 
submission_tasklet); + __set_bit(I915_SCHED_ACTIVE_BIT, &engine->sched.flags); + __set_bit(I915_SCHED_NEEDS_BREADCRUMB_BIT, &engine->sched.flags); + + setup_common(engine); + + switch (engine->class) { + case RENDER_CLASS: + setup_rcs(engine); + break; + case VIDEO_DECODE_CLASS: + setup_vcs(engine); + break; + case COPY_ENGINE_CLASS: + setup_bcs(engine); + break; + case VIDEO_ENHANCEMENT_CLASS: + setup_vecs(engine); + break; + default: + MISSING_CASE(engine->class); + return -ENODEV; + } + + ring = intel_engine_create_ring(engine, global_ring_size()); + if (IS_ERR(ring)) { + err = PTR_ERR(ring); + goto err; + } + + err = intel_ring_pin(ring, NULL); + if (err) + goto err_ring; + + GEM_BUG_ON(engine->legacy.ring); + engine->legacy.ring = ring; + + engine->flags |= I915_ENGINE_SUPPORTS_STATS; + + /* Finally, take ownership and responsibility for cleanup! */ + engine->release = ring_release; + return 0; + +err_ring: + intel_ring_put(ring); +err: + intel_engine_cleanup_common(engine); + return err; +} diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index ef80f47f468a..ede148c7b2bd 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -14,6 +14,7 @@ #include "intel_gt.h" #include "intel_reset.h" #include "intel_ring.h" +#include "intel_ring_submission.h" #include "shmem_utils.h" /* Rough estimate of the typical request size, performing a flush, @@ -176,7 +177,7 @@ static bool stop_ring(struct intel_engine_cs *engine) return (ENGINE_READ_FW(engine, RING_HEAD) & HEAD_ADDR) == 0; } -static int xcs_resume(struct intel_engine_cs *engine) +int intel_ring_submission_resume(struct intel_engine_cs *engine) { struct intel_ring *ring = engine->legacy.ring; @@ -264,7 +265,7 @@ static void sanitize_hwsp(struct intel_engine_cs *engine) intel_timeline_reset_seqno(tl); } -static void xcs_sanitize(struct intel_engine_cs *engine) +void 
intel_ring_submission_sanitize(struct intel_engine_cs *engine) { /* * Poison residual state on resume, in case the suspend didn't! @@ -289,7 +290,7 @@ static void xcs_sanitize(struct intel_engine_cs *engine) clflush_cache_range(engine->status_page.addr, PAGE_SIZE); } -static void reset_prepare(struct intel_engine_cs *engine) +void intel_ring_submission_reset_prepare(struct intel_engine_cs *engine) { /* * We stop engines, otherwise we might get failed reset and a @@ -388,7 +389,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled) spin_unlock_irqrestore(&se->lock, flags); } -static void reset_finish(struct intel_engine_cs *engine) +void intel_ring_submission_reset_finish(struct intel_engine_cs *engine) { } @@ -1027,13 +1028,13 @@ static void setup_common(struct intel_engine_cs *engine) setup_irq(engine); - engine->resume = xcs_resume; - engine->sanitize = xcs_sanitize; + engine->resume = intel_ring_submission_resume; + engine->sanitize = intel_ring_submission_sanitize; - engine->reset.prepare = reset_prepare; + engine->reset.prepare = intel_ring_submission_reset_prepare; engine->reset.rewind = reset_rewind; engine->reset.cancel = reset_cancel; - engine->reset.finish = reset_finish; + engine->reset.finish = intel_ring_submission_reset_finish; engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc; diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.h b/drivers/gpu/drm/i915/gt/intel_ring_submission.h new file mode 100644 index 000000000000..59a43c221748 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2020 Intel Corporation + */ + +#ifndef __INTEL_RING_SUBMISSION_H__ +#define __INTEL_RING_SUBMISSION_H__ + +struct intel_engine_cs; + +void intel_ring_submission_reset_prepare(struct intel_engine_cs *engine); +void intel_ring_submission_reset_finish(struct intel_engine_cs *engine); + +int intel_ring_submission_resume(struct 
intel_engine_cs *engine); +void intel_ring_submission_sanitize(struct intel_engine_cs *engine); + +#endif /* __INTEL_RING_SUBMISSION_H__ */ From patchwork Mon Feb 1 08:57:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058429 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66C78C433DB for ; Mon, 1 Feb 2021 08:57:57 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 00CC264E33 for ; Mon, 1 Feb 2021 08:57:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 00CC264E33 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2905D6E48B; Mon, 1 Feb 2021 08:57:36 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 663B36E499 for ; Mon, 1 Feb 2021 08:57:32 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23757788-1500050 for multiple; Mon, 01 Feb 
2021 08:57:25 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 1 Feb 2021 08:57:13 +0000 Message-Id: <20210201085715.27435-55-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk> References: <20210201085715.27435-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 55/57] drm/i915/gt: Implement ring scheduler for gen4-7 X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" A key problem with legacy ring buffer submission is that it is an inherent FIFO queue across all clients; if one blocks, they all block. A scheduler allows us to avoid that limitation, and ensures that all clients can submit in parallel, removing the resource contention of the global ringbuffer. Having built the ring scheduler infrastructure on top of the global ringbuffer submission, we now need to provide the HW knowledge required to build command packets and implement context switching. 
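For readers new to the series, the mechanism that lets clients build requests in parallel can be sketched in plain userspace C: each client writes into its own ring, and the scheduler later copies the chosen span into the one global ring, handling wrap-around on both sides. This is only an illustrative sketch, not the driver code; struct toy_ring, toy_ring_map() and toy_ring_copy() are hypothetical stand-ins loosely modelled on ring_map() and ring_copy() from the previous patch, and the real code additionally tracks an effective size and emits context-switch commands between clients.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ring: a byte buffer plus the next write offset. */
struct toy_ring {
	uint8_t *vaddr;
	uint32_t size;	/* total size in bytes */
	uint32_t tail;	/* offset of the next write */
};

/*
 * Reserve len contiguous bytes in the ring. If they do not fit before
 * the end of the buffer, zero-fill the remainder and restart at 0.
 */
static uint8_t *toy_ring_map(struct toy_ring *ring, uint32_t len)
{
	uint8_t *va;

	assert(len <= ring->size);

	if (ring->tail + len > ring->size) {
		memset(ring->vaddr + ring->tail, 0, ring->size - ring->tail);
		ring->tail = 0;
	}

	va = ring->vaddr + ring->tail;
	ring->tail = (ring->tail + len) % ring->size;

	return va;
}

/* Copy the span [start, end) of src, which may wrap, into dst. */
static void toy_ring_copy(struct toy_ring *dst, const struct toy_ring *src,
			  uint32_t start, uint32_t end)
{
	uint32_t len;
	uint8_t *out;

	/* Total bytes to copy, accounting for a wrapped source span. */
	if (end >= start)
		len = end - start;
	else
		len = src->size - start + end;

	out = toy_ring_map(dst, len);

	if (end < start) {
		/* First the piece up to the end of the source buffer... */
		uint32_t head = src->size - start;

		memcpy(out, src->vaddr + start, head);
		out += head;
		start = 0;
	}

	/* ...then the piece from the start of the buffer up to end. */
	memcpy(out, src->vaddr + start, end - start);
}
```

Because every client fills a private ring, a stalled client only delays its own span; the global ring is written solely at dequeue time, in whatever order the scheduler picks.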
Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_ring_scheduler.c | 459 +++++++++++++++++- drivers/gpu/drm/i915/i915_reg.h | 10 + 2 files changed, 466 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c index b6fcb18ef0a6..46372116011b 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c @@ -7,7 +7,12 @@ #include +#include "gen2_engine_cs.h" +#include "gen6_engine_cs.h" +#include "gen6_ppgtt.h" +#include "gen7_renderclear.h" #include "i915_drv.h" +#include "i915_mitigations.h" #include "intel_breadcrumbs.h" #include "intel_context.h" #include "intel_engine_pm.h" @@ -182,8 +187,270 @@ static void ring_copy(struct intel_ring *dst, memcpy(out, src->vaddr + start, end - start); } +static void mi_set_context(struct intel_ring *ring, + struct intel_engine_cs *engine, + struct intel_context *ce, + u32 flags) +{ + struct drm_i915_private *i915 = engine->i915; + enum intel_engine_id id; + const int num_engines = + IS_HASWELL(i915) ? engine->gt->info.num_engines - 1 : 0; + int len; + u32 *cs; + + len = 4; + if (IS_GEN(i915, 7)) + len += 2 + (num_engines ? 4 * num_engines + 6 : 0); + else if (IS_GEN(i915, 5)) + len += 2; + + cs = ring_map_dw(ring, len); + + /* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */ + if (IS_GEN(i915, 7)) { + *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; + if (num_engines) { + struct intel_engine_cs *signaller; + + *cs++ = MI_LOAD_REGISTER_IMM(num_engines); + for_each_engine(signaller, engine->gt, id) { + if (signaller == engine) + continue; + + *cs++ = i915_mmio_reg_offset( + RING_PSMI_CTL(signaller->mmio_base)); + *cs++ = _MASKED_BIT_ENABLE( + GEN6_PSMI_SLEEP_MSG_DISABLE); + } + } + } else if (IS_GEN(i915, 5)) { + /* + * This w/a is only listed for pre-production ilk a/b steppings, + * but is also mentioned for programming the powerctx. 
To be + * safe, just apply the workaround; we do not use SyncFlush so + * this should never take effect and so be a no-op! + */ + *cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN; + } + + *cs++ = MI_NOOP; + *cs++ = MI_SET_CONTEXT; + *cs++ = i915_ggtt_offset(ce->state) | flags; + /* + * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP + * WaMiSetContext_Hang:snb,ivb,vlv + */ + *cs++ = MI_NOOP; + + if (IS_GEN(i915, 7)) { + if (num_engines) { + struct intel_engine_cs *signaller; + i915_reg_t last_reg = {}; /* keep gcc quiet */ + + *cs++ = MI_LOAD_REGISTER_IMM(num_engines); + for_each_engine(signaller, engine->gt, id) { + if (signaller == engine) + continue; + + last_reg = RING_PSMI_CTL(signaller->mmio_base); + *cs++ = i915_mmio_reg_offset(last_reg); + *cs++ = _MASKED_BIT_DISABLE( + GEN6_PSMI_SLEEP_MSG_DISABLE); + } + + /* Insert a delay before the next switch! */ + *cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT; + *cs++ = i915_mmio_reg_offset(last_reg); + *cs++ = intel_gt_scratch_offset(engine->gt, + INTEL_GT_SCRATCH_FIELD_DEFAULT); + *cs++ = MI_NOOP; + } + *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; + } else if (IS_GEN(i915, 5)) { + *cs++ = MI_SUSPEND_FLUSH; + } + + ring_advance(ring, cs); +} + +static struct i915_address_space *vm_alias(struct i915_address_space *vm) +{ + if (i915_is_ggtt(vm)) + vm = &i915_vm_to_ggtt(vm)->alias->vm; + + return vm; +} + +static u32 pp_dir(const struct i915_ppgtt *ppgtt) +{ + return container_of(ppgtt, const struct gen6_ppgtt, base)->pp_dir; +} + +static void load_pd_dir(struct intel_ring *ring, + struct intel_engine_cs *engine, + const struct i915_ppgtt *ppgtt) +{ + u32 *cs = ring_map_dw(ring, 10); + + *cs++ = MI_LOAD_REGISTER_IMM(1); + *cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base)); + *cs++ = PP_DIR_DCLV_2G; + + *cs++ = MI_LOAD_REGISTER_IMM(1); + *cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base)); + *cs++ = pp_dir(ppgtt); + + /* Stall until the page table load is complete? 
*/ + *cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT; + *cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base)); + *cs++ = intel_gt_scratch_offset(engine->gt, + INTEL_GT_SCRATCH_FIELD_DEFAULT); + *cs++ = MI_NOOP; + + ring_advance(ring, cs); +} + +static struct i915_address_space *current_vm(struct intel_engine_cs *engine) +{ + struct intel_context *old = engine->legacy.context; + + return old ? vm_alias(old->vm) : NULL; +} + +static void gen4_emit_invalidate_rcs(struct intel_ring *ring, + struct intel_engine_cs *engine) +{ + u32 addr, flags; + u32 *cs; + + addr = intel_gt_scratch_offset(engine->gt, + INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH); + + flags = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL; + flags |= PIPE_CONTROL_TLB_INVALIDATE; + + if (INTEL_GEN(engine->i915) >= 7) + flags |= PIPE_CONTROL_GLOBAL_GTT_IVB; + else + addr |= PIPE_CONTROL_GLOBAL_GTT; + + cs = ring_map_dw(ring, 4); + *cs++ = GFX_OP_PIPE_CONTROL(4); + *cs++ = flags; + *cs++ = addr; + *cs++ = 0; + ring_advance(ring, cs); +} + +static struct i915_address_space * +clear_residuals(struct intel_ring *ring, struct intel_engine_cs *engine) +{ + struct intel_context *ce = engine->kernel_context; + struct i915_address_space *vm = vm_alias(engine->gt->vm); + u32 flags; + + if (vm != current_vm(engine)) + load_pd_dir(ring, engine, i915_vm_to_ppgtt(vm)); + + if (ce->state) + mi_set_context(ring, engine, ce, + MI_MM_SPACE_GTT | MI_RESTORE_INHIBIT); + + if (IS_HASWELL(engine->i915)) + flags = MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW; + else + flags = MI_BATCH_NON_SECURE_I965; + + __gen6_emit_bb_start(ring_map_dw(ring, 2), + engine->wa_ctx.vma->node.start, flags); + + return vm; +} + +static void remap_l3_slice(struct intel_ring *ring, + struct intel_engine_cs *engine, + int slice) +{ + u32 *cs, *remap_info = engine->i915->l3_parity.remap_info[slice]; + int i; + + if (!remap_info) + return; + + /* + * Note: We do not worry about the concurrent register cacheline hang + * here because no other 
code should access these registers other than + * at initialization time. + */ + cs = ring_map_dw(ring, GEN7_L3LOG_SIZE / 4 * 2 + 2); + *cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4); + for (i = 0; i < GEN7_L3LOG_SIZE / 4; i++) { + *cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i)); + *cs++ = remap_info[i]; + } + *cs++ = MI_NOOP; + ring_advance(ring, cs); +} + +static void remap_l3(struct intel_ring *ring, + struct intel_engine_cs *engine, + struct intel_context *ce) +{ + struct i915_gem_context *ctx = + rcu_dereference_protected(ce->gem_context, true); + int bit, idx = -1; + + if (!ctx || !ctx->remap_slice) + return; + + do { + bit = ffs(ctx->remap_slice); + remap_l3_slice(ring, engine, idx += bit); + } while (ctx->remap_slice >>= bit); +} + static void switch_context(struct intel_ring *ring, struct i915_request *rq) { + struct intel_engine_cs *engine = rq->engine; + struct i915_address_space *cvm = current_vm(engine); + struct intel_context *ce = rq->context; + struct i915_address_space *vm; + + if (engine->wa_ctx.vma && ce != engine->kernel_context) { + if (engine->wa_ctx.vma->private != ce && + i915_mitigate_clear_residuals()) { + cvm = clear_residuals(ring, engine); + intel_context_put(engine->wa_ctx.vma->private); + engine->wa_ctx.vma->private = intel_context_get(ce); + } + } + + vm = vm_alias(ce->vm); + if (vm != cvm) + load_pd_dir(ring, engine, i915_vm_to_ppgtt(vm)); + + if (ce->state) { + u32 flags; + + GEM_BUG_ON(engine->id != RCS0); + + /* For resource streamer on HSW+ and power context elsewhere */ + BUILD_BUG_ON(HSW_MI_RS_SAVE_STATE_EN != MI_SAVE_EXT_STATE_EN); + BUILD_BUG_ON(HSW_MI_RS_RESTORE_STATE_EN != MI_RESTORE_EXT_STATE_EN); + + flags = MI_SAVE_EXT_STATE_EN | MI_MM_SPACE_GTT; + if (test_bit(CONTEXT_VALID_BIT, &ce->flags)) { + gen4_emit_invalidate_rcs(ring, engine); + flags |= MI_RESTORE_EXT_STATE_EN; + } else { + flags |= MI_RESTORE_INHIBIT; + } + + mi_set_context(ring, engine, ce, flags); + } + + remap_l3(ring, engine, ce); } static struct 
i915_request *ring_submit(struct i915_request *rq) @@ -218,10 +485,48 @@ copy_ports(struct i915_request **dst, struct i915_request **src, int count) WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ } +static inline void __write_tail(const struct intel_engine_cs *engine) +{ + ENGINE_WRITE(engine, RING_TAIL, engine->legacy.ring->tail); +} + +static void wa_write_tail(const struct intel_engine_cs *engine) +{ + const i915_reg_t psmi = RING_PSMI_CTL(engine->mmio_base); + struct intel_uncore *uncore = engine->uncore; + + intel_uncore_write_fw(uncore, psmi, + _MASKED_BIT_ENABLE(PSMI_SLEEP_MSG_DISABLE)); + + /* Clear the context id. Here be magic! */ + intel_uncore_write64_fw(uncore, RING_RNCID(engine->mmio_base), 0x0); + + /* Wait for the ring not to be idle, i.e. for it to wake up. */ + if (__intel_wait_for_register_fw(uncore, psmi, + PSMI_SLEEP_INDICATOR, 0, + 1000, 0, NULL)) + drm_err(&uncore->i915->drm, + "timed out waiting for %s to wake up\n", + engine->name); + + /* Now that the ring is fully powered up, update the tail */ + __write_tail(engine); + + /* + * Let the ring send IDLE messages to the GT again, + * and so let it sleep to conserve power when idle. 
+ */ + intel_uncore_write_fw(uncore, psmi, + _MASKED_BIT_DISABLE(PSMI_SLEEP_MSG_DISABLE)); +} + static inline void write_tail(const struct intel_engine_cs *engine) { wmb(); /* paranoid flush of WCB before RING_TAIL write */ - ENGINE_WRITE(engine, RING_TAIL, engine->legacy.ring->tail); + if (!engine->fw_active) + __write_tail(engine); + else + wa_write_tail(engine); } static void dequeue(struct i915_sched *se, struct intel_engine_cs *engine) @@ -595,7 +900,14 @@ static int ring_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **unused) { - return 0; + struct i915_address_space *vm; + int err = 0; + + vm = vm_alias(ce->vm); + if (vm) + err = gen6_ppgtt_pin(i915_vm_to_ppgtt((vm)), ww); + + return err; } static int ring_context_pin(struct intel_context *ce, void *unused) @@ -603,12 +915,22 @@ static int ring_context_pin(struct intel_context *ce, void *unused) return 0; } +static void __context_unpin_ppgtt(struct intel_context *ce) +{ + struct i915_address_space *vm; + + vm = vm_alias(ce->vm); + if (vm) + gen6_ppgtt_unpin(i915_vm_to_ppgtt(vm)); +} + static void ring_context_unpin(struct intel_context *ce) { } static void ring_context_post_unpin(struct intel_context *ce) { + __context_unpin_ppgtt(ce); } static void ring_context_reset(struct intel_context *ce) @@ -667,12 +989,27 @@ static void ring_release(struct intel_engine_cs *engine) set_current_context(&engine->legacy.context, NULL); + if (engine->wa_ctx.vma) { + intel_context_put(engine->wa_ctx.vma->private); + i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0); + } + intel_ring_unpin(engine->legacy.ring); intel_ring_put(engine->legacy.ring); } static void setup_irq(struct intel_engine_cs *engine) { + if (INTEL_GEN(engine->i915) >= 6) { + engine->irq_enable = gen6_irq_enable; + engine->irq_disable = gen6_irq_disable; + } else if (INTEL_GEN(engine->i915) >= 5) { + engine->irq_enable = gen5_irq_enable; + engine->irq_disable = gen5_irq_disable; + } else { + engine->irq_enable = 
gen3_irq_enable; + engine->irq_disable = gen3_irq_disable; + } } static void setup_common(struct intel_engine_cs *engine) @@ -681,7 +1018,7 @@ static void setup_common(struct intel_engine_cs *engine) /* gen8+ are only supported with execlists */ GEM_BUG_ON(INTEL_GEN(i915) >= 8); - GEM_BUG_ON(INTEL_GEN(i915) < 8); + GEM_BUG_ON(INTEL_GEN(i915) < 4); setup_irq(engine); @@ -699,24 +1036,80 @@ static void setup_common(struct intel_engine_cs *engine) engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc; + engine->emit_init_breadcrumb = gen4_emit_init_breadcrumb_xcs; + + if (INTEL_GEN(i915) >= 6) + engine->emit_bb_start = gen6_emit_bb_start; + else + engine->emit_bb_start = gen4_emit_bb_start; + + if (INTEL_GEN(i915) >= 7) + engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs; + else if (INTEL_GEN(i915) >= 6) + engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs; + else + engine->emit_fini_breadcrumb = gen4_emit_breadcrumb_xcs; + engine->set_default_submission = set_default_submission; } static void setup_rcs(struct intel_engine_cs *engine) { + struct drm_i915_private *i915 = engine->i915; + + if (HAS_L3_DPF(i915)) + engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT; + + engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT; + + if (INTEL_GEN(i915) >= 7) { + engine->emit_flush = gen7_emit_flush_rcs; + engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_rcs; + if (IS_HASWELL(i915)) + engine->emit_bb_start = hsw_emit_bb_start; + } else if (INTEL_GEN(i915) >= 6) { + engine->emit_flush = gen6_emit_flush_rcs; + engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_rcs; + } else if (INTEL_GEN(i915) >= 5) { + engine->emit_flush = gen4_emit_flush_rcs; + } else { + engine->emit_flush = gen4_emit_flush_rcs; + engine->irq_enable_mask = I915_USER_INTERRUPT; + } } static void setup_vcs(struct intel_engine_cs *engine) { + if (INTEL_GEN(engine->i915) >= 6) { + if (IS_GEN(engine->i915, 6)) + engine->fw_domain = FORCEWAKE_ALL; + engine->emit_flush = 
gen6_emit_flush_vcs; + engine->irq_enable_mask = GT_BSD_USER_INTERRUPT; + } else if (INTEL_GEN(engine->i915) >= 5) { + engine->emit_flush = gen4_emit_flush_vcs; + engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT; + } else { + engine->emit_flush = gen4_emit_flush_vcs; + engine->irq_enable_mask = I915_BSD_USER_INTERRUPT; + } } static void setup_bcs(struct intel_engine_cs *engine) { + GEM_BUG_ON(INTEL_GEN(engine->i915) < 6); + + engine->emit_flush = gen6_emit_flush_xcs; + engine->irq_enable_mask = GT_BLT_USER_INTERRUPT; } static void setup_vecs(struct intel_engine_cs *engine) { GEM_BUG_ON(!IS_HASWELL(engine->i915)); + + engine->emit_flush = gen6_emit_flush_xcs; + engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT; + engine->irq_enable = hsw_irq_enable_vecs; + engine->irq_disable = hsw_irq_disable_vecs; } static unsigned int global_ring_size(void) @@ -725,6 +1118,58 @@ static unsigned int global_ring_size(void) return roundup_pow_of_two(EXECLIST_MAX_PORTS * SZ_16K + SZ_4K); } +static int gen7_ctx_switch_bb_init(struct intel_engine_cs *engine) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int size; + int err; + + size = gen7_setup_clear_gpr_bb(engine, NULL /* probe size */); + if (size <= 0) + return size; + + size = ALIGN(size, PAGE_SIZE); + obj = i915_gem_object_create_internal(engine->i915, size); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + vma = i915_vma_instance(obj, engine->gt->vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_obj; + } + + vma->private = intel_context_create(engine); /* dummy residuals */ + if (IS_ERR(vma->private)) { + err = PTR_ERR(vma->private); + goto err_obj; + } + + err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH); + if (err) + goto err_private; + + err = i915_vma_sync(vma); + if (err) + goto err_unpin; + + err = gen7_setup_clear_gpr_bb(engine, vma); + if (err < 0) + goto err_unpin; + + engine->wa_ctx.vma = vma; + return 0; + +err_unpin: + i915_vma_unpin(vma); +err_private: + intel_context_put(vma->private); 
+err_obj: + i915_gem_object_put(obj); + return err; +} + int intel_ring_scheduler_setup(struct intel_engine_cs *engine) { struct intel_ring *ring; @@ -769,12 +1214,20 @@ int intel_ring_scheduler_setup(struct intel_engine_cs *engine) GEM_BUG_ON(engine->legacy.ring); engine->legacy.ring = ring; + if (IS_GEN(engine->i915, 7) && engine->class == RENDER_CLASS) { + err = gen7_ctx_switch_bb_init(engine); + if (err) + goto err_ring_unpin; + } + engine->flags |= I915_ENGINE_SUPPORTS_STATS; /* Finally, take ownership and responsibility for cleanup! */ engine->release = ring_release; return 0; +err_ring_unpin: + intel_ring_unpin(ring); err_ring: intel_ring_put(ring); err: diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 224ad897af34..2f4584202e5d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2532,7 +2532,16 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN6_VERSYNC (RING_SYNC_1(VEBOX_RING_BASE)) #define GEN6_VEVSYNC (RING_SYNC_2(VEBOX_RING_BASE)) #define GEN6_NOSYNC INVALID_MMIO_REG + #define RING_PSMI_CTL(base) _MMIO((base) + 0x50) +#define PSMI_SLEEP_MSG_DISABLE REG_BIT(0) +#define PSMI_SLEEP_FLUSH_DISABLE REG_BIT(2) +#define PSMI_SLEEP_INDICATOR REG_BIT(3) +#define PSMI_GO_INDICATOR REG_BIT(4) +#define GEN12_PSMI_WAIT_FOR_EVENT_POWER_DOWN_DISABLE REG_BIT(7) +#define GEN8_PSMI_FF_DOP_CLOCK_GATE_DISABLE REG_BIT(10) +#define GEN8_PSMI_RC_SEMA_IDLE_MSG_DISABLE REG_BIT(12) + #define RING_MAX_IDLE(base) _MMIO((base) + 0x54) #define RING_HWS_PGA(base) _MMIO((base) + 0x80) #define RING_ID(base) _MMIO((base) + 0x8c) @@ -2542,6 +2551,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define RESET_CTL_READY_TO_RESET REG_BIT(1) #define RESET_CTL_REQUEST_RESET REG_BIT(0) +#define RING_RNCID(base) _MMIO((base) + 0x198) #define RING_SEMA_WAIT_POLL(base) _MMIO((base) + 0x24c) #define HSW_GTT_CACHE_EN _MMIO(0x4024) From patchwork Mon Feb 1 08:57:14 2021 Content-Type: text/plain; 
charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 12058423
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:57:14 +0000
Message-Id: <20210201085715.27435-56-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 56/57] drm/i915/gt: Enable ring scheduling for gen5-7
Cc: Chris Wilson

Switch over from FIFO global submission to the priority-sorted
topological scheduler. At the cost of more busy work on the CPU to
keep the GPU supplied with the next packet of requests, this allows us
to reorder requests around submission stalls and so allow low latency
under load while maintaining fairness between clients.

The downside is that we enable interrupts on all requests (unlike with
execlists where we have an interrupt for context switches). This means
that instead of receiving an interrupt only when we are waiting for
completion, we are processing them all the time, with a noticeable
overhead of CPU time absorbed by the interrupt handler. The effect is
most pronounced on CPU-throughput limited renderers like uxa, where
performance can be degraded by 20% in the worst case. Nevertheless, this
is a pathological example of an obsolete userspace driver. (There are
also cases where uxa performs better by 20%, which is an interesting
quirk...) The glxgears-not-a-benchmark (CPU-throughput bound) is one
such example of a performance hit, only affecting uxa.

The expectation is that allowing request reordering will allow much
smoother UX that greatly compensates for reduced throughput under high
submission load (but low GPU load).

This also enables the timer-based RPS for better power saving, with the
exception of Valleyview whose PCU doesn't take kindly to our
interference.

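As a rough illustration of the reordering described in the commit message
above, the following toy sketch contrasts a FIFO pick with a
priority-sorted pick that steps around a stalled request. It is not the
driver's scheduler: struct toy_request, pick_fifo() and pick_priority()
are hypothetical names invented for this example, and the real code also
has to respect dependencies between requests, which this sketch reduces
to a single "ready" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy request; not the driver's struct i915_request. */
struct toy_request {
	int id;
	int priority;	/* larger value is served first */
	bool ready;	/* false while stalled on an unsignaled dependency */
};

/* FIFO: only the oldest request may run, so a stalled head blocks all. */
const struct toy_request *
pick_fifo(const struct toy_request *q, size_t count)
{
	if (count == 0 || !q[0].ready)
		return NULL;

	return &q[0];
}

/*
 * Priority-sorted: take the highest-priority request that is ready,
 * skipping stalled ones. Ties go to the oldest entry (lowest index),
 * which keeps equal-priority clients in submission order.
 */
const struct toy_request *
pick_priority(const struct toy_request *q, size_t count)
{
	const struct toy_request *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (!q[i].ready)
			continue;
		if (!best || q[i].priority > best->priority)
			best = &q[i];
	}

	return best;
}
```

With a stalled request at the head of the queue, pick_fifo() returns NULL
and the engine would idle, whereas pick_priority() still finds work. The
linear scan is chosen for brevity only; it says nothing about how the
driver stores or sorts its queues.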
References: 0f46832fab77 ("drm/i915: Mask USER interrupts on gen6 (until required)") Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 5 ++++- drivers/gpu/drm/i915/gt/intel_gt_types.h | 1 + drivers/gpu/drm/i915/gt/intel_rps.c | 6 ++---- 4 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index d3f87dc4eda3..2246b5c308dc 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -94,7 +94,7 @@ static int live_nop_switch(void *arg) rq = i915_request_get(this); i915_request_add(this); } - if (i915_request_wait(rq, 0, HZ / 5) < 0) { + if (i915_request_wait(rq, 0, HZ) < 0) { pr_err("Failed to populated %d contexts\n", nctx); intel_gt_set_wedged(&i915->gt); i915_request_put(rq); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index f91c38124871..c8136ded5bbe 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -875,8 +875,11 @@ int intel_engines_init(struct intel_gt *gt) } else if (HAS_EXECLISTS(gt->i915)) { gt->submission_method = INTEL_SUBMISSION_ELSP; setup = intel_execlists_submission_setup; - } else { + } else if (INTEL_GEN(gt->i915) >= 5) { gt->submission_method = INTEL_SUBMISSION_RING; + setup = intel_ring_scheduler_setup; + } else { + gt->submission_method = INTEL_SUBMISSION_LEGACY; setup = intel_ring_submission_setup; } diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index 626af37c7790..125b40f62644 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -30,6 +30,7 @@ struct intel_engine_cs; struct intel_uncore; enum intel_submission_method { + INTEL_SUBMISSION_LEGACY, INTEL_SUBMISSION_RING, 
INTEL_SUBMISSION_ELSP, INTEL_SUBMISSION_GUC, diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c index 900c20a6d073..2c78d61e7ea9 100644 --- a/drivers/gpu/drm/i915/gt/intel_rps.c +++ b/drivers/gpu/drm/i915/gt/intel_rps.c @@ -1081,9 +1081,7 @@ static bool gen6_rps_enable(struct intel_rps *rps) intel_uncore_write_fw(uncore, GEN6_RP_DOWN_TIMEOUT, 50000); intel_uncore_write_fw(uncore, GEN6_RP_IDLE_HYSTERSIS, 10); - rps->pm_events = (GEN6_PM_RP_UP_THRESHOLD | - GEN6_PM_RP_DOWN_THRESHOLD | - GEN6_PM_RP_DOWN_TIMEOUT); + rps->pm_events = GEN6_PM_RP_UP_THRESHOLD | GEN6_PM_RP_DOWN_THRESHOLD; return rps_reset(rps); } @@ -1391,7 +1389,7 @@ void intel_rps_enable(struct intel_rps *rps) GEM_BUG_ON(rps->efficient_freq < rps->min_freq); GEM_BUG_ON(rps->efficient_freq > rps->max_freq); - if (has_busy_stats(rps)) + if (has_busy_stats(rps) && !IS_VALLEYVIEW(i915)) intel_rps_set_timer(rps); else if (INTEL_GEN(i915) >= 6) intel_rps_set_interrupts(rps); From patchwork Mon Feb 1 08:57:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12058433 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3644C433E0 for ; Mon, 1 Feb 2021 08:57:58 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7BF7664E33 for ; Mon, 1 Feb 2021 08:57:58 +0000 (UTC) 
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Feb 2021 08:57:15 +0000
Message-Id: <20210201085715.27435-57-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210201085715.27435-1-chris@chris-wilson.co.uk>
References: <20210201085715.27435-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 57/57] drm/i915: Support secure dispatch on gen6/gen7
Cc: Chris Wilson

Re-enable secure dispatch for gen6/gen7, primarily to work around the
command parser and its overly zealous command validation on Haswell.
For example, that validation prevents making accurate measurements
using a journal to store results from the GPU without CPU intervention.

Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0e4d7998be53..54063d65d330 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1666,7 +1666,7 @@ tgl_stepping_get(struct drm_i915_private *dev_priv) #define HAS_LLC(dev_priv) (INTEL_INFO(dev_priv)->has_llc) #define HAS_SNOOP(dev_priv) (INTEL_INFO(dev_priv)->has_snoop) #define HAS_EDRAM(dev_priv) ((dev_priv)->edram_size_mb) -#define HAS_SECURE_BATCHES(dev_priv) (INTEL_GEN(dev_priv) < 6) +#define HAS_SECURE_BATCHES(dev_priv) (INTEL_GEN(dev_priv) < 8) #define HAS_WT(dev_priv) HAS_EDRAM(dev_priv) #define HWS_NEEDS_PHYSICAL(dev_priv) (INTEL_INFO(dev_priv)->hws_needs_physical)
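
To make the effect of the one-line HAS_SECURE_BATCHES() change above
easier to see, here is a small standalone sketch. The helpers take a bare
generation number rather than a struct drm_i915_private pointer, and all
three function names are hypothetical stand-ins written for this note,
not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the old macro body: INTEL_GEN(dev_priv) < 6 */
bool has_secure_batches_old(int gen)
{
	return gen < 6;
}

/* Stand-in for the new macro body: INTEL_GEN(dev_priv) < 8 */
bool has_secure_batches_new(int gen)
{
	return gen < 8;
}

/* True only for generations whose answer differs across the patch. */
bool secure_dispatch_newly_enabled(int gen)
{
	return has_secure_batches_new(gen) && !has_secure_batches_old(gen);
}
```

Under these assumptions only gen6 and gen7 change behaviour: earlier
generations already reported secure batches, and gen8+ still do not.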