From patchwork Thu Dec 24 13:55:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B35DBC433E6 for ; Thu, 24 Dec 2020 13:55:54 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 03F6522287 for ; Thu, 24 Dec 2020 13:55:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 03F6522287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 62C3589C80; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 59CC089C46 for ; Thu, 24 Dec 2020 13:55:47 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423423-1500050 for ; Thu, 24 Dec 2020 13:55:43 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:36 +0000 Message-Id: <20201224135544.1713-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Subject: [Intel-gfx] [CI 1/9] drm/i915/gt: Replace direct submit with direct call to tasklet X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"). 
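A minimal sketch of the bh-kick pattern this relies on (the shape used by __i915_request_queue() in this patch): the request's fences are committed with bottom halves disabled, so any submission tasklet raised along the way runs as soon as softirqs are re-enabled on this CPU, keeping the low-latency direct-submission behaviour:

	local_bh_disable();
	__i915_request_queue_bh(rq);	/* commit semaphore + submit fences */
	local_bh_enable();		/* kick tasklets: pending softirqs run here */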
v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala Reported-by: kernel test robot --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 35 +++-- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 +- .../drm/i915/gt/intel_execlists_submission.c | 140 +++++++----------- drivers/gpu/drm/i915/gt/intel_reset.c | 60 +++++--- drivers/gpu/drm/i915/gt/intel_reset.h | 2 + drivers/gpu/drm/i915/gt/selftest_context.c | 2 +- drivers/gpu/drm/i915/gt/selftest_execlists.c | 10 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 7 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 17 ++- drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- drivers/gpu/drm/i915/i915_request.c | 12 +- drivers/gpu/drm/i915/i915_request.h | 1 + drivers/gpu/drm/i915/i915_scheduler.c | 4 - drivers/gpu/drm/i915/selftests/i915_request.c | 6 +- drivers/gpu/drm/i915/selftests/igt_spinner.c | 3 + 16 files changed, 162 insertions(+), 150 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 8acb922b69f9..1847d3c2ea99 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1019,32 +1019,39 @@ static unsigned long stop_timeout(const struct intel_engine_cs *engine) return READ_ONCE(engine->props.stop_timeout_ms); } -int intel_engine_stop_cs(struct intel_engine_cs *engine) +static int __intel_engine_stop_cs(struct intel_engine_cs *engine, + int fast_timeout_us, + int slow_timeout_ms) { struct intel_uncore *uncore = engine->uncore; - const u32 base = engine->mmio_base; - const i915_reg_t mode = RING_MI_MODE(base); + const i915_reg_t mode = RING_MI_MODE(engine->mmio_base); int err; + intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING)); + err = __intel_wait_for_register_fw(engine->uncore, mode, + MODE_IDLE, MODE_IDLE, + fast_timeout_us, + slow_timeout_ms, + NULL); + + /* A final mmio read to let GPU writes be hopefully flushed to memory */ + intel_uncore_posting_read_fw(uncore, mode); + return err; +} + +int intel_engine_stop_cs(struct intel_engine_cs *engine) +{ + int err = 0; + if (INTEL_GEN(engine->i915) < 3) return -ENODEV; ENGINE_TRACE(engine, "\n"); - - intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING)); - - err = 0; - if (__intel_wait_for_register_fw(uncore, - mode, MODE_IDLE, MODE_IDLE, - 1000, stop_timeout(engine), - NULL)) { + if (__intel_engine_stop_cs(engine, 1000, stop_timeout(engine))) { ENGINE_TRACE(engine, "timed out on STOP_RING -> IDLE\n"); err = -ETIMEDOUT; } - /* A final mmio read to let GPU writes be hopefully flushed to memory */ - intel_uncore_posting_read_fw(uncore, mode); - return err; } diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 8b353bc8c100..2843db731b7d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -144,7 +144,7 @@ __queue_and_release_pm(struct i915_request *rq, list_add_tail(&tl->link, &timelines->active_list); /* Hand the request over to HW and so engine_retire() */ - __i915_request_queue(rq, NULL); + __i915_request_queue_bh(rq); /* Let new submissions commence (and maybe retire this timeline) */ __intel_wakeref_defer_park(&engine->wakeref); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 02ee1e736982..c28f4e190fe6 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -184,7 +184,8 @@ struct intel_engine_execlists { * Reserve the upper 16b for tracking internal errors. */ u32 error_interrupt; -#define ERROR_CSB BIT(31) +#define ERROR_CSB BIT(31) +#define ERROR_PREEMPT BIT(30) /** * @reset_ccid: Active CCID [EXECLISTS_STATUS_HI] at the time of reset diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 695a2d566d76..ac3e19da1a7d 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -571,8 +571,7 @@ __execlists_schedule_in(struct i915_request *rq) return engine; } -static inline struct i915_request * -execlists_schedule_in(struct i915_request *rq, int idx) +static inline void execlists_schedule_in(struct i915_request *rq, int idx) { struct intel_context * const ce = rq->context; struct intel_engine_cs *old; @@ -589,7 +588,6 @@ execlists_schedule_in(struct i915_request *rq, int idx) } while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old))); GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); - return i915_request_get(rq); } static void kick_siblings(struct i915_request *rq, struct intel_context *ce) @@ -1257,8 +1255,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; - struct i915_request * const *active; - struct i915_request *last; + struct i915_request *last = *execlists->active; struct rb_node *rb; bool submit = false; @@ -1284,6 +1281,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * and context switches) submission. */ + spin_lock(&engine->active.lock); + for (rb = rb_first_cached(&execlists->virtual); rb; ) { struct virtual_engine *ve = rb_entry(rb, typeof(*ve), nodes[engine->id].rb); @@ -1311,10 +1310,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * the active context to interject the preemption request, * i.e. we will retrigger preemption following the ack in case * of trouble. - */ - active = READ_ONCE(execlists->active); - - /* + * * In theory we can skip over completed contexts that have not * yet been processed by events (as those events are in flight): * @@ -1326,7 +1322,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * completed and barf. */ - if ((last = *active)) { + if (last) { if (i915_request_completed(last)) { goto check_secondary; } else if (need_preempt(engine, last, rb)) { @@ -1399,6 +1395,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * Even if ELSP[1] is occupied and not worthy * of timeslices, our queue might be. 
*/ + spin_unlock(&engine->active.lock); start_timeslice(engine, queue_prio(execlists)); return; } @@ -1434,6 +1431,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (last && !can_merge_rq(last, rq)) { spin_unlock(&ve->base.active.lock); + spin_unlock(&engine->active.lock); start_timeslice(engine, rq_prio(rq)); return; /* leave this for another sibling */ } @@ -1551,8 +1549,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (__i915_request_submit(rq)) { if (!merge) { - *port = execlists_schedule_in(last, port - execlists->pending); - port++; + *port++ = i915_request_get(last); last = NULL; } @@ -1571,8 +1568,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine) rb_erase_cached(&p->node, &execlists->queue); i915_priolist_free(p); } - done: + *port++ = i915_request_get(last); + /* * Here be a bit of magic! Or sleight-of-hand, whichever you prefer. * @@ -1590,36 +1588,45 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * interrupt for secondary ports). */ execlists->queue_priority_hint = queue_prio(execlists); + spin_unlock(&engine->active.lock); if (submit) { - *port = execlists_schedule_in(last, port - execlists->pending); - execlists->switch_priority_hint = - switch_prio(engine, *execlists->pending); - /* * Skip if we ended up with exactly the same set of requests, * e.g. trying to timeslice a pair of ordered contexts */ - if (!memcmp(active, execlists->pending, - (port - execlists->pending + 1) * sizeof(*port))) { - do - execlists_schedule_out(fetch_and_zero(port)); - while (port-- != execlists->pending); - + if (!memcmp(execlists->active, + execlists->pending, + (port - execlists->pending) * sizeof(*port))) goto skip_submit; - } - clear_ports(port + 1, last_port - port); + + *port = NULL; + while (port-- != execlists->pending) + execlists_schedule_in(*port, port - execlists->pending); + + execlists->switch_priority_hint = + switch_prio(engine, *execlists->pending); WRITE_ONCE(execlists->yield, -1); - set_preempt_timeout(engine, *active); + set_preempt_timeout(engine, *execlists->active); execlists_submit_ports(engine); } else { start_timeslice(engine, execlists->queue_priority_hint); skip_submit: ring_set_paused(engine, 0); + while (port-- != execlists->pending) + i915_request_put(*port); + *execlists->pending = NULL; } } +static void execlists_dequeue_irq(struct intel_engine_cs *engine) +{ + local_irq_disable(); /* Suspend interrupts across request submission */ + execlists_dequeue(engine); + local_irq_enable(); /* flush irq_work (e.g. 
breadcrumb enabling) */ +} + static void cancel_port_requests(struct intel_engine_execlists * const execlists) { @@ -1962,16 +1969,6 @@ static void process_csb(struct intel_engine_cs *engine) invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); } -static void __execlists_submission_tasklet(struct intel_engine_cs *const engine) -{ - lockdep_assert_held(&engine->active.lock); - if (!READ_ONCE(engine->execlists.pending[0])) { - rcu_read_lock(); /* protect peeking at execlists->active */ - execlists_dequeue(engine); - rcu_read_unlock(); - } -} - static void __execlists_hold(struct i915_request *rq) { LIST_HEAD(list); @@ -2363,7 +2360,7 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) if (!timer_expired(t)) return false; - return READ_ONCE(engine->execlists.pending[0]); + return engine->execlists.pending[0]; } /* @@ -2373,10 +2370,14 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) static void execlists_submission_tasklet(unsigned long data) { struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; - bool timeout = preempt_timeout(engine); process_csb(engine); + if (unlikely(preempt_timeout(engine))) { + cancel_timer(&engine->execlists.preempt); + engine->execlists.error_interrupt |= ERROR_PREEMPT; + } + if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) { const char *msg; @@ -2385,6 +2386,8 @@ static void execlists_submission_tasklet(unsigned long data) msg = "CS error"; /* thrown by a user payload */ else if (engine->execlists.error_interrupt & ERROR_CSB) msg = "invalid CSB event"; + else if (engine->execlists.error_interrupt & ERROR_PREEMPT) + msg = "preemption time out"; else msg = "internal error"; @@ -2392,19 +2395,8 @@ static void execlists_submission_tasklet(unsigned long data) execlists_reset(engine, msg); } - if (!READ_ONCE(engine->execlists.pending[0]) || timeout) { - unsigned long flags; - - spin_lock_irqsave(&engine->active.lock, flags); - __execlists_submission_tasklet(engine); - spin_unlock_irqrestore(&engine->active.lock, flags); - - /* Recheck after serialising with direct-submission */ - if (unlikely(timeout && preempt_timeout(engine))) { - cancel_timer(&engine->execlists.preempt); - execlists_reset(engine, "preemption time out"); - } - } + if (!engine->execlists.pending[0]) + execlists_dequeue_irq(engine); } static void __execlists_kick(struct intel_engine_execlists *execlists) @@ -2435,26 +2427,16 @@ static void queue_request(struct intel_engine_cs *engine, set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } -static void __submit_queue_imm(struct intel_engine_cs *engine) -{ - struct intel_engine_execlists * const execlists = &engine->execlists; - - if (reset_in_progress(execlists)) - return; /* defer until we restart the engine following reset */ - - __execlists_submission_tasklet(engine); -} - -static void submit_queue(struct intel_engine_cs *engine, +static bool submit_queue(struct intel_engine_cs *engine, const struct i915_request *rq) { struct intel_engine_execlists *execlists = &engine->execlists; if (rq_prio(rq) <= execlists->queue_priority_hint) - return; + return false; execlists->queue_priority_hint = rq_prio(rq); - __submit_queue_imm(engine); + return true; } static bool ancestor_on_hold(const struct intel_engine_cs *engine, @@ -2464,25 +2446,11 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine, return !list_empty(&engine->active.hold) && hold_request(rq); } -static void flush_csb(struct intel_engine_cs *engine) -{ - struct intel_engine_execlists *el = 
&engine->execlists; - - if (READ_ONCE(el->pending[0]) && tasklet_trylock(&el->tasklet)) { - if (!reset_in_progress(el)) - process_csb(engine); - tasklet_unlock(&el->tasklet); - } -} - static void execlists_submit_request(struct i915_request *request) { struct intel_engine_cs *engine = request->engine; unsigned long flags; - /* Hopefully we clear execlists->pending[] to let us through */ - flush_csb(engine); - /* Will be called from irq-context when using foreign fences. */ spin_lock_irqsave(&engine->active.lock, flags); @@ -2496,7 +2464,8 @@ static void execlists_submit_request(struct i915_request *request) GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); GEM_BUG_ON(list_empty(&request->sched.link)); - submit_queue(engine, request); + if (submit_queue(engine, request)) + __execlists_kick(&engine->execlists); } spin_unlock_irqrestore(&engine->active.lock, flags); @@ -2837,7 +2806,6 @@ static int execlists_resume(struct intel_engine_cs *engine) static void execlists_reset_prepare(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; - unsigned long flags; ENGINE_TRACE(engine, "depth<-%d\n", atomic_read(&execlists->tasklet.count)); @@ -2854,10 +2822,6 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) __tasklet_disable_sync_once(&execlists->tasklet); GEM_BUG_ON(!reset_in_progress(execlists)); - /* And flush any current direct submission. */ - spin_lock_irqsave(&engine->active.lock, flags); - spin_unlock_irqrestore(&engine->active.lock, flags); - /* * We stop engines, otherwise we might get failed reset and a * dead gpu (on elk). Also as modern gpu as kbl can suffer @@ -3082,12 +3046,12 @@ static void execlists_reset_finish(struct intel_engine_cs *engine) * to sleep before we restart and reload a context. */ GEM_BUG_ON(!reset_in_progress(execlists)); - if (!RB_EMPTY_ROOT(&execlists->queue.rb_root)) - execlists->tasklet.func(execlists->tasklet.data); + GEM_BUG_ON(engine->execlists.pending[0]); + /* And kick in case we missed a new request submission. */ if (__tasklet_enable(&execlists->tasklet)) - /* And kick in case we missed a new request submission. 
*/ - tasklet_hi_schedule(&execlists->tasklet); + __execlists_kick(execlists); + ENGINE_TRACE(engine, "depth->%d\n", atomic_read(&execlists->tasklet.count)); } diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 000d63588e9e..b85b6f3dcd60 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -40,20 +40,19 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr) intel_uncore_rmw_fw(uncore, reg, clr, 0); } -static void engine_skip_context(struct i915_request *rq) +static void skip_context(struct i915_request *rq) { - struct intel_engine_cs *engine = rq->engine; struct intel_context *hung_ctx = rq->context; - if (!i915_request_is_active(rq)) - return; + list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) { + if (!i915_request_is_active(rq)) + return; - lockdep_assert_held(&engine->active.lock); - list_for_each_entry_continue(rq, &engine->active.requests, sched.link) if (rq->context == hung_ctx) { i915_request_set_error_once(rq, -EIO); __i915_request_skip(rq); } + } } static void client_mark_guilty(struct i915_gem_context *ctx, bool banned) @@ -160,7 +159,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty) i915_request_set_error_once(rq, -EIO); __i915_request_skip(rq); if (mark_guilty(rq)) - engine_skip_context(rq); + skip_context(rq); } else { i915_request_set_error_once(rq, -EAGAIN); mark_innocent(rq); @@ -753,8 +752,10 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask) if (err) return err; + local_bh_disable(); for_each_engine(engine, gt, id) __intel_engine_reset(engine, stalled_mask & engine->mask); + local_bh_enable(); intel_ggtt_restore_fences(gt->ggtt); @@ -832,9 +833,11 @@ static void __intel_gt_set_wedged(struct intel_gt *gt) set_bit(I915_WEDGED, >->reset.flags); /* Mark all executing requests as skipped */ + local_bh_disable(); for_each_engine(engine, gt, id) if (engine->reset.cancel) engine->reset.cancel(engine); + local_bh_enable(); reset_finish(gt, awake); @@ -1109,20 +1112,7 @@ static inline int intel_gt_reset_engine(struct intel_engine_cs *engine) return __intel_gt_reset(engine->gt, engine->mask); } -/** - * intel_engine_reset - reset GPU engine to recover from a hang - * @engine: engine to reset - * @msg: reason for GPU reset; or NULL for no drm_notice() - * - * Reset a specific GPU engine. Useful if a hang is detected. - * Returns zero on successful reset or otherwise an error code. - * - * Procedure is: - * - identifies the request that caused the hang and it is dropped - * - reset engine (which will force the engine to idle) - * - re-init/configure engine - */ -int intel_engine_reset(struct intel_engine_cs *engine, const char *msg) +int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) { struct intel_gt *gt = engine->gt; bool uses_guc = intel_engine_in_guc_submission_mode(engine); @@ -1172,6 +1162,30 @@ int intel_engine_reset(struct intel_engine_cs *engine, const char *msg) return ret; } +/** + * intel_engine_reset - reset GPU engine to recover from a hang + * @engine: engine to reset + * @msg: reason for GPU reset; or NULL for no drm_notice() + * + * Reset a specific GPU engine. Useful if a hang is detected. + * Returns zero on successful reset or otherwise an error code. 
+ * + * Procedure is: + * - identifies the request that caused the hang and it is dropped + * - reset engine (which will force the engine to idle) + * - re-init/configure engine + */ +int intel_engine_reset(struct intel_engine_cs *engine, const char *msg) +{ + int err; + + local_bh_disable(); + err = __intel_engine_reset_bh(engine, msg); + local_bh_enable(); + + return err; +} + static void intel_gt_reset_global(struct intel_gt *gt, u32 engine_mask, const char *reason) @@ -1258,18 +1272,20 @@ void intel_gt_handle_error(struct intel_gt *gt, * single reset fails. */ if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) { + local_bh_disable(); for_each_engine_masked(engine, gt, engine_mask, tmp) { BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, >->reset.flags)) continue; - if (intel_engine_reset(engine, msg) == 0) + if (__intel_engine_reset_bh(engine, msg) == 0) engine_mask &= ~engine->mask; clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, >->reset.flags); } + local_bh_enable(); } if (!engine_mask) diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h index a0eec7c11c0c..7dbf5cc8a333 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.h +++ b/drivers/gpu/drm/i915/gt/intel_reset.h @@ -34,6 +34,8 @@ void intel_gt_reset(struct intel_gt *gt, const char *reason); int intel_engine_reset(struct intel_engine_cs *engine, const char *reason); +int __intel_engine_reset_bh(struct intel_engine_cs *engine, + const char *reason); void __i915_request_reset(struct i915_request *rq, bool guilty); diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 1f4020e906a8..db738d400168 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -25,7 +25,7 @@ static int request_sync(struct i915_request *rq) /* Opencode i915_request_add() so we can keep the timeline locked. 
*/ __i915_request_commit(rq); rq->sched.attr.priority = I915_PRIORITY_BARRIER; - __i915_request_queue(rq, NULL); + __i915_request_queue_bh(rq); timeout = i915_request_wait(rq, 0, HZ / 10); if (timeout < 0) diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index fa51cf6d840a..47b12ce4b132 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -599,8 +599,10 @@ static int live_hold_reset(void *arg) /* We have our request executing, now remove it and reset */ + local_bh_disable(); if (test_and_set_bit(I915_RESET_ENGINE + id, >->reset.flags)) { + local_bh_enable(); intel_gt_set_wedged(gt); err = -EBUSY; goto out; @@ -614,12 +616,13 @@ static int live_hold_reset(void *arg) execlists_hold(engine, rq); GEM_BUG_ON(!i915_request_on_hold(rq)); - intel_engine_reset(engine, NULL); + __intel_engine_reset_bh(engine, NULL); GEM_BUG_ON(rq->fence.error != -EIO); tasklet_enable(&engine->execlists.tasklet); clear_and_wake_up_bit(I915_RESET_ENGINE + id, >->reset.flags); + local_bh_enable(); /* Check that we do not resubmit the held request */ if (!i915_request_wait(rq, 0, HZ / 5)) { @@ -4546,8 +4549,10 @@ static int reset_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(engine == ve->engine); /* Take ownership of the reset and tasklet */ + local_bh_disable(); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, >->reset.flags)) { + local_bh_enable(); intel_gt_set_wedged(gt); err = -EBUSY; goto out_heartbeat; @@ -4567,12 +4572,13 @@ static int reset_virtual_engine(struct intel_gt *gt, execlists_hold(engine, rq); GEM_BUG_ON(!i915_request_on_hold(rq)); - intel_engine_reset(engine, NULL); + __intel_engine_reset_bh(engine, NULL); GEM_BUG_ON(rq->fence.error != -EIO); /* Release our grasp on the engine, letting CS flow again */ tasklet_enable(&engine->execlists.tasklet); clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, >->reset.flags); + local_bh_enable(); /* Check that we do not resubmit the held request */ i915_request_get(rq); diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index fb5ebf930ab2..c28d1fcad673 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1576,12 +1576,17 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine, engine->name, mode, p->name); tasklet_disable(t); + if (strcmp(p->name, "softirq")) + local_bh_disable(); p->critical_section_begin(); - err = intel_engine_reset(engine, NULL); + err = __intel_engine_reset_bh(engine, NULL); p->critical_section_end(); + if (strcmp(p->name, "softirq")) + local_bh_enable(); tasklet_enable(t); + tasklet_hi_schedule(t); if (err) pr_err("i915_reset_engine(%s:%s) failed under %s\n", diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index d55421f6a250..ba6c2be5c8ff 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1607,16 +1607,17 @@ static void garbage_reset(struct intel_engine_cs *engine, const unsigned int bit = I915_RESET_ENGINE + engine->id; unsigned long *lock = &engine->gt->reset.flags; - if (test_and_set_bit(bit, lock)) - return; + local_bh_disable(); + if (!test_and_set_bit(bit, lock)) { + tasklet_disable(&engine->execlists.tasklet); - tasklet_disable(&engine->execlists.tasklet); + if (!rq->fence.error) + __intel_engine_reset_bh(engine, NULL); - if (!rq->fence.error) - intel_engine_reset(engine, NULL); - - 
tasklet_enable(&engine->execlists.tasklet); - clear_and_wake_up_bit(bit, lock); + tasklet_enable(&engine->execlists.tasklet); + clear_and_wake_up_bit(bit, lock); + } + local_bh_enable(); } static struct i915_request *garbage(struct intel_context *ce, diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index e4645c8bb00a..5ec8d4e9983f 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -327,11 +327,16 @@ static int igt_atomic_engine_reset(void *arg) for (p = igt_atomic_phases; p->name; p++) { GEM_TRACE("intel_engine_reset(%s) under %s\n", engine->name, p->name); + if (strcmp(p->name, "softirq")) + local_bh_disable(); p->critical_section_begin(); - err = intel_engine_reset(engine, NULL); + err = __intel_engine_reset_bh(engine, NULL); p->critical_section_end(); + if (strcmp(p->name, "softirq")) + local_bh_enable(); + if (err) { pr_err("intel_engine_reset(%s) failed under %s\n", engine->name, p->name); @@ -341,6 +346,7 @@ static int igt_atomic_engine_reset(void *arg) intel_engine_pm_put(engine); tasklet_enable(&engine->execlists.tasklet); + tasklet_hi_schedule(&engine->execlists.tasklet); if (err) break; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 45744c3ef7c4..6578faf6eed8 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1584,6 +1584,12 @@ struct i915_request *__i915_request_commit(struct i915_request *rq) return __i915_request_add_to_timeline(rq); } +void __i915_request_queue_bh(struct i915_request *rq) +{ + i915_sw_fence_commit(&rq->semaphore); + i915_sw_fence_commit(&rq->submit); +} + void __i915_request_queue(struct i915_request *rq, const struct i915_sched_attr *attr) { @@ -1600,8 +1606,10 @@ void __i915_request_queue(struct i915_request *rq, */ if (attr && rq->engine->schedule) rq->engine->schedule(rq, attr); - i915_sw_fence_commit(&rq->semaphore); - i915_sw_fence_commit(&rq->submit); + + local_bh_disable(); + __i915_request_queue_bh(rq); + local_bh_enable(); /* kick tasklets */ } void i915_request_add(struct i915_request *rq) diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 7c4453e60323..a8c413203f72 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -315,6 +315,7 @@ void __i915_request_skip(struct i915_request *rq); struct i915_request *__i915_request_commit(struct i915_request *request); void __i915_request_queue(struct i915_request *rq, const struct i915_sched_attr *attr); +void __i915_request_queue_bh(struct i915_request *rq); bool i915_request_retire(struct i915_request *rq); void i915_request_retire_upto(struct i915_request *rq); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index b9cf9931ebd7..318e359bf5c3 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -458,14 +458,10 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node, if (!dep) return -ENOMEM; - local_bh_disable(); - if (!__i915_sched_node_add_dependency(node, signal, dep, flags | I915_DEPENDENCY_ALLOC)) i915_dependency_free(dep); - local_bh_enable(); /* kick submission tasklet */ - return 0; } diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index ddf76069066e..d2a678a2497e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c 
@@ -1933,9 +1933,7 @@ static int measure_inter_request(struct intel_context *ce) intel_ring_advance(rq, cs); i915_request_add(rq); } - local_bh_disable(); i915_sw_fence_commit(submit); - local_bh_enable(); intel_engine_flush_submission(ce->engine); heap_fence_put(submit); @@ -2221,11 +2219,9 @@ static int measure_completion(struct intel_context *ce) intel_ring_advance(rq, cs); dma_fence_add_callback(&rq->fence, &cb.base, signal_cb); - - local_bh_disable(); i915_request_add(rq); - local_bh_enable(); + intel_engine_flush_submission(ce->engine); if (wait_for(READ_ONCE(sema[i]) == -1, 50)) { err = -EIO; goto err; diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index 1216d919185e..83f6e5f31fb3 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -220,6 +220,9 @@ void igt_spinner_fini(struct igt_spinner *spin) bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq) { + if (i915_request_is_ready(rq)) + intel_engine_flush_submission(rq->engine); + return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq), rq->fence.seqno), 100) && From patchwork Thu Dec 24 13:55:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989631 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19FCFC433E9 for ; Thu, 24 Dec 2020 13:56:01 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C3D9022287 for ; Thu, 24 Dec 2020 13:56:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C3D9022287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7098A89CCE; Thu, 24 Dec 2020 13:55:59 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id E23D089C59 for ; Thu, 24 Dec 2020 13:55:51 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423424-1500050 for ; Thu, 24 Dec 2020 13:55:43 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:37 +0000 Message-Id: <20201224135544.1713-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 2/9] drm/i915/gt: Use virtual_engine during execlists_dequeue X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel 
graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Rather than going back and forth between the rb_node entry and the virtual_engine type, store the ve local and reuse it. As the container_of conversion from rb_node to virtual_engine requires a variable offset, performing that conversion just once shaves off a bit of code. v2: Keep a single virtual engine lookup, for typical use. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Matthew Auld --- .../drm/i915/gt/intel_execlists_submission.c | 239 ++++++++---------- 1 file changed, 105 insertions(+), 134 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index ac3e19da1a7d..ddfd6b271b3b 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -293,9 +293,15 @@ static int queue_prio(const struct intel_engine_execlists *execlists) return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used); } +static int virtual_prio(const struct intel_engine_execlists *el) +{ + struct rb_node *rb = rb_first_cached(&el->virtual); + + return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; +} + static inline bool need_preempt(const struct intel_engine_cs *engine, - const struct i915_request *rq, - struct rb_node *rb) + const struct i915_request *rq) { int last_prio; @@ -332,25 +338,6 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, rq_prio(list_next_entry(rq, sched.link)) > last_prio) return true; - if (rb) { - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - bool preempt = false; - - if (engine == ve->siblings[0]) { /* only preempt one sibling */ - struct i915_request *next; - - rcu_read_lock(); - next = READ_ONCE(ve->request); - if (next) - preempt = rq_prio(next) > last_prio; - rcu_read_unlock(); - } - - if (preempt) - return preempt; - } - /* * If the inflight context did not trigger the preemption, then maybe * it was the set of queued requests? Pick the highest priority in @@ -361,7 +348,8 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same * context, it's priority would not exceed ELSP[0] aka last_prio. 
*/ - return queue_prio(&engine->execlists) > last_prio; + return max(virtual_prio(&engine->execlists), + queue_prio(&engine->execlists)) > last_prio; } __maybe_unused static inline bool @@ -997,6 +985,35 @@ static bool virtual_matches(const struct virtual_engine *ve, return true; } +static struct virtual_engine * +first_virtual_engine(struct intel_engine_cs *engine) +{ + struct intel_engine_execlists *el = &engine->execlists; + struct rb_node *rb = rb_first_cached(&el->virtual); + + while (rb) { + struct virtual_engine *ve = + rb_entry(rb, typeof(*ve), nodes[engine->id].rb); + struct i915_request *rq = READ_ONCE(ve->request); + + /* lazily cleanup after another engine handled rq */ + if (!rq) { + rb_erase_cached(rb, &el->virtual); + RB_CLEAR_NODE(rb); + rb = rb_first_cached(&el->virtual); + continue; + } + + if (!virtual_matches(ve, rq, engine)) { + rb = rb_next(rb); + continue; + } + return ve; + } + + return NULL; +} + static void virtual_xfer_context(struct virtual_engine *ve, struct intel_engine_cs *engine) { @@ -1084,32 +1101,15 @@ static void defer_active(struct intel_engine_cs *engine) static bool need_timeslice(const struct intel_engine_cs *engine, - const struct i915_request *rq, - const struct rb_node *rb) + const struct i915_request *rq) { int hint; if (!intel_engine_has_timeslices(engine)) return false; - hint = engine->execlists.queue_priority_hint; - - if (rb) { - const struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - const struct intel_engine_cs *inflight = - intel_context_inflight(&ve->context); - - if (!inflight || inflight == engine) { - struct i915_request *next; - - rcu_read_lock(); - next = READ_ONCE(ve->request); - if (next) - hint = max(hint, rq_prio(next)); - rcu_read_unlock(); - } - } + hint = max(engine->execlists.queue_priority_hint, + virtual_prio(&engine->execlists)); if (!list_is_last(&rq->sched.link, &engine->active.requests)) hint = max(hint, rq_prio(list_next_entry(rq, sched.link))); @@ -1256,6 +1256,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; struct i915_request *last = *execlists->active; + struct virtual_engine *ve; struct rb_node *rb; bool submit = false; @@ -1283,26 +1284,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) spin_lock(&engine->active.lock); - for (rb = rb_first_cached(&execlists->virtual); rb; ) { - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - struct i915_request *rq = READ_ONCE(ve->request); - - if (!rq) { /* lazily cleanup after another engine handled rq */ - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); - rb = rb_first_cached(&execlists->virtual); - continue; - } - - if (!virtual_matches(ve, rq, engine)) { - rb = rb_next(rb); - continue; - } - - break; - } - /* * If the queue is higher priority than the last * request in the currently active context, submit afresh. 
@@ -1325,7 +1306,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (last) { if (i915_request_completed(last)) { goto check_secondary; - } else if (need_preempt(engine, last, rb)) { + } else if (need_preempt(engine, last)) { ENGINE_TRACE(engine, "preempting last=%llx:%lld, prio=%d, hint=%d\n", last->fence.context, @@ -1351,7 +1332,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) __unwind_incomplete_requests(engine); last = NULL; - } else if (need_timeslice(engine, last, rb) && + } else if (need_timeslice(engine, last) && timeslice_expired(execlists, last)) { ENGINE_TRACE(engine, "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n", @@ -1402,96 +1383,86 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } } - while (rb) { /* XXX virtual is always taking precedence */ - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); + /* XXX virtual is always taking precedence */ + while ((ve = first_virtual_engine(engine))) { struct i915_request *rq; spin_lock(&ve->base.active.lock); rq = ve->request; - if (unlikely(!rq)) { /* lost the race to a sibling */ - spin_unlock(&ve->base.active.lock); - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); - rb = rb_first_cached(&execlists->virtual); - continue; - } + if (unlikely(!rq)) /* lost the race to a sibling */ + goto unlock; - GEM_BUG_ON(rq != ve->request); GEM_BUG_ON(rq->engine != &ve->base); GEM_BUG_ON(rq->context != &ve->context); - if (rq_prio(rq) >= queue_prio(execlists)) { - if (!virtual_matches(ve, rq, engine)) { - spin_unlock(&ve->base.active.lock); - rb = rb_next(rb); - continue; - } + if (unlikely(rq_prio(rq) < queue_prio(execlists))) { + spin_unlock(&ve->base.active.lock); + break; + } - if (last && !can_merge_rq(last, rq)) { - spin_unlock(&ve->base.active.lock); - spin_unlock(&engine->active.lock); - start_timeslice(engine, rq_prio(rq)); - return; /* leave this for another sibling */ - } + GEM_BUG_ON(!virtual_matches(ve, rq, engine)); - ENGINE_TRACE(engine, - "virtual rq=%llx:%lld%s, new engine? %s\n", - rq->fence.context, - rq->fence.seqno, - i915_request_completed(rq) ? "!" : - i915_request_started(rq) ? "*" : - "", - yesno(engine != ve->siblings[0])); - - WRITE_ONCE(ve->request, NULL); - WRITE_ONCE(ve->base.execlists.queue_priority_hint, - INT_MIN); - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); + if (last && !can_merge_rq(last, rq)) { + spin_unlock(&ve->base.active.lock); + spin_unlock(&engine->active.lock); + start_timeslice(engine, rq_prio(rq)); + return; /* leave this for another sibling */ + } - GEM_BUG_ON(!(rq->execution_mask & engine->mask)); - WRITE_ONCE(rq->engine, engine); + ENGINE_TRACE(engine, + "virtual rq=%llx:%lld%s, new engine? %s\n", + rq->fence.context, + rq->fence.seqno, + i915_request_completed(rq) ? "!" : + i915_request_started(rq) ? "*" : + "", + yesno(engine != ve->siblings[0])); - if (__i915_request_submit(rq)) { - /* - * Only after we confirm that we will submit - * this request (i.e. it has not already - * completed), do we want to update the context. - * - * This serves two purposes. It avoids - * unnecessary work if we are resubmitting an - * already completed request after timeslicing. - * But more importantly, it prevents us altering - * ve->siblings[] on an idle context, where - * we may be using ve->siblings[] in - * virtual_context_enter / virtual_context_exit. 
- */ - virtual_xfer_context(ve, engine); - GEM_BUG_ON(ve->siblings[0] != engine); + WRITE_ONCE(ve->request, NULL); + WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN); - submit = true; - last = rq; - } - i915_request_put(rq); + rb = &ve->nodes[engine->id].rb; + rb_erase_cached(rb, &execlists->virtual); + RB_CLEAR_NODE(rb); + + GEM_BUG_ON(!(rq->execution_mask & engine->mask)); + WRITE_ONCE(rq->engine, engine); + if (__i915_request_submit(rq)) { /* - * Hmm, we have a bunch of virtual engine requests, - * but the first one was already completed (thanks - * preempt-to-busy!). Keep looking at the veng queue - * until we have no more relevant requests (i.e. - * the normal submit queue has higher priority). + * Only after we confirm that we will submit + * this request (i.e. it has not already + * completed), do we want to update the context. + * + * This serves two purposes. It avoids + * unnecessary work if we are resubmitting an + * already completed request after timeslicing. + * But more importantly, it prevents us altering + * ve->siblings[] on an idle context, where + * we may be using ve->siblings[] in + * virtual_context_enter / virtual_context_exit. */ - if (!submit) { - spin_unlock(&ve->base.active.lock); - rb = rb_first_cached(&execlists->virtual); - continue; - } + virtual_xfer_context(ve, engine); + GEM_BUG_ON(ve->siblings[0] != engine); + + submit = true; + last = rq; } + i915_request_put(rq); +unlock: spin_unlock(&ve->base.active.lock); - break; + + /* + * Hmm, we have a bunch of virtual engine requests, + * but the first one was already completed (thanks + * preempt-to-busy!). Keep looking at the veng queue + * until we have no more relevant requests (i.e. + * the normal submit queue has higher priority). + */ + if (submit) + break; } while ((rb = rb_first_cached(&execlists->queue))) { From patchwork Thu Dec 24 13:55:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2913C433DB for ; Thu, 24 Dec 2020 13:55:52 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5C83022287 for ; Thu, 24 Dec 2020 13:55:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5C83022287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 538E389C6C; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6064189C59 for ; Thu, 24 Dec 2020 13:55:47 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com 
(unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423425-1500050 for ; Thu, 24 Dec 2020 13:55:43 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:38 +0000 Message-Id: <20201224135544.1713-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 3/9] drm/i915/gt: Decouple inflight virtual engines X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Once a virtual engine has been bound to a sibling, it will remain bound until we finally schedule out the last active request. We can not rebind the context to a new sibling while it is inflight as the context save will conflict, hence we wait. As we cannot then use any other sibliing while the context is inflight, only kick the bound sibling while it inflight and upon scheduling out the kick the rest (so that we can swap engines on timeslicing if the previously bound engine becomes oversubscribed). Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- .../drm/i915/gt/intel_execlists_submission.c | 28 ++++++++----------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index ddfd6b271b3b..fbd0572ed834 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -581,9 +581,8 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); - struct i915_request *next = READ_ONCE(ve->request); - if (next == rq || (next && next->execution_mask & ~rq->execution_mask)) + if (READ_ONCE(ve->request)) tasklet_hi_schedule(&ve->base.execlists.tasklet); } @@ -997,17 +996,13 @@ first_virtual_engine(struct intel_engine_cs *engine) struct i915_request *rq = READ_ONCE(ve->request); /* lazily cleanup after another engine handled rq */ - if (!rq) { + if (!rq || !virtual_matches(ve, rq, engine)) { rb_erase_cached(rb, &el->virtual); RB_CLEAR_NODE(rb); rb = rb_first_cached(&el->virtual); continue; } - if (!virtual_matches(ve, rq, engine)) { - rb = rb_next(rb); - continue; - } return ve; } @@ -3435,7 +3430,6 @@ static void virtual_submission_tasklet(unsigned long data) if (unlikely(!mask)) return; - local_irq_disable(); for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); struct ve_node * const node = &ve->nodes[sibling->id]; @@ -3445,20 +3439,19 @@ static void virtual_submission_tasklet(unsigned long data) if (!READ_ONCE(ve->request)) break; /* already handled by a sibling's tasklet */ + spin_lock_irq(&sibling->active.lock); + if (unlikely(!(mask & sibling->mask))) { if (!RB_EMPTY_NODE(&node->rb)) { - spin_lock(&sibling->active.lock); rb_erase_cached(&node->rb, &sibling->execlists.virtual); RB_CLEAR_NODE(&node->rb); - spin_unlock(&sibling->active.lock); } - continue; - } - spin_lock(&sibling->active.lock); + goto unlock_engine; + } - if (!RB_EMPTY_NODE(&node->rb)) { + if 
(unlikely(!RB_EMPTY_NODE(&node->rb))) { /* * Cheat and avoid rebalancing the tree if we can * reuse this node in situ. @@ -3498,9 +3491,12 @@ static void virtual_submission_tasklet(unsigned long data) if (first && prio > sibling->execlists.queue_priority_hint) tasklet_hi_schedule(&sibling->execlists.tasklet); - spin_unlock(&sibling->active.lock); +unlock_engine: + spin_unlock_irq(&sibling->active.lock); + + if (intel_context_inflight(&ve->context)) + break; } - local_irq_enable(); } static void virtual_submit_request(struct i915_request *rq) From patchwork Thu Dec 24 13:55:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51CE9C433DB for ; Thu, 24 Dec 2020 13:55:59 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0A41222287 for ; Thu, 24 Dec 2020 13:55:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0A41222287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B20A889C63; Thu, 24 Dec 2020 13:55:58 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2764889C93 for ; Thu, 24 Dec 2020 13:55:51 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423426-1500050 for ; Thu, 24 Dec 2020 13:55:44 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:39 +0000 Message-Id: <20201224135544.1713-4-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 4/9] drm/i915/gt: Defer schedule_out until after the next dequeue X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Inside schedule_out, we do extra work upon idling the context, such as updating the runtime, kicking off retires, kicking virtual engines. However, if we are in a series of processing single requests per contexts, we may find ourselves scheduling out the context, only to immediately schedule it back in during dequeue. This is just extra work that we can avoid if we keep the context marked as inflight across the dequeue. 
This becomes more significant later on for minimising virtual engine misses. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gt/intel_context_types.h | 8 +- .../drm/i915/gt/intel_execlists_submission.c | 174 +++++++++++------- 2 files changed, 115 insertions(+), 67 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 52fa9c132746..f7a0fb6f3a2e 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -58,8 +58,12 @@ struct intel_context { struct intel_engine_cs *engine; struct intel_engine_cs *inflight; -#define intel_context_inflight(ce) ptr_mask_bits(READ_ONCE((ce)->inflight), 2) -#define intel_context_inflight_count(ce) ptr_unmask_bits(READ_ONCE((ce)->inflight), 2) +#define __intel_context_inflight(engine) ptr_mask_bits(engine, 3) +#define __intel_context_inflight_count(engine) ptr_unmask_bits(engine, 3) +#define intel_context_inflight(ce) \ + __intel_context_inflight(READ_ONCE((ce)->inflight)) +#define intel_context_inflight_count(ce) \ + __intel_context_inflight_count(READ_ONCE((ce)->inflight)) struct i915_address_space *vm; struct i915_gem_context __rcu *gem_context; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index fbd0572ed834..b5f256be73dd 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -205,7 +205,7 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) static void mark_eio(struct i915_request *rq) { - if (i915_request_completed(rq)) + if (__i915_request_is_complete(rq)) return; GEM_BUG_ON(i915_request_signaled(rq)); @@ -221,7 +221,7 @@ active_request(const struct intel_timeline * const tl, struct i915_request *rq) rcu_read_lock(); list_for_each_entry_continue_reverse(rq, &tl->requests, link) { - if (i915_request_completed(rq)) + if (__i915_request_is_complete(rq)) break; active = rq; @@ -381,7 +381,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) list_for_each_entry_safe_reverse(rq, rn, &engine->active.requests, sched.link) { - if (i915_request_completed(rq)) { + if (__i915_request_is_complete(rq)) { list_del_init(&rq->sched.link); continue; } @@ -506,7 +506,7 @@ static void reset_active(struct i915_request *rq, rq->fence.context, rq->fence.seqno); /* On resubmission of the active request, payload will be scrubbed */ - if (i915_request_completed(rq)) + if (__i915_request_is_complete(rq)) head = rq->tail; else head = active_request(ce->timeline, rq)->head; @@ -607,7 +607,7 @@ __execlists_schedule_out(struct i915_request *rq, * idle and we want to re-enter powersaving. */ if (list_is_last_rcu(&rq->link, &ce->timeline->requests) && - i915_request_completed(rq)) + __i915_request_is_complete(rq)) intel_engine_add_retire(engine, ce->timeline); ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; @@ -728,8 +728,8 @@ dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq) prefix, rq->context->lrc.ccid, rq->fence.context, rq->fence.seqno, - i915_request_completed(rq) ? "!" : - i915_request_started(rq) ? "*" : + __i915_request_is_complete(rq) ? "!" : + __i915_request_has_started(rq) ? 
"*" : "", rq_prio(rq)); @@ -831,7 +831,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists, if (!spin_trylock_irqsave(&rq->lock, flags)) continue; - if (i915_request_completed(rq)) + if (__i915_request_is_complete(rq)) goto unlock; if (i915_active_is_idle(&ce->active) && @@ -944,7 +944,7 @@ static bool can_merge_rq(const struct i915_request *prev, * contexts, despite the best efforts of preempt-to-busy to confuse * us. */ - if (i915_request_completed(next)) + if (__i915_request_is_complete(next)) return true; if (unlikely((i915_request_flags(prev) ^ i915_request_flags(next)) & @@ -1065,8 +1065,8 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl) /* No waiter should start before its signaler */ GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) && - i915_request_started(w) && - !i915_request_completed(rq)); + __i915_request_has_started(w) && + !__i915_request_is_complete(rq)); GEM_BUG_ON(i915_request_is_active(w)); if (!i915_request_is_ready(w)) @@ -1159,7 +1159,7 @@ static unsigned long active_timeslice(const struct intel_engine_cs *engine) const struct intel_engine_execlists *execlists = &engine->execlists; const struct i915_request *rq = *execlists->active; - if (!rq || i915_request_completed(rq)) + if (!rq || __i915_request_is_complete(rq)) return 0; if (READ_ONCE(execlists->switch_priority_hint) < effective_prio(rq)) @@ -1232,19 +1232,6 @@ static void set_preempt_timeout(struct intel_engine_cs *engine, active_preempt_timeout(engine, rq)); } -static inline void clear_ports(struct i915_request **ports, int count) -{ - memset_p((void **)ports, NULL, count); -} - -static inline void -copy_ports(struct i915_request **dst, struct i915_request **src, int count) -{ - /* A memcpy_p() would be very useful here! */ - while (count--) - WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ -} - static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -1299,7 +1286,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) */ if (last) { - if (i915_request_completed(last)) { + if (__i915_request_is_complete(last)) { goto check_secondary; } else if (need_preempt(engine, last)) { ENGINE_TRACE(engine, @@ -1409,8 +1396,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) "virtual rq=%llx:%lld%s, new engine? %s\n", rq->fence.context, rq->fence.seqno, - i915_request_completed(rq) ? "!" : - i915_request_started(rq) ? "*" : + __i915_request_is_complete(rq) ? "!" : + __i915_request_has_started(rq) ? "*" : "", yesno(engine != ve->siblings[0])); @@ -1593,18 +1580,32 @@ static void execlists_dequeue_irq(struct intel_engine_cs *engine) local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */ } -static void -cancel_port_requests(struct intel_engine_execlists * const execlists) +static inline void clear_ports(struct i915_request **ports, int count) +{ + memset_p((void **)ports, NULL, count); +} + +static inline void +copy_ports(struct i915_request **dst, struct i915_request **src, int count) +{ + /* A memcpy_p() would be very useful here! 
*/ + while (count--) + WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ +} + +static struct i915_request ** +cancel_port_requests(struct intel_engine_execlists * const execlists, + struct i915_request **inactive) { struct i915_request * const *port; for (port = execlists->pending; *port; port++) - execlists_schedule_out(*port); + *inactive++ = *port; clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending)); /* Mark the end of active before we overwrite *active */ for (port = xchg(&execlists->active, execlists->pending); *port; port++) - execlists_schedule_out(*port); + *inactive++ = *port; clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight)); smp_wmb(); /* complete the seqlock for execlists_active() */ @@ -1614,6 +1615,8 @@ cancel_port_requests(struct intel_engine_execlists * const execlists) GEM_BUG_ON(execlists->pending[0]); cancel_timer(&execlists->timer); cancel_timer(&execlists->preempt); + + return inactive; } static inline void @@ -1741,7 +1744,8 @@ csb_read(const struct intel_engine_cs *engine, u64 * const csb) return entry; } -static void process_csb(struct intel_engine_cs *engine) +static struct i915_request ** +process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) { struct intel_engine_execlists * const execlists = &engine->execlists; u64 * const buf = execlists->csb_status; @@ -1770,7 +1774,7 @@ static void process_csb(struct intel_engine_cs *engine) head = execlists->csb_head; tail = READ_ONCE(*execlists->csb_write); if (unlikely(head == tail)) - return; + return inactive; /* * We will consume all events from HW, or at least pretend to. @@ -1850,7 +1854,7 @@ static void process_csb(struct intel_engine_cs *engine) /* cancel old inflight, prepare for switch */ trace_ports(execlists, "preempted", old); while (*old) - execlists_schedule_out(*old++); + *inactive++ = *old++; /* switch pending to inflight */ GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); @@ -1884,7 +1888,7 @@ static void process_csb(struct intel_engine_cs *engine) * itself... */ if (GEM_SHOW_DEBUG() && - !i915_request_completed(*execlists->active)) { + !__i915_request_is_complete(*execlists->active)) { struct i915_request *rq = *execlists->active; const u32 *regs __maybe_unused = rq->context->lrc_reg_state; @@ -1912,7 +1916,7 @@ static void process_csb(struct intel_engine_cs *engine) regs[CTX_RING_TAIL]); } - execlists_schedule_out(*execlists->active++); + *inactive++ = *execlists->active++; GEM_BUG_ON(execlists->active - execlists->inflight > execlists_num_ports(execlists)); @@ -1933,6 +1937,15 @@ static void process_csb(struct intel_engine_cs *engine) * invalidation before. */ invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); + + return inactive; +} + +static void post_process_csb(struct i915_request **port, + struct i915_request **last) +{ + while (port != last) + execlists_schedule_out(*port++); } static void __execlists_hold(struct i915_request *rq) @@ -1961,7 +1974,7 @@ static void __execlists_hold(struct i915_request *rq) if (!i915_request_is_ready(w)) continue; - if (i915_request_completed(w)) + if (__i915_request_is_complete(w)) continue; if (i915_request_on_hold(w)) @@ -1982,7 +1995,7 @@ static bool execlists_hold(struct intel_engine_cs *engine, spin_lock_irq(&engine->active.lock); - if (i915_request_completed(rq)) { /* too late! */ + if (__i915_request_is_complete(rq)) { /* too late! 
*/ rq = NULL; goto unlock; } @@ -2208,8 +2221,8 @@ active_context(struct intel_engine_cs *engine, u32 ccid) for (port = el->active; (rq = *port); port++) { if (rq->context->lrc.ccid == ccid) { ENGINE_TRACE(engine, - "ccid found at active:%zd\n", - port - el->active); + "ccid:%x found at active:%zd\n", + ccid, port - el->active); return rq; } } @@ -2217,8 +2230,8 @@ active_context(struct intel_engine_cs *engine, u32 ccid) for (port = el->pending; (rq = *port); port++) { if (rq->context->lrc.ccid == ccid) { ENGINE_TRACE(engine, - "ccid found at pending:%zd\n", - port - el->pending); + "ccid:%x found at pending:%zd\n", + ccid, port - el->pending); return rq; } } @@ -2336,8 +2349,12 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) static void execlists_submission_tasklet(unsigned long data) { struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; + struct i915_request *post[2 * EXECLIST_MAX_PORTS]; + struct i915_request **inactive; - process_csb(engine); + rcu_read_lock(); + inactive = process_csb(engine, post); + GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); if (unlikely(preempt_timeout(engine))) { cancel_timer(&engine->execlists.preempt); @@ -2363,6 +2380,9 @@ static void execlists_submission_tasklet(unsigned long data) if (!engine->execlists.pending[0]) execlists_dequeue_irq(engine); + + post_process_csb(post, inactive); + rcu_read_unlock(); } static void __execlists_kick(struct intel_engine_execlists *execlists) @@ -2735,8 +2755,6 @@ static void enable_execlists(struct intel_engine_cs *engine) ENGINE_POSTING_READ(engine, RING_HWS_PGA); enable_error_interrupt(engine); - - engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); } static bool unexpected_starting_state(struct intel_engine_cs *engine) @@ -2806,22 +2824,30 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) engine->execlists.reset_ccid = active_ccid(engine); } -static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) +static struct i915_request ** +reset_csb(struct intel_engine_cs *engine, struct i915_request **inactive) { struct intel_engine_execlists * const execlists = &engine->execlists; - struct intel_context *ce; - struct i915_request *rq; - u32 head; mb(); /* paranoia: read the CSB pointers from after the reset */ clflush(execlists->csb_write); mb(); - process_csb(engine); /* drain preemption events */ + inactive = process_csb(engine, inactive); /* drain preemption events */ /* Following the reset, we need to reload the CSB read/write pointers */ reset_csb_pointers(engine); + return inactive; +} + +static void +execlists_reset_active(struct intel_engine_cs *engine, bool stalled) +{ + struct intel_context *ce; + struct i915_request *rq; + u32 head; + /* * Save the currently executing context, even if we completed * its request, it was still running at the time of the @@ -2829,12 +2855,12 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) */ rq = active_context(engine, engine->execlists.reset_ccid); if (!rq) - goto unwind; + return; ce = rq->context; GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); - if (i915_request_completed(rq)) { + if (__i915_request_is_complete(rq)) { /* Idle context; tidy up the ring so we can restart afresh */ head = intel_ring_wrap(ce->ring, rq->tail); goto out_replay; @@ -2862,7 +2888,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) * Otherwise, if we have not started yet, the request should replay * perfectly and we do not need to flag the result as being 
erroneous. */ - if (!i915_request_started(rq)) + if (!__i915_request_has_started(rq)) goto out_replay; /* @@ -2891,11 +2917,22 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) head, ce->ring->tail); lrc_reset_regs(ce, engine); ce->lrc.lrca = lrc_update_regs(ce, engine, head); +} -unwind: - /* Push back any incomplete requests for replay after the reset. */ - cancel_port_requests(execlists); - __unwind_incomplete_requests(engine); +static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled) +{ + struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_request *post[2 * EXECLIST_MAX_PORTS]; + struct i915_request **inactive; + + rcu_read_lock(); + inactive = reset_csb(engine, post); + + execlists_reset_active(engine, true); + + inactive = cancel_port_requests(execlists, inactive); + post_process_csb(post, inactive); + rcu_read_unlock(); } static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) @@ -2904,11 +2941,15 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) ENGINE_TRACE(engine, "\n"); - spin_lock_irqsave(&engine->active.lock, flags); - - __execlists_reset(engine, stalled); + /* Process the csb, find the guilty context and throw away */ + execlists_reset_csb(engine, stalled); + /* Push back any incomplete requests for replay after the reset. */ + rcu_read_lock(); + spin_lock_irqsave(&engine->active.lock, flags); + __unwind_incomplete_requests(engine); spin_unlock_irqrestore(&engine->active.lock, flags); + rcu_read_unlock(); } static void nop_submission_tasklet(unsigned long data) @@ -2942,9 +2983,10 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) * submission's irq state, we also wish to remind ourselves that * it is irq state.) */ - spin_lock_irqsave(&engine->active.lock, flags); + execlists_reset_csb(engine, true); - __execlists_reset(engine, true); + rcu_read_lock(); + spin_lock_irqsave(&engine->active.lock, flags); /* Mark all executing requests as skipped. 
*/ list_for_each_entry(rq, &engine->active.requests, sched.link) @@ -3000,6 +3042,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) execlists->tasklet.func = nop_submission_tasklet; spin_unlock_irqrestore(&engine->active.lock, flags); + rcu_read_unlock(); } static void execlists_reset_finish(struct intel_engine_cs *engine) @@ -3211,6 +3254,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) else execlists->csb_size = GEN11_CSB_ENTRIES; + engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); if (INTEL_GEN(engine->i915) >= 11) { execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32); execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32); @@ -3515,12 +3559,12 @@ static void virtual_submit_request(struct i915_request *rq) old = ve->request; if (old) { /* background completion event from preempt-to-busy */ - GEM_BUG_ON(!i915_request_completed(old)); + GEM_BUG_ON(!__i915_request_is_complete(old)); __i915_request_submit(old); i915_request_put(old); } - if (i915_request_completed(rq)) { + if (__i915_request_is_complete(rq)) { __i915_request_submit(rq); ve->base.execlists.queue_priority_hint = INT_MIN; From patchwork Thu Dec 24 13:55:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989635 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B0CBC433DB for ; Thu, 24 Dec 2020 13:56:03 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B766422287 for ; Thu, 24 Dec 2020 13:56:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B766422287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1079D89C69; Thu, 24 Dec 2020 13:56:00 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0C48A89C69 for ; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423427-1500050 for ; Thu, 24 Dec 2020 13:55:44 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:40 +0000 Message-Id: <20201224135544.1713-5-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 5/9] drm/i915/gt: Remove virtual breadcrumb before transfer X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list 
List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The issue with stale virtual breadcrumbs remains. Now we have the problem that if the irq-signaler is still referencing the stale breadcrumb as we transfer it to a new sibling, the list becomes spaghetti. This is a very small window, but that doesn't stop it from being hit, albeit infrequently. To prevent the lists being tangled (the iterator starting on one engine's b->signalers but walking onto another list), always decouple the virtual breadcrumb on schedule-out and make sure that the walker has stepped out of the lists. Signed-off-by: Chris Wilson Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 +++-- .../gpu/drm/i915/gt/intel_execlists_submission.c | 15 +++++++++++++++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index f96cd7d9b419..8f7c8595ba08 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -451,15 +451,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) { struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; struct intel_context *ce = rq->context; + unsigned long flags; bool release; if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) return; - spin_lock(&ce->signal_lock); + spin_lock_irqsave(&ce->signal_lock, flags); list_del_rcu(&rq->signal_link); release = remove_signaling_context(b, ce); - spin_unlock(&ce->signal_lock); + spin_unlock_irqrestore(&ce->signal_lock, flags); if (release) intel_context_put(ce); diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index b5f256be73dd..1f6e9fe5bcc0 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -581,6 +581,21 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); + struct intel_engine_cs *engine = rq->engine; + + /* Flush concurrent rcu iterators in signal_irq_work */ + if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) { + /* + * After this point, the rq may be transferred to a new + * sibling, so before we clear ce->inflight make sure that + * the context has been removed from the b->signalers and + * furthermore we need to make sure that the concurrent + * iterator in signal_irq_work is no longer following + * ce->signal_link.
+ */ + i915_request_cancel_breadcrumb(rq); + irq_work_sync(&engine->breadcrumbs->irq_work); + } if (READ_ONCE(ve->request)) tasklet_hi_schedule(&ve->base.execlists.tasklet); From patchwork Thu Dec 24 13:55:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989633 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C153C433E6 for ; Thu, 24 Dec 2020 13:56:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D30FF22287 for ; Thu, 24 Dec 2020 13:56:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D30FF22287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 682CC89C93; Thu, 24 Dec 2020 13:55:59 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4474789C69 for ; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423428-1500050 for ; Thu, 24 Dec 2020 13:55:44 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:41 +0000 Message-Id: <20201224135544.1713-6-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 6/9] drm/i915/gt: Shrink the critical section for irq signaling X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Let's only wait for the list iterator when decoupling the virtual breadcrumb, as the signaling of all the requests may take a long time, during which we do not want to keep the tasklet spinning. 
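A minimal userspace sketch of that pattern (illustrative only; apart from the signaler_active name everything below is a made-up stand-in for the driver structures): the signaler advertises that it is walking the signalers list, and the side decoupling a breadcrumb spins only until the walk has stepped off the list, not until every collected request has been signalled.

#include <stdatomic.h>
#include <sched.h>

static atomic_int signaler_active;	/* name taken from the patch below */

/* Model of signal_irq_work(): only the list walk is accounted for. */
static void signal_walk(int (*completed)(int), void (*signal)(int),
			const int *requests, int nreq)
{
	int ready[64];
	int i, n = 0;

	atomic_fetch_add(&signaler_active, 1);
	for (i = 0; i < nreq && n < 64; i++)	/* the guarded iteration */
		if (completed(requests[i]))
			ready[n++] = requests[i];
	atomic_fetch_sub(&signaler_active, 1);

	/* Signalling may be slow, but waiters no longer wait for this part. */
	for (i = 0; i < n; i++)
		signal(ready[i]);
}

/*
 * Model of the wait in kick_siblings(): the caller has already unlinked its
 * breadcrumb and only needs any in-flight walk to step off the list.
 */
static void wait_for_signaler(void)
{
	while (atomic_load(&signaler_active))
		sched_yield();		/* userspace stand-in for cpu_relax() */
}

The trade-off is the one described above: a waiter may still spin for the duration of one list iteration, but never for the slow fence signalling that follows it.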
Signed-off-by: Chris Wilson Reviewed-by: Matthew Brost --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 2 ++ drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 3 ++- 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 8f7c8595ba08..2eabb9ab5d47 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -234,6 +234,7 @@ static void signal_irq_work(struct irq_work *work) intel_breadcrumbs_disarm_irq(b); rcu_read_lock(); + atomic_inc(&b->signaler_active); list_for_each_entry_rcu(ce, &b->signalers, signal_link) { struct i915_request *rq; @@ -269,6 +270,7 @@ static void signal_irq_work(struct irq_work *work) } } } + atomic_dec(&b->signaler_active); rcu_read_unlock(); llist_for_each_safe(signal, sn, signal) { diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h index d85a6f74fb87..3a084ce8ff5e 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h @@ -34,6 +34,7 @@ struct intel_breadcrumbs { spinlock_t signalers_lock; /* protects the list of signalers */ struct list_head signalers; struct llist_head signaled_requests; + atomic_t signaler_active; spinlock_t irq_lock; /* protects the interrupt from hardirq context */ struct irq_work irq_work; /* for use from inside irq_lock */ diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 1f6e9fe5bcc0..ba8229c0c75a 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -594,7 +594,8 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) * ce->signal_link. 
*/ i915_request_cancel_breadcrumb(rq); - irq_work_sync(&engine->breadcrumbs->irq_work); + while (atomic_read(&engine->breadcrumbs->signaler_active)) + cpu_relax(); } if (READ_ONCE(ve->request)) From patchwork Thu Dec 24 13:55:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989629 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84465C433E0 for ; Thu, 24 Dec 2020 13:56:00 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 480B722287 for ; Thu, 24 Dec 2020 13:56:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 480B722287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4F99689C85; Thu, 24 Dec 2020 13:55:59 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 269F389C59 for ; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423429-1500050 for ; Thu, 24 Dec 2020 13:55:44 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:42 +0000 Message-Id: <20201224135544.1713-7-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 7/9] drm/i915/gt: Resubmit the virtual engine on schedule-out X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Having recognised that we do not change the sibling until we schedule out, we can then defer the decision to resubmit the virtual engine from the unwind of the active queue to scheduling out of the virtual context. This improves our resilience in virtual engine scheduling, and should eliminate the rare cases of gem_exec_balance failing. By keeping the unwind order intact on the local engine, we can preserve data dependency ordering while doing a preempt-to-busy pass until we have determined the new ELSP. This means that if we try to timeslice between a virtual engine and a data-dependent ordinary request, the pair will maintain their relative ordering and we will avoid the resubmission, cancelling the timeslicing until further change.
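As a rough sketch of that schedule-out decision (illustrative only; the types and field names below are simplified stand-ins, not the driver's structures):

/* Decide at schedule-out whether a request that came from a virtual
 * engine should be handed back for re-balancing.
 */
struct model_request {
	unsigned long	execution_mask;		/* engines this request may run on */
	int		in_priority_queue;	/* unwound by preempt-to-busy */
};

static int should_return_to_virtual(const struct model_request *rq,
				    unsigned long engine_mask)
{
	/*
	 * Only a request that was pushed back into the priority queue and
	 * that is not tied to this physical engine is worth migrating;
	 * anything else keeps its place so the unwind order (and hence the
	 * data-dependency order) is preserved.
	 */
	return rq->in_priority_queue && rq->execution_mask != engine_mask;
}

This mirrors the check added to kick_siblings() in the diff below.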
The dilemma though is that we then may end up in a situation where the 'demotion' of the virtual request to an ordinary request in the engine queue results in filling the ELSP[] with virtual requests instead of spreading the load across the engines. To compensate for this, we mark each virtual request and refuse to resubmit a virtual request in the secondary ELSP slots, thus forcing subsequent virtual requests to be scheduled out after timeslicing. By delaying the decision until we schedule out, we will avoid unnecessary resubmission. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2079 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2098 Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Matthew Auld --- .../drm/i915/gt/intel_execlists_submission.c | 126 +++++++++++------- drivers/gpu/drm/i915/gt/selftest_execlists.c | 2 +- 2 files changed, 79 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index ba8229c0c75a..2d7a1440c31a 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -388,38 +388,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) __i915_request_unsubmit(rq); - /* - * Push the request back into the queue for later resubmission. - * If this request is not native to this physical engine (i.e. - * it came from a virtual source), push it back onto the virtual - * engine so that it can be moved across onto another physical - * engine as load dictates. - */ - if (likely(rq->execution_mask == engine->mask)) { - GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); - if (rq_prio(rq) != prio) { - prio = rq_prio(rq); - pl = i915_sched_lookup_priolist(engine, prio); - } - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); + if (rq_prio(rq) != prio) { + prio = rq_prio(rq); + pl = i915_sched_lookup_priolist(engine, prio); + } + GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - list_move(&rq->sched.link, pl); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + list_move(&rq->sched.link, pl); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - /* Check in case we rollback so far we wrap [size/2] */ - if (intel_ring_direction(rq->ring, - rq->tail, - rq->ring->tail + 8) > 0) - rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; + /* Check in case we rollback so far we wrap [size/2] */ + if (intel_ring_direction(rq->ring, + rq->tail, + rq->ring->tail + 8) > 0) + rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; - active = rq; - } else { - struct intel_engine_cs *owner = rq->context->engine; - - WRITE_ONCE(rq->engine, owner); - owner->submit_request(rq); - active = NULL; - } + active = rq; } return active; @@ -578,9 +563,9 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); } -static void kick_siblings(struct i915_request *rq, struct intel_context *ce) +static void +resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) { - struct virtual_engine *ve = container_of(ce, typeof(*ve), context); struct intel_engine_cs *engine = rq->engine; /* Flush concurrent rcu iterators in signal_irq_work */ @@ -598,6 +583,30 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) cpu_relax(); } + spin_lock_irq(&engine->active.lock); + + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + 
WRITE_ONCE(rq->engine, &ve->base); + ve->base.submit_request(rq); + + spin_unlock_irq(&engine->active.lock); +} + +static void kick_siblings(struct i915_request *rq, struct intel_context *ce) +{ + struct virtual_engine *ve = container_of(ce, typeof(*ve), context); + struct intel_engine_cs *engine = rq->engine; + + /* + * This engine is now too busy to run this virtual request, so + * see if we can find an alternative engine for it to execute on. + * Once a request has become bonded to this engine, we treat it the + * same as other native request. + */ + if (i915_request_in_priority_queue(rq) && + rq->execution_mask != engine->mask) + resubmit_virtual_request(rq, ve); + if (READ_ONCE(ve->request)) tasklet_hi_schedule(&ve->base.execlists.tasklet); } @@ -843,6 +852,20 @@ assert_pending_valid(const struct intel_engine_execlists *execlists, } sentinel = i915_request_has_sentinel(rq); + /* + * We want virtual requests to only be in the first slot so + * that they are never stuck behind a hog and can be immediately + * transferred onto the next idle engine. + */ + if (rq->execution_mask != engine->mask && + port != execlists->pending) { + GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n", + engine->name, + ce->timeline->fence_context, + port - execlists->pending); + return false; + } + /* Hold tightly onto the lock to prevent concurrent retires! */ if (!spin_trylock_irqsave(&rq->lock, flags)) continue; @@ -1502,6 +1525,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (i915_request_has_sentinel(last)) goto done; + /* + * We avoid submitting virtual requests into + * the secondary ports so that we can migrate + * the request immediately to another engine + * rather than wait for the primary request. + */ + if (rq->execution_mask != engine->mask) + goto done; + /* * If GVT overrides us we only ever submit * port[0], leaving port[1] empty. 
Note that we @@ -3562,7 +3594,6 @@ static void virtual_submission_tasklet(unsigned long data) static void virtual_submit_request(struct i915_request *rq) { struct virtual_engine *ve = to_virtual_engine(rq->engine); - struct i915_request *old; unsigned long flags; ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", @@ -3573,28 +3604,27 @@ static void virtual_submit_request(struct i915_request *rq) spin_lock_irqsave(&ve->base.active.lock, flags); - old = ve->request; - if (old) { /* background completion event from preempt-to-busy */ - GEM_BUG_ON(!__i915_request_is_complete(old)); - __i915_request_submit(old); - i915_request_put(old); - } - + /* By the time we resubmit a request, it may be completed */ if (__i915_request_is_complete(rq)) { __i915_request_submit(rq); + goto unlock; + } - ve->base.execlists.queue_priority_hint = INT_MIN; - ve->request = NULL; - } else { - ve->base.execlists.queue_priority_hint = rq_prio(rq); - ve->request = i915_request_get(rq); + if (ve->request) { /* background completion from preempt-to-busy */ + GEM_BUG_ON(!i915_request_completed(ve->request)); + __i915_request_submit(ve->request); + i915_request_put(ve->request); + } - GEM_BUG_ON(!list_empty(virtual_queue(ve))); - list_move_tail(&rq->sched.link, virtual_queue(ve)); + ve->base.execlists.queue_priority_hint = rq_prio(rq); + ve->request = i915_request_get(rq); - tasklet_hi_schedule(&ve->base.execlists.tasklet); - } + GEM_BUG_ON(!list_empty(virtual_queue(ve))); + list_move_tail(&rq->sched.link, virtual_queue(ve)); + + tasklet_hi_schedule(&ve->base.execlists.tasklet); +unlock: spin_unlock_irqrestore(&ve->base.active.lock, flags); } diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 47b12ce4b132..080b63000a4e 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -4566,7 +4566,7 @@ static int reset_virtual_engine(struct intel_gt *gt, spin_lock_irq(&engine->active.lock); __unwind_incomplete_requests(engine); spin_unlock_irq(&engine->active.lock); - GEM_BUG_ON(rq->engine != ve->engine); + GEM_BUG_ON(rq->engine != engine); /* Reset the engine while keeping our active request on hold */ execlists_hold(engine, rq); From patchwork Thu Dec 24 13:55:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 785E2C433E0 for ; Thu, 24 Dec 2020 13:55:56 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4EAF622287 for ; Thu, 24 Dec 2020 13:55:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4EAF622287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost 
[127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DB96089C88; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5FFFA89C59 for ; Thu, 24 Dec 2020 13:55:49 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423430-1500050 for ; Thu, 24 Dec 2020 13:55:44 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:43 +0000 Message-Id: <20201224135544.1713-8-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 8/9] drm/i915/gt: Simplify virtual engine handling for execlists_hold() X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Now that the tasklet completely controls scheduling of the requests, and we postpone scheduling out the old requests, we can keep a hanging virtual request bound to the engine on which it hung, and remove it from the queue. On release, it will be returned to the same engine and remain in its queue until it is scheduled, after which point it will become eligible for transfer to a sibling. Instead, we could opt to resubmit the request along the virtual engine on unhold, making it eligible for load balancing immediately -- but that seems like a pointless optimisation for a hanging context. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- .../drm/i915/gt/intel_execlists_submission.c | 29 ------------------- 1 file changed, 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 2d7a1440c31a..dc1312d862ed 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2048,35 +2048,6 @@ static bool execlists_hold(struct intel_engine_cs *engine, goto unlock; } - if (rq->engine != engine) { /* preempted virtual engine */ - struct virtual_engine *ve = to_virtual_engine(rq->engine); - - /* - * intel_context_inflight() is only protected by virtue - * of process_csb() being called only by the tasklet (or - * directly from inside reset while the tasklet is suspended). - * Assert that neither of those are allowed to run while we - * poke at the request queues. - */ - GEM_BUG_ON(!reset_in_progress(&engine->execlists)); - - /* - * An unsubmitted request along a virtual engine will - * remain on the active (this) engine until we are able - * to process the context switch away (and so mark the - * context as no longer in flight). That cannot have happened - * yet, otherwise we would not be hanging!
- */ - spin_lock(&ve->base.active.lock); - GEM_BUG_ON(intel_context_inflight(rq->context) != engine); - GEM_BUG_ON(ve->request != rq); - ve->request = NULL; - spin_unlock(&ve->base.active.lock); - i915_request_put(rq); - - rq->engine = engine; - } - /* * Transfer this request onto the hold queue to prevent it * being resumbitted to HW (and potentially completed) before we have From patchwork Thu Dec 24 13:55:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11989619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 196A0C433E0 for ; Thu, 24 Dec 2020 13:55:51 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7A8EF22287 for ; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7A8EF22287 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 008BD89C46; Thu, 24 Dec 2020 13:55:50 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 437AF89C46 for ; Thu, 24 Dec 2020 13:55:49 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23423431-1500050 for ; Thu, 24 Dec 2020 13:55:44 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Thu, 24 Dec 2020 13:55:44 +0000 Message-Id: <20201224135544.1713-9-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201224135544.1713-1-chris@chris-wilson.co.uk> References: <20201224135544.1713-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 9/9] drm/i915/gt: ce->inflight updates are now serialised X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since schedule-in and schedule-out are now both always under the tasklet bitlock, we can reduce the individual atomic operations to simple instructions and worry less. This notably eliminates the race observed with intel_context_inflight in __engine_unpark(). 
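A loose userspace model of that simplification (assumptions: the packed engine-pointer-plus-count lives in a single word, and the caller provides the serialisation that the tasklet now guarantees; this is not the driver code):

#include <stdatomic.h>
#include <stdint.h>

/*
 * ce->inflight modelled as one word holding an engine pointer with a small
 * submission count packed into its low bits. With all updates serialised,
 * the try_cmpxchg() retry loop collapses into a plain read-modify-write;
 * the relaxed atomic store only keeps lock-free readers from seeing a torn
 * value (the READ_ONCE/WRITE_ONCE role in the kernel version).
 */
static void inflight_inc_serialised(_Atomic uintptr_t *inflight,
				    uintptr_t first_engine)
{
	uintptr_t old = atomic_load_explicit(inflight, memory_order_relaxed);

	if (!old)
		old = first_engine;	/* first submission installs the engine */

	/* the low bits hold the count, so +1 simply bumps it */
	atomic_store_explicit(inflight, old + 1, memory_order_relaxed);
}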
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2583 Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../drm/i915/gt/intel_execlists_submission.c | 52 +++++++++---------- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index dc1312d862ed..1fae6c6f3868 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -524,11 +524,11 @@ __execlists_schedule_in(struct i915_request *rq) ce->lrc.ccid = ce->tag; } else { /* We don't need a strict matching tag, just different values */ - unsigned int tag = ffs(READ_ONCE(engine->context_tag)); + unsigned int tag = __ffs(engine->context_tag); - GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG); - clear_bit(tag - 1, &engine->context_tag); - ce->lrc.ccid = tag << (GEN11_SW_CTX_ID_SHIFT - 32); + GEM_BUG_ON(tag >= BITS_PER_LONG); + __clear_bit(tag, &engine->context_tag); + ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32); BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); } @@ -541,6 +541,8 @@ __execlists_schedule_in(struct i915_request *rq) execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); intel_engine_context_in(engine); + CE_TRACE(ce, "schedule-in, ccid:%x\n", ce->lrc.ccid); + return engine; } @@ -552,13 +554,10 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine)); trace_i915_request_in(rq, idx); - old = READ_ONCE(ce->inflight); - do { - if (!old) { - WRITE_ONCE(ce->inflight, __execlists_schedule_in(rq)); - break; - } - } while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old))); + old = ce->inflight; + if (!old) + old = __execlists_schedule_in(rq); + WRITE_ONCE(ce->inflight, ptr_inc(old)); GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); } @@ -611,12 +610,11 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) tasklet_hi_schedule(&ve->base.execlists.tasklet); } -static inline void -__execlists_schedule_out(struct i915_request *rq, - struct intel_engine_cs * const engine, - unsigned int ccid) +static inline void __execlists_schedule_out(struct i915_request *rq) { struct intel_context * const ce = rq->context; + struct intel_engine_cs * const engine = rq->engine; + unsigned int ccid; /* * NB process_csb() is not under the engine->active.lock and hence @@ -624,6 +622,8 @@ __execlists_schedule_out(struct i915_request *rq, * refrain from doing non-trivial work here. 
*/ + CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid); + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) lrc_check_regs(ce, engine, "after"); @@ -635,12 +635,13 @@ __execlists_schedule_out(struct i915_request *rq, __i915_request_is_complete(rq)) intel_engine_add_retire(engine, ce->timeline); + ccid = ce->lrc.ccid; ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; ccid &= GEN12_MAX_CONTEXT_HW_ID; if (ccid < BITS_PER_LONG) { GEM_BUG_ON(ccid == 0); GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag)); - set_bit(ccid - 1, &engine->context_tag); + __set_bit(ccid - 1, &engine->context_tag); } lrc_update_runtime(ce); @@ -661,26 +662,23 @@ __execlists_schedule_out(struct i915_request *rq, */ if (ce->engine != engine) kick_siblings(rq, ce); - - intel_context_put(ce); } static inline void execlists_schedule_out(struct i915_request *rq) { struct intel_context * const ce = rq->context; - struct intel_engine_cs *cur, *old; - u32 ccid; trace_i915_request_out(rq); - ccid = rq->context->lrc.ccid; - old = READ_ONCE(ce->inflight); - do - cur = ptr_unmask_bits(old, 2) ? ptr_dec(old) : NULL; - while (!try_cmpxchg(&ce->inflight, &old, cur)); - if (!cur) - __execlists_schedule_out(rq, old, ccid); + GEM_BUG_ON(!ce->inflight); + ce->inflight = ptr_dec(ce->inflight); + if (!__intel_context_inflight_count(ce->inflight)) { + GEM_BUG_ON(ce->inflight != rq->engine); + __execlists_schedule_out(rq); + WRITE_ONCE(ce->inflight, NULL); + intel_context_put(ce); + } i915_request_put(rq); }
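For reference, the pointer packing that intel_context_inflight() and the ptr_inc()/ptr_dec() updates above rely on can be sketched in standalone C as follows (helper names here are invented; only the three-bit count width comes from the series, and an engine pointer aligned to at least eight bytes is assumed):

#include <stdint.h>
#include <assert.h>

#define INFLIGHT_COUNT_BITS	3	/* mirrors ptr_mask_bits(engine, 3) */
#define INFLIGHT_COUNT_MASK	((uintptr_t)((1u << INFLIGHT_COUNT_BITS) - 1))

/* Recover the engine pointer: the ptr_mask_bits() half of the macro pair. */
static inline void *inflight_engine(uintptr_t packed)
{
	return (void *)(packed & ~INFLIGHT_COUNT_MASK);
}

/* Recover the submission count: the ptr_unmask_bits() half. */
static inline unsigned int inflight_count(uintptr_t packed)
{
	return (unsigned int)(packed & INFLIGHT_COUNT_MASK);
}

/* Build a packed value; the alignment assumption is what frees the low bits. */
static inline uintptr_t inflight_pack(void *engine, unsigned int count)
{
	assert(((uintptr_t)engine & INFLIGHT_COUNT_MASK) == 0);
	assert(count <= INFLIGHT_COUNT_MASK);
	return (uintptr_t)engine | count;
}

With that layout, ptr_inc() and ptr_dec() are simply +1/-1 on the packed word, which is why execlists_schedule_out() can test the remaining count before clearing ce->inflight and dropping its context reference.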