From patchwork Thu Apr 25 09:20:04 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 10916385
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Thu, 25 Apr 2019 10:20:04 +0100
Message-Id: <20190425092004.9995-45-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190425092004.9995-1-chris@chris-wilson.co.uk>
References: <20190425092004.9995-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 45/45] drm/i915/execlists: Minimalistic timeslicing
List-Id: Intel graphics driver community testing & development

If we have multiple contexts of equal priority pending execution,
activate a timer to demote the currently executing context in favour of
the next in the queue when that timeslice expires. This enforces
fairness between contexts (so long as they allow preemption -- forced
preemption, in the future, will kick those who do not obey) and allows
us to avoid userspace blocking forward progress with e.g. unbounded
MI_SEMAPHORE_WAIT.

For the starting point here, we use a single jiffy as our timeslice so
that we should be reasonably efficient wrt frequent CPU wakeups.

Testcase: igt/gem_exec_scheduler/semaphore-resolve
Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 111 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/selftest_lrc.c       |   3 +
 drivers/gpu/drm/i915/i915_scheduler.c        |   1 +
 drivers/gpu/drm/i915/i915_scheduler_types.h  |   1 +
 5 files changed, 122 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 30292e17b827..ca28ffbcf43c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include

 #include "i915_gem.h"
@@ -137,6 +138,11 @@ struct intel_engine_execlists {
 	 */
 	struct tasklet_struct tasklet;

+	/**
+	 * @timer: kick the current context if its timeslice expires
+	 */
+	struct timer_list timer;
+
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f3b36a7296e1..e140c2a301a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -835,6 +835,82 @@ last_active(const struct intel_engine_execlists *execlists)
 	return *last;
 }

+static void
+defer_request(struct i915_request * const rq, struct list_head * const pl)
+{
+	struct i915_dependency *p;
+
+	/*
+	 * We want to move the interrupted request to the back of
+	 * the round-robin list (i.e. its priority level), but
+	 * in doing so, we must then move all requests that were in
+	 * flight and were waiting for the interrupted request to
+	 * be run after it again.
+	 */
+	list_move_tail(&rq->sched.link, pl);
+
+	list_for_each_entry(p, &rq->sched.waiters_list, wait_link) {
+		struct i915_request *w =
+			container_of(p->waiter, typeof(*w), sched);
+
+		if (!i915_sw_fence_done(&w->submit))
+			continue;
+
+		/* Leave semaphores spinning on the other engines */
+		if (w->engine != rq->engine)
+			continue;
+
+		/* No waiter should start before the active request completed */
+		GEM_BUG_ON(i915_request_started(w));
+
+		GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
+		if (rq_prio(w) < rq_prio(rq))
+			continue;
+
+		/*
+		 * This should be very shallow as it is limited by the
+		 * number of requests that can fit in a ring (<64) and
+		 * the number of contexts that can be in flight on this
+		 * engine.
+		 */
+		defer_request(w, pl);
+	}
+}
+
+static void defer_active(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = __unwind_incomplete_requests(engine);
+	if (!rq)
+		return;
+
+	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
+}
+
+static bool
+need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq)
+{
+	int hint;
+
+	if (list_is_last(&rq->sched.link, &engine->active.requests))
+		return false;
+
+	hint = max(rq_prio(list_next_entry(rq, sched.link)),
+		   queue_prio(&engine->execlists));
+	hint |= __NO_PREEMPTION;
+
+	return hint >= rq_prio(rq);
+}
+
+static bool
+enable_timeslice(struct intel_engine_cs *engine)
+{
+	struct i915_request *last = last_active(&engine->execlists);
+
+	return last && need_timeslice(engine, last);
+}
+
 static bool execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -928,6 +1004,27 @@ static bool execlists_dequeue(struct intel_engine_cs *engine)
 			 */
 			last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE;
 			last = NULL;
+		} else if (need_timeslice(engine, last) &&
+			   !timer_pending(&engine->execlists.timer)) {
+			GEM_TRACE("%s: expired last=%llx:%lld, prio=%d, hint=%d\n",
+				  engine->name,
+				  last->fence.context,
+				  last->fence.seqno,
+				  last->sched.attr.priority,
+				  execlists->queue_priority_hint);
+
+			ring_suspend(engine) = 1;
+			defer_active(engine);
+
+			/*
+			 * Unlike for preemption, if we rewind and continue
+			 * executing the same context as previously active,
+			 * the order of execution will remain the same and
+			 * the tail will only advance. We do not need to
+			 * force a full context restore, as a lite-restore
+			 * is sufficient to resample the monotonic TAIL.
+			 */
+			last = NULL;
 		} else {
 			/*
 			 * Otherwise if we already have a request pending
@@ -1249,6 +1346,9 @@ static void process_csb(struct intel_engine_cs *engine)
 				       sizeof(*execlists->pending));
 			execlists->pending[0] = NULL;

+			if (enable_timeslice(engine))
+				mod_timer(&execlists->timer, jiffies + 1);
+
 			if (!inject_preempt_hang(execlists))
 				ring_suspend(engine) = 0;
 		} else if (status & GEN8_CTX_STATUS_PREEMPTED) {
@@ -1325,6 +1425,15 @@ static void execlists_submission_tasklet(unsigned long data)
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }

+static void execlists_submission_timer(struct timer_list *timer)
+{
+	struct intel_engine_cs *engine =
+		from_timer(engine, timer, execlists.timer);
+
+	/* Kick the tasklet for some interrupt coalescing and reset handling */
+	tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
 static void queue_request(struct intel_engine_cs *engine,
 			  struct i915_sched_node *node,
 			  int prio)
@@ -2563,6 +2672,7 @@ static int gen8_init_rcs_context(struct i915_request *rq)

 static void execlists_park(struct intel_engine_cs *engine)
 {
+	del_timer_sync(&engine->execlists.timer);
 	tasklet_kill(&engine->execlists.tasklet);
 }

@@ -2661,6 +2771,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)

 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
+	timer_setup(&engine->execlists.timer, execlists_submission_timer, 0);

 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 9edb6662b903..8a0c706e9126 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -396,6 +396,9 @@ static int live_late_preempt(void *arg)
 	if (!ctx_lo)
 		goto err_ctx_hi;

+	/* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */
+	ctx_lo->sched.priority = I915_USER_PRIORITY(1);
+
 	for_each_engine(engine, i915, id) {
 		struct igt_live_test t;
 		struct i915_request *rq;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 601aae909491..7d0c14a5e687 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -71,6 +71,7 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 		list_add(&dep->wait_link, &signal->waiters_list);
 		list_add(&dep->signal_link, &node->signalers_list);
 		dep->signaler = signal;
+		dep->waiter = node;
 		dep->flags = flags;

 		/* Keep track of whether anyone on this chain has a semaphore */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 166a457884b2..21fb9cd81fcb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -62,6 +62,7 @@ struct i915_sched_node {

 struct i915_dependency {
 	struct i915_sched_node *signaler;
+	struct i915_sched_node *waiter;
 	struct list_head signal_link;
 	struct list_head wait_link;
 	struct list_head dfs_link;