[08/46] drm/i915/execlists: Suppress mere WAIT preemption


Message ID: 20190206130356.18771-9-chris@chris-wilson.co.uk
State: New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Wed,  6 Feb 2019 13:03:18 +0000
Message-Id: <20190206130356.18771-9-chris@chris-wilson.co.uk>
In-Reply-To: <20190206130356.18771-1-chris@chris-wilson.co.uk>
References: <20190206130356.18771-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT
 preemption
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Chris Wilson Feb. 6, 2019, 1:03 p.m. UTC

WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not already received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we may be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c        |  12 ++
 drivers/gpu/drm/i915/i915_scheduler.h      |   2 +
 drivers/gpu/drm/i915/intel_lrc.c           |   9 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 161 +++++++++++++++++++++
 4 files changed, 183 insertions(+), 1 deletion(-)
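
As an illustration of the comparison this patch changes, here is a minimal user-space sketch. It assumes the preemption check reduces to a plain greater-than between the queued priority and the running request's priority; make_prio(), need_preempt_raw() and the unprefixed constants are stand-ins invented for this sketch, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins mirroring the patch; the real values live in i915_scheduler.h. */
#define USER_PRIORITY_SHIFT 2            /* assumed width of the internal boost bits */
#define PRIORITY_WAIT       (1 << 0)
#define PRIORITY_NEWCLIENT  (1 << 1)
#define NO_PREEMPTION       (PRIORITY_WAIT)

/* Hypothetical helper: combine a user level with internal boost bits. */
static int make_prio(int user_level, int boost)
{
	/* Boost bits must stay inside the internal mask. */
	assert((boost & ~((1 << USER_PRIORITY_SHIFT) - 1)) == 0);
	return user_level * (1 << USER_PRIORITY_SHIFT) + boost;
}

/* Treat the running request as if it already held the WAIT boost. */
static int effective_prio(int running_prio)
{
	return running_prio | NO_PREEMPTION;
}

/* Comparison without the patch: any higher value preempts. */
static bool need_preempt_raw(int queued_prio, int running_prio)
{
	return queued_prio > running_prio;
}

/* Comparison with the patch: a mere WAIT boost can no longer win. */
static bool need_preempt(int queued_prio, int running_prio)
{
	return queued_prio > effective_prio(running_prio);
}
```

Under these assumptions, a queued request that differs from the running one only by the WAIT bit preempts with need_preempt_raw() but not with need_preempt(), while a request at a higher user level still preempts in both.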

Comments

Tvrtko Ursulin Feb. 11, 2019, 11:19 a.m. UTC | #1

On 06/02/2019 13:03, Chris Wilson wrote:
> WAIT is occasionally suppressed by virtue of preempted requests being
> promoted to NEWCLIENT if they have not already received that boost.
> Make this consistent for all WAIT boosts that they are not allowed to
> preempt executing contexts and are merely granted the right to be at the
> front of the queue for the next execution slot. This is in keeping with
> the desire that the WAIT boost be a minor tweak that does not give
> excessive promotion to its user and open ourselves to trivial abuse.
> 
> The problem with the inconsistent WAIT preemption becomes more apparent
> as the preemption is propagated across the engines, where one engine may
> preempt and the other not, and we may be relying on the exact execution
> order being consistent across engines (e.g. using HW semaphores to
> coordinate parallel execution).
> 
> v2: Also protect GuC submission from false preemption loops.
> v3: Build bug safeguards and better debug messages for st.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c        |  12 ++
>   drivers/gpu/drm/i915/i915_scheduler.h      |   2 +
>   drivers/gpu/drm/i915/intel_lrc.c           |   9 +-
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 161 +++++++++++++++++++++
>   4 files changed, 183 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c2a5c48c7541..35acef74b93a 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -372,12 +372,24 @@ void __i915_request_submit(struct i915_request *request)
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> +
>   	request->global_seqno = seqno;
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
>   	    !i915_request_enable_breadcrumb(request))
>   		intel_engine_queue_breadcrumbs(engine);
> +
> +	/*
> +	 * As we do not allow WAIT to preempt inflight requests,
> +	 * once we have executed a request, along with triggering
> +	 * any execution callbacks, we must preserve its ordering
> +	 * within the non-preemptible FIFO.
> +	 */
> +	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
> +	request->sched.attr.priority |= __NO_PREEMPTION;
> +
>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index dbe9cb7ecd82..54bd6c89817e 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -33,6 +33,8 @@ enum {
>   #define I915_PRIORITY_WAIT	((u8)BIT(0))
>   #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
>   
> +#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
> +
>   struct i915_sched_attr {
>   	/**
>   	 * @priority: execution and service priority
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5d5ce91a5dfa..afd05e25f911 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> +static int effective_prio(const struct i915_request *rq)
> +{
> +	/* Restrict mere WAIT boosts from triggering preemption */
> +	return rq_prio(rq) | __NO_PREEMPTION;
> +}
> +
>   static int queue_prio(const struct intel_engine_execlists *execlists)
>   {
>   	struct i915_priolist *p;
> @@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
>   				const struct i915_request *rq)
>   {
> -	const int last_prio = rq_prio(rq);
> +	int last_prio;
>   
>   	if (!intel_engine_has_preemption(engine))
>   		return false;
> @@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	 * preempt. If that hint is stale or we may be trying to preempt
>   	 * ourselves, ignore the request.
>   	 */
> +	last_prio = effective_prio(rq);
>   	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
>   				      last_prio))
>   		return false;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 58144e024751..263afd2f1596 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -407,6 +407,166 @@ static int live_suppress_self_preempt(void *arg)
>   	goto err_client_b;
>   }
>   
> +static int __i915_sw_fence_call
> +dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +	return NOTIFY_DONE;
> +}
> +
> +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> +{
> +	struct i915_request *rq;
> +
> +	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> +	if (!rq)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&rq->active_list);
> +	rq->engine = engine;
> +
> +	i915_sched_node_init(&rq->sched);
> +
> +	/* mark this request as permanently incomplete */
> +	rq->fence.seqno = 1;
> +	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> +	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> +	GEM_BUG_ON(i915_request_completed(rq));
> +
> +	i915_sw_fence_init(&rq->submit, dummy_notify);
> +	i915_sw_fence_commit(&rq->submit);
> +
> +	return rq;
> +}
> +
> +static void dummy_request_free(struct i915_request *dummy)
> +{
> +	i915_request_mark_complete(dummy);
> +	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> +	kfree(dummy);
> +}
> +
> +static int live_suppress_wait_preempt(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct preempt_client client[4];
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	int err = -ENOMEM;
> +	int i;
> +
> +	/*
> +	 * Waiters are given a little priority nudge, but not enough
> +	 * to actually cause any preemption. Double check that we do
> +	 * not needlessly generate preempt-to-idle cycles.
> +	 */
> +
> +	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
> +		return 0;
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
> +		goto err_unlock;
> +	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
> +		goto err_client_0;
> +	if (preempt_client_init(i915, &client[2])) /* head of queue */
> +		goto err_client_1;
> +	if (preempt_client_init(i915, &client[3])) /* bystander */
> +		goto err_client_2;
> +
> +	for_each_engine(engine, i915, id) {
> +		int depth;
> +
> +		if (!engine->emit_init_breadcrumb)
> +			continue;
> +
> +		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
> +			struct i915_request *rq[ARRAY_SIZE(client)];
> +			struct i915_request *dummy;
> +
> +			engine->execlists.preempt_hang.count = 0;
> +
> +			dummy = dummy_request(engine);
> +			if (!dummy)
> +				goto err_client_3;
> +
> +			for (i = 0; i < ARRAY_SIZE(client); i++) {
> +				rq[i] = igt_spinner_create_request(&client[i].spin,
> +								   client[i].ctx, engine,
> +								   MI_NOOP);
> +				if (IS_ERR(rq[i])) {
> +					err = PTR_ERR(rq[i]);
> +					goto err_wedged;
> +				}
> +
> +				/* Disable NEWCLIENT promotion */
> +				__i915_active_request_set(&rq[i]->timeline->last_request,
> +							  dummy);
> +				i915_request_add(rq[i]);
> +			}
> +
> +			dummy_request_free(dummy);
> +
> +			GEM_BUG_ON(i915_request_completed(rq[0]));
> +			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
> +				pr_err("%s: First client failed to start\n",
> +				       engine->name);
> +				goto err_wedged;
> +			}
> +			GEM_BUG_ON(!i915_request_started(rq[0]));
> +
> +			if (i915_request_wait(rq[depth],
> +					      I915_WAIT_LOCKED |
> +					      I915_WAIT_PRIORITY,
> +					      1) != -ETIME) {
> +				pr_err("%s: Waiter depth:%d completed!\n",
> +				       engine->name, depth);
> +				goto err_wedged;
> +			}
> +
> +			for (i = 0; i < ARRAY_SIZE(client); i++)
> +				igt_spinner_end(&client[i].spin);
> +
> +			if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +				goto err_wedged;
> +
> +			if (engine->execlists.preempt_hang.count) {
> +				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
> +				       engine->name,
> +				       engine->execlists.preempt_hang.count,
> +				       depth);
> +				err = -EINVAL;
> +				goto err_client_3;
> +			}
> +		}
> +	}
> +
> +	err = 0;
> +err_client_3:
> +	preempt_client_fini(&client[3]);
> +err_client_2:
> +	preempt_client_fini(&client[2]);
> +err_client_1:
> +	preempt_client_fini(&client[1]);
> +err_client_0:
> +	preempt_client_fini(&client[0]);
> +err_unlock:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +	intel_runtime_pm_put(i915, wakeref);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +
> +err_wedged:
> +	for (i = 0; i < ARRAY_SIZE(client); i++)
> +		igt_spinner_end(&client[i].spin);
> +	i915_gem_set_wedged(i915);
> +	err = -EIO;
> +	goto err_client_3;
> +}
> +
>   static int live_chain_preempt(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -887,6 +1047,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_preempt),
>   		SUBTEST(live_late_preempt),
>   		SUBTEST(live_suppress_self_preempt),
> +		SUBTEST(live_suppress_wait_preempt),
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Matthew Auld Feb. 19, 2019, 10:22 a.m. UTC | #2

On Wed, 6 Feb 2019 at 13:05, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> WAIT is occasionally suppressed by virtue of preempted requests being
> promoted to NEWCLIENT if they have not already received that boost.
> Make this consistent for all WAIT boosts that they are not allowed to
> preempt executing contexts and are merely granted the right to be at the
> front of the queue for the next execution slot. This is in keeping with
> the desire that the WAIT boost be a minor tweak that does not give
> excessive promotion to its user and open ourselves to trivial abuse.
>
> The problem with the inconsistent WAIT preemption becomes more apparent
> as the preemption is propagated across the engines, where one engine may
> preempt and the other not, and we may be relying on the exact execution
> order being consistent across engines (e.g. using HW semaphores to
> coordinate parallel execution).
>
> v2: Also protect GuC submission from false preemption loops.
> v3: Build bug safeguards and better debug messages for st.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c        |  12 ++
>  drivers/gpu/drm/i915/i915_scheduler.h      |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c           |   9 +-
>  drivers/gpu/drm/i915/selftests/intel_lrc.c | 161 +++++++++++++++++++++
>  4 files changed, 183 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c2a5c48c7541..35acef74b93a 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -372,12 +372,24 @@ void __i915_request_submit(struct i915_request *request)
>
>         /* We may be recursing from the signal callback of another i915 fence */
>         spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +
>         GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>         set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> +
>         request->global_seqno = seqno;
>         if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
>             !i915_request_enable_breadcrumb(request))
>                 intel_engine_queue_breadcrumbs(engine);
> +
> +       /*
> +        * As we do not allow WAIT to preempt inflight requests,
> +        * once we have executed a request, along with triggering
> +        * any execution callbacks, we must preserve its ordering
> +        * within the non-preemptible FIFO.
> +        */
> +       BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
> +       request->sched.attr.priority |= __NO_PREEMPTION;
> +
>         spin_unlock(&request->lock);
>
>         engine->emit_fini_breadcrumb(request,
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index dbe9cb7ecd82..54bd6c89817e 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -33,6 +33,8 @@ enum {
>  #define I915_PRIORITY_WAIT     ((u8)BIT(0))
>  #define I915_PRIORITY_NEWCLIENT        ((u8)BIT(1))
>
> +#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
> +
>  struct i915_sched_attr {
>         /**
>          * @priority: execution and service priority
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5d5ce91a5dfa..afd05e25f911 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
>         return rq->sched.attr.priority;
>  }
>
> +static int effective_prio(const struct i915_request *rq)
> +{
> +       /* Restrict mere WAIT boosts from triggering preemption */
> +       return rq_prio(rq) | __NO_PREEMPTION;
> +}
> +
>  static int queue_prio(const struct intel_engine_execlists *execlists)
>  {
>         struct i915_priolist *p;
> @@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>  static inline bool need_preempt(const struct intel_engine_cs *engine,
>                                 const struct i915_request *rq)
>  {
> -       const int last_prio = rq_prio(rq);
> +       int last_prio;
>
>         if (!intel_engine_has_preemption(engine))
>                 return false;
> @@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>          * preempt. If that hint is stale or we may be trying to preempt
>          * ourselves, ignore the request.
>          */
> +       last_prio = effective_prio(rq);
>         if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
>                                       last_prio))
>                 return false;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 58144e024751..263afd2f1596 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -407,6 +407,166 @@ static int live_suppress_self_preempt(void *arg)
>         goto err_client_b;
>  }
>
> +static int __i915_sw_fence_call
> +dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +       return NOTIFY_DONE;
> +}
> +
> +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> +{
> +       struct i915_request *rq;
> +
> +       rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> +       if (!rq)
> +               return NULL;
> +
> +       INIT_LIST_HEAD(&rq->active_list);
> +       rq->engine = engine;
> +
> +       i915_sched_node_init(&rq->sched);
> +
> +       /* mark this request as permanently incomplete */
> +       rq->fence.seqno = 1;
> +       BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> +       rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> +       GEM_BUG_ON(i915_request_completed(rq));
> +
> +       i915_sw_fence_init(&rq->submit, dummy_notify);
> +       i915_sw_fence_commit(&rq->submit);
> +
> +       return rq;
> +}
> +
> +static void dummy_request_free(struct i915_request *dummy)
> +{
> +       i915_request_mark_complete(dummy);
> +       i915_sched_node_fini(dummy->engine->i915, &dummy->sched);

Do we need i915_sw_fence_fini() in here somewhere?

While looking at something unrelated I hit something like:
ODEBUG: init destroyed (active state 0) object type: i915_sw_fence
hint:           (null)

Chris Wilson Feb. 19, 2019, 10:34 a.m. UTC | #3

Quoting Matthew Auld (2019-02-19 10:22:57)
> On Wed, 6 Feb 2019 at 13:05, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> > +{
> > +       struct i915_request *rq;
> > +
> > +       rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> > +       if (!rq)
> > +               return NULL;
> > +
> > +       INIT_LIST_HEAD(&rq->active_list);
> > +       rq->engine = engine;
> > +
> > +       i915_sched_node_init(&rq->sched);
> > +
> > +       /* mark this request as permanently incomplete */
> > +       rq->fence.seqno = 1;
> > +       BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> > +       rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> > +       GEM_BUG_ON(i915_request_completed(rq));
> > +
> > +       i915_sw_fence_init(&rq->submit, dummy_notify);
> > +       i915_sw_fence_commit(&rq->submit);
> > +
> > +       return rq;
> > +}
> > +
> > +static void dummy_request_free(struct i915_request *dummy)
> > +{
> > +       i915_request_mark_complete(dummy);
> > +       i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> 
> Do we need i915_sw_fence_fini() in here somewhere?
> 
> While looking at something unrelated I hit something like:
> ODEBUG: init destroyed (active state 0) object type: i915_sw_fence
> hint:           (null)

Yeah, a missing i915_sw_fence_fini() would account for that. We should also use
dma_fence_release if I haven't already, just in case it ends up being
RCU sensitive.

As for requiring dummy_request in the first place, I think it indicates
that the i915_request_add() api is inadequate. At the moment, the only
sore points are this particular test and later on when we have to
manually fudge the priority after submission (for heartbeat requests).
So, it's not a pressing issue, but definitely a weak point.
-Chris
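
For readers following the dummy_request() discussion, the "permanently incomplete" trick can be modelled in user space roughly as follows. This is only a sketch: it assumes a little-endian host (so the second 32-bit half of the 64-bit seqno is the zero upper half) and assumes completion is tested with a wrapping signed 32-bit comparison; struct mock_request and the mock_* functions are hypothetical names, not driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 64-bit fence seqno, also viewable as two 32-bit halves. */
union fence_seqno {
	uint64_t value;
	uint32_t half[2];
};

struct mock_request {
	union fence_seqno seqno;
	uint32_t *hwsp_seqno;	/* where the "hardware" reports progress */
};

/* Assumed completion test: wrapping signed compare of 32-bit seqnos. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

static void mock_request_init(struct mock_request *rq)
{
	rq->seqno.value = 1;
	/* Little-endian assumption: half[1] is the upper 32 bits, i.e. 0. */
	rq->hwsp_seqno = &rq->seqno.half[1];
}

static bool mock_request_completed(const struct mock_request *rq)
{
	return seqno_passed(*rq->hwsp_seqno, (uint32_t)rq->seqno.value);
}

/* Write the request's own seqno into its status slot. */
static void mock_request_mark_complete(struct mock_request *rq)
{
	*rq->hwsp_seqno = (uint32_t)rq->seqno.value;
}
```

In this model the request reads as incomplete after mock_request_init(), because the status slot holds 0 while the seqno is 1, and only flips to complete once mock_request_mark_complete() overwrites the slot.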

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c2a5c48c7541..35acef74b93a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -372,12 +372,24 @@  void __i915_request_submit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
+
 	request->global_seqno = seqno;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
+
+	/*
+	 * As we do not allow WAIT to preempt inflight requests,
+	 * once we have executed a request, along with triggering
+	 * any execution callbacks, we must preserve its ordering
+	 * within the non-preemptible FIFO.
+	 */
+	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
+	request->sched.attr.priority |= __NO_PREEMPTION;
+
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..54bd6c89817e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -33,6 +33,8 @@  enum {
 #define I915_PRIORITY_WAIT	((u8)BIT(0))
 #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
 
+#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
+
 struct i915_sched_attr {
 	/**
 	 * @priority: execution and service priority
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5d5ce91a5dfa..afd05e25f911 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -188,6 +188,12 @@  static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static int effective_prio(const struct i915_request *rq)
+{
+	/* Restrict mere WAIT boosts from triggering preemption */
+	return rq_prio(rq) | __NO_PREEMPTION;
+}
+
 static int queue_prio(const struct intel_engine_execlists *execlists)
 {
 	struct i915_priolist *p;
@@ -208,7 +214,7 @@  static int queue_prio(const struct intel_engine_execlists *execlists)
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq)
 {
-	const int last_prio = rq_prio(rq);
+	int last_prio;
 
 	if (!intel_engine_has_preemption(engine))
 		return false;
@@ -228,6 +234,7 @@  static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * preempt. If that hint is stale or we may be trying to preempt
 	 * ourselves, ignore the request.
 	 */
+	last_prio = effective_prio(rq);
 	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
 				      last_prio))
 		return false;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..263afd2f1596 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -407,6 +407,166 @@  static int live_suppress_self_preempt(void *arg)
 	goto err_client_b;
 }
 
+static int __i915_sw_fence_call
+dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
+static struct i915_request *dummy_request(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
+	if (!rq)
+		return NULL;
+
+	INIT_LIST_HEAD(&rq->active_list);
+	rq->engine = engine;
+
+	i915_sched_node_init(&rq->sched);
+
+	/* mark this request as permanently incomplete */
+	rq->fence.seqno = 1;
+	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
+	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
+	GEM_BUG_ON(i915_request_completed(rq));
+
+	i915_sw_fence_init(&rq->submit, dummy_notify);
+	i915_sw_fence_commit(&rq->submit);
+
+	return rq;
+}
+
+static void dummy_request_free(struct i915_request *dummy)
+{
+	i915_request_mark_complete(dummy);
+	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+	kfree(dummy);
+}
+
+static int live_suppress_wait_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct preempt_client client[4];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+	int i;
+
+	/*
+	 * Waiters are given a little priority nudge, but not enough
+	 * to actually cause any preemption. Double check that we do
+	 * not needlessly generate preempt-to-idle cycles.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
+		goto err_unlock;
+	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
+		goto err_client_0;
+	if (preempt_client_init(i915, &client[2])) /* head of queue */
+		goto err_client_1;
+	if (preempt_client_init(i915, &client[3])) /* bystander */
+		goto err_client_2;
+
+	for_each_engine(engine, i915, id) {
+		int depth;
+
+		if (!engine->emit_init_breadcrumb)
+			continue;
+
+		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
+			struct i915_request *rq[ARRAY_SIZE(client)];
+			struct i915_request *dummy;
+
+			engine->execlists.preempt_hang.count = 0;
+
+			dummy = dummy_request(engine);
+			if (!dummy)
+				goto err_client_3;
+
+			for (i = 0; i < ARRAY_SIZE(client); i++) {
+				rq[i] = igt_spinner_create_request(&client[i].spin,
+								   client[i].ctx, engine,
+								   MI_NOOP);
+				if (IS_ERR(rq[i])) {
+					err = PTR_ERR(rq[i]);
+					goto err_wedged;
+				}
+
+				/* Disable NEWCLIENT promotion */
+				__i915_active_request_set(&rq[i]->timeline->last_request,
+							  dummy);
+				i915_request_add(rq[i]);
+			}
+
+			dummy_request_free(dummy);
+
+			GEM_BUG_ON(i915_request_completed(rq[0]));
+			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
+				pr_err("%s: First client failed to start\n",
+				       engine->name);
+				goto err_wedged;
+			}
+			GEM_BUG_ON(!i915_request_started(rq[0]));
+
+			if (i915_request_wait(rq[depth],
+					      I915_WAIT_LOCKED |
+					      I915_WAIT_PRIORITY,
+					      1) != -ETIME) {
+				pr_err("%s: Waiter depth:%d completed!\n",
+				       engine->name, depth);
+				goto err_wedged;
+			}
+
+			for (i = 0; i < ARRAY_SIZE(client); i++)
+				igt_spinner_end(&client[i].spin);
+
+			if (igt_flush_test(i915, I915_WAIT_LOCKED))
+				goto err_wedged;
+
+			if (engine->execlists.preempt_hang.count) {
+				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
+				       engine->name,
+				       engine->execlists.preempt_hang.count,
+				       depth);
+				err = -EINVAL;
+				goto err_client_3;
+			}
+		}
+	}
+
+	err = 0;
+err_client_3:
+	preempt_client_fini(&client[3]);
+err_client_2:
+	preempt_client_fini(&client[2]);
+err_client_1:
+	preempt_client_fini(&client[1]);
+err_client_0:
+	preempt_client_fini(&client[0]);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		igt_spinner_end(&client[i].spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_3;
+}
+
 static int live_chain_preempt(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -887,6 +1047,7 @@  int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_self_preempt),
+		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
