| Message ID | 20210506191451.77768-34-matthew.brost@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Basic GuC submission support in the i915 |
On 06/05/2021 20:13, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> With virtual engines, it is no longer possible to know which specific
> physical engine a given request will be executed on at the time that
> request is generated. This means that the request itself must be engine
> agnostic - any direct register writes must be relative to the engine
> and not absolute addresses.
>
> The LRI command has support for engine relative addressing. However,
> the mechanism is not transparent to the driver. The scheme for Gen11
> (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
> absolute engine base component. The hardware then adds on the correct
> engine offset at execution time.
>
> Due to the non-trivial and differing schemes on different hardware, it
> is not possible to simply update the code that creates the LRI
> commands to set a remap flag and let the hardware get on with it.
> Instead, this patch adds function wrappers for generating the LRI
> command itself and then for constructing the correct address to use
> with the LRI.
>
> Bspec: 45606
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Chris P Wilson <chris.p.wilson@intel.com>
> CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 +++---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 25 ++++++++++++++++++++
>  drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 +++
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  5 ++++
>  drivers/gpu/drm/i915/i915_perf.c             |  6 +++++
>  5 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 188dee13e017..993faa213b41 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1211,7 +1211,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>  {
>  	struct i915_address_space *vm = rq->context->vm;
>  	struct intel_engine_cs *engine = rq->engine;
> -	u32 base = engine->mmio_base;
> +	u32 base = engine->lri_mmio_base;
>  	u32 *cs;
>  	int i;
>
> @@ -1223,7 +1223,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>
> -	*cs++ = MI_LOAD_REGISTER_IMM(2);
> +	*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;

Would a helper like MI_LOAD_REGISTER_IMM_REL(engine, n) look better?
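To illustrate the suggestion, one possible shape for such a wrapper (purely a sketch on top of the lri_cmd_mode field this patch adds, not something the series itself defines):

    /* Hypothetical helper, not in the patch: engine-relative LRI header. */
    #define MI_LOAD_REGISTER_IMM_REL(engine, n) \
            (MI_LOAD_REGISTER_IMM(n) | (engine)->lri_cmd_mode)

The hunk above would then read:

            *cs++ = MI_LOAD_REGISTER_IMM_REL(engine, 2);
            *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
            *cs++ = upper_32_bits(pd_daddr);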
>
>  	*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
>  	*cs++ = upper_32_bits(pd_daddr);
> @@ -1245,7 +1245,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>
> -	*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
> +	*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
> +		MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
>  	for (i = GEN8_3LVL_PDPES; i--; ) {
>  		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index ec82a7ec0c8d..c88b792c1ab5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -16,6 +16,7 @@
>  #include "intel_engine_pm.h"
>  #include "intel_engine_user.h"
>  #include "intel_execlists_submission.h"
> +#include "intel_gpu_commands.h"
>  #include "intel_gt.h"
>  #include "intel_gt_requests.h"
>  #include "intel_gt_pm.h"
> @@ -223,6 +224,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
>  	return bases[i].base;
>  }
>
> +static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
> +{
> +	if (INTEL_GEN(engine->i915) < 11)
> +		return false;
> +
> +	if (engine->class == COPY_ENGINE_CLASS)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void lri_init(struct intel_engine_cs *engine)
> +{
> +	if (i915_engine_has_relative_lri(engine)) {
> +		engine->lri_cmd_mode = MI_LRI_LRM_CS_MMIO;
> +		engine->lri_mmio_base = 0;
> +	} else {
> +		engine->lri_cmd_mode = 0;
> +		engine->lri_mmio_base = engine->mmio_base;
> +	}
> +}
> +
>  static void __sprint_engine_name(struct intel_engine_cs *engine)
>  {
>  	/*
> @@ -327,6 +350,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	if (engine->context_size)
>  		DRIVER_CAPS(i915)->has_logical_contexts = true;
>
> +	lri_init(engine);
> +
>  	ewma__engine_latency_init(&engine->latency);
>  	seqcount_init(&engine->stats.lock);
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 93aa22680db0..86302e6d86b2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -281,6 +281,9 @@ struct intel_engine_cs {
>  	u32 context_size;
>  	u32 mmio_base;
>
> +	u32 lri_mmio_base;
> +	u32 lri_cmd_mode;
> +
>  	/*
>  	 * Some w/a require forcewake to be held (which prevents RC6) while
>  	 * a particular engine is active. If so, we set fw_domain to which
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index 14e2ffb6c0e5..887d59897bc2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -134,6 +134,11 @@
>   *   simply ignores the register load under certain conditions.
>   * - One can actually load arbitrary many arbitrary registers: Simply issue x
>   *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> + * - Newer hardware supports engine relative addressing but older hardware does
> + *   not. This is required for hw engine load balancing. Hence the MI_LRI
> + *   instruction itself is prefixed with '__' and should only be used on
> + *   legacy hardware code paths. Generic code must always use the MI_LRI
> + *   and i915_get_lri_reg() helper functions instead.

Stale comment.

>   */
>  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
>  /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 66f1f25119b5..b9cc3f0a616f 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -2118,6 +2118,11 @@ gen8_update_reg_state_unlocked(const struct intel_context *ce,
>  	u32 *reg_state = ce->lrc_reg_state;
>  	int i;
>
> +	/*
> +	 * NB: The LRI instruction is generated by the hardware.
> +	 * Should we read it in and assert that the offset flag is set?
> +	 */
> +
>  	reg_state[ctx_oactxctrl + 1] =
>  		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
>  		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> @@ -2174,6 +2179,7 @@ gen8_load_flex(struct i915_request *rq,
>
>  	*cs++ = MI_LOAD_REGISTER_IMM(count);
>  	do {
> +		/* FIXME: Is this table LRI remap/offset friendly? */
>  		*cs++ = i915_mmio_reg_offset(flex->reg);
>  		*cs++ = flex->value;
>  	} while (flex++, --count);
>

NB and FIXME would ideally be resolved before merging.

Regards,

Tvrtko
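For reference, the Gen11+ behaviour documented by the new comment in intel_gpu_commands.h ("addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset)") can be modelled roughly like this; the helper name is made up for illustration and is not code from the patch:

    /*
     * Rough model of the Gen11+ relative-addressing calculation, assuming
     * 'base' is the engine's MMIO base and 'offset' is the engine-relative
     * register offset encoded in the LRI. GENMASK() is from <linux/bits.h>.
     */
    static inline u32 lri_effective_addr(u32 base, u32 offset, bool ctx_restore)
    {
    	/* Per the comment above, only bits 12:2 of the offset are used
    	 * during context restore. */
    	return base + (ctx_restore ? (offset & GENMASK(12, 2)) : offset);
    }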