[2/8] drm/i915/xehp: CCS shares the render reset domain

Message ID	20210907171916.2548047-3-matthew.d.roper@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=8g7a=N5=lists.freedesktop.org=dri-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7BB0A6108D From: Matt Roper <matthew.d.roper@intel.com> To: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org, Matt Roper <matthew.d.roper@intel.com>, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>, Vinay Belgaumkar <vinay.belgaumkar@intel.com>, Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>, Aravind Iddamsetty <aravind.iddamsetty@intel.com> Subject: [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain Date: Tue, 7 Sep 2021 10:19:10 -0700 Message-Id: <20210907171916.2548047-3-matthew.d.roper@intel.com> In-Reply-To: <20210907171916.2548047-1-matthew.d.roper@intel.com> References: <20210907171916.2548047-1-matthew.d.roper@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	i915: Introduce Xe_HP compute engines \| expand [0/8] i915: Introduce Xe_HP compute engines [1/8] drm/i915/xehp: Define compute class and engine [2/8] drm/i915/xehp: CCS shares the render reset domain [3/8] drm/i915/xehp: Add Compute CS IRQ handlers [4/8] drm/i915/xehp: CCS should use RCS setup functions [5/8] drm/i915/xehp: compute engine pipe_control [6/8] drm/i915/xehp: Define context scheduling attributes in lrc descriptor [7/8] drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE [8/8] drm/i915/xehp: Extend uninterruptible OpenCL workloads to CCS

Message ID

20210907171916.2548047-3-matthew.d.roper@intel.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7BB0A6108D
From: Matt Roper <matthew.d.roper@intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, Matt Roper <matthew.d.roper@intel.com>,
 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
 Vinay Belgaumkar <vinay.belgaumkar@intel.com>,
 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
 Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Subject: [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain
Date: Tue,  7 Sep 2021 10:19:10 -0700
Message-Id: <20210907171916.2548047-3-matthew.d.roper@intel.com>
In-Reply-To: <20210907171916.2548047-1-matthew.d.roper@intel.com>
References: <20210907171916.2548047-1-matthew.d.roper@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

i915: Introduce Xe_HP compute engines | expand

Commit Message

Matt Roper Sept. 7, 2021, 5:19 p.m. UTC

The reset domain is shared between render and all compute engines,
so resetting one will affect the others.

Note:  Before performing a reset on an RCS or CCS engine, the GuC will
attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
impacting other clients (since some shared modules will be reset).  If
other engines are executing non-preemptable workloads, the impact is
unavoidable and some work may be lost.

Bspec: 52549
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Tvrtko Ursulin Sept. 8, 2021, 10:07 a.m. UTC | #1

On 07/09/2021 18:19, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
> 
> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset).  If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.

Since here it talks about engine reset, should this patch add warning if 
  same is attempted by i915 on a GuC platform - to document it is not 
implemented/supported? Or perhaps later in the series, or future series 
works better.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>   		[VECS1] = GEN11_GRDOM_VECS2,
>   		[VECS2] = GEN11_GRDOM_VECS3,
>   		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>   	};
>   	struct intel_engine_cs *engine;
>   	intel_engine_mask_t tmp;
>

Daniel Vetter Sept. 8, 2021, 4:46 p.m. UTC | #2

On Tue, Sep 07, 2021 at 10:19:10AM -0700, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
> 
> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset).  If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.
> 
> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

Do we have igts validating this all properly?

Specifically that the reset stats are incremented correctly for guilty
respectively victimized contexts.

This is necessary if it doesn't exist yet.

Also you need a patch set here that fixes up the igts which have wrong
assumptions about context isolation.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>  		[VECS1] = GEN11_GRDOM_VECS2,
>  		[VECS2] = GEN11_GRDOM_VECS3,
>  		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>  	};
>  	struct intel_engine_cs *engine;
>  	intel_engine_mask_t tmp;
> -- 
> 2.25.4
>

Matt Roper Sept. 8, 2021, 8:23 p.m. UTC | #3

On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
> 
> On 07/09/2021 18:19, Matt Roper wrote:
> > The reset domain is shared between render and all compute engines,
> > so resetting one will affect the others.
> > 
> > Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> > attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> > impacting other clients (since some shared modules will be reset).  If
> > other engines are executing non-preemptable workloads, the impact is
> > unavoidable and some work may be lost.
> 
> Since here it talks about engine reset, should this patch add warning if
> same is attempted by i915 on a GuC platform - to document it is not

Did you mean "on a *non* GuC platform" here?  We aren't going to have
compute engine support on any platforms where GuC submission isn't the
default operating model, so the only way to get compute engines +
execlist submission is to force an override via module parameters (e.g.,
enable_guc=0).  Doing so will taint the kernel, so I think the current
consensus from offline discussion is that the user has already put
themselves into a configuration where it's easier than usual to shoot
themselves in the foot; it's not too much different than the kind of
trouble a user could get themselves into if they loaded the driver with
hangcheck disabled or something.


Matt

> implemented/supported? Or perhaps later in the series, or future series
> works better.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Regards,
> 
> Tvrtko
> 
> > Bspec: 52549
> > Original-patch-by: Michel Thierry
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 91200c43951f..30598c1d070c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
> >   		[VECS1] = GEN11_GRDOM_VECS2,
> >   		[VECS2] = GEN11_GRDOM_VECS3,
> >   		[VECS3] = GEN11_GRDOM_VECS4,
> > +		[CCS0] = GEN11_GRDOM_RENDER,
> > +		[CCS1] = GEN11_GRDOM_RENDER,
> > +		[CCS2] = GEN11_GRDOM_RENDER,
> > +		[CCS3] = GEN11_GRDOM_RENDER,
> >   	};
> >   	struct intel_engine_cs *engine;
> >   	intel_engine_mask_t tmp;
> >

Tvrtko Ursulin Sept. 9, 2021, 8:11 a.m. UTC | #4

On 08/09/2021 21:23, Matt Roper wrote:
> On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
>>
>> On 07/09/2021 18:19, Matt Roper wrote:
>>> The reset domain is shared between render and all compute engines,
>>> so resetting one will affect the others.
>>>
>>> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
>>> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
>>> impacting other clients (since some shared modules will be reset).  If
>>> other engines are executing non-preemptable workloads, the impact is
>>> unavoidable and some work may be lost.
>>
>> Since here it talks about engine reset, should this patch add warning if
>> same is attempted by i915 on a GuC platform - to document it is not
> 
> Did you mean "on a *non* GuC platform" here?  We aren't going to have
> compute engine support on any platforms where GuC submission isn't the
> default operating model, so the only way to get compute engines +
> execlist submission is to force an override via module parameters (e.g.,
> enable_guc=0).  Doing so will taint the kernel, so I think the current
> consensus from offline discussion is that the user has already put
> themselves into a configuration where it's easier than usual to shoot
> themselves in the foot; it's not too much different than the kind of
> trouble a user could get themselves into if they loaded the driver with
> hangcheck disabled or something.

Yes I meant non GuC. :)

Okay..ish, although I think an explicit warn would still be better. 
Because it is one thing to taint and another to actively allow something 
which we know cannot work.

Unless we could hide the CCS engine until GuC gets loaded, which would 
make i915.enable_guc=0 safe. Hm.. should be doable actually to skip 
intel_engine_add_user in the engine init phase and do the CCS ones after 
GuC has been loaded. Would that make sense?

Regards,

Tvrtko

>> implemented/supported? Or perhaps later in the series, or future series
>> works better.
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Bspec: 52549
>>> Original-patch-by: Michel Thierry
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> index 91200c43951f..30598c1d070c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>>>    		[VECS1] = GEN11_GRDOM_VECS2,
>>>    		[VECS2] = GEN11_GRDOM_VECS3,
>>>    		[VECS3] = GEN11_GRDOM_VECS4,
>>> +		[CCS0] = GEN11_GRDOM_RENDER,
>>> +		[CCS1] = GEN11_GRDOM_RENDER,
>>> +		[CCS2] = GEN11_GRDOM_RENDER,
>>> +		[CCS3] = GEN11_GRDOM_RENDER,
>>>    	};
>>>    	struct intel_engine_cs *engine;
>>>    	intel_engine_mask_t tmp;
>>>
>

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 91200c43951f..30598c1d070c 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -507,6 +507,10 @@  static int gen11_reset_engines(struct intel_gt *gt,
 		[VECS1] = GEN11_GRDOM_VECS2,
 		[VECS2] = GEN11_GRDOM_VECS3,
 		[VECS3] = GEN11_GRDOM_VECS4,
+		[CCS0] = GEN11_GRDOM_RENDER,
+		[CCS1] = GEN11_GRDOM_RENDER,
+		[CCS2] = GEN11_GRDOM_RENDER,
+		[CCS3] = GEN11_GRDOM_RENDER,
 	};
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;

[2/8] drm/i915/xehp: CCS shares the render reset domain

Commit Message

Comments

Patch