diff mbox series

[2/8] drm/i915/xehp: CCS shares the render reset domain

Message ID 20210907171916.2548047-3-matthew.d.roper@intel.com (mailing list archive)
State New, archived
Series i915: Introduce Xe_HP compute engines

Commit Message

Matt Roper Sept. 7, 2021, 5:19 p.m. UTC
The reset domain is shared between render and all compute engines,
so resetting one will affect the others.

Note:  Before performing a reset on an RCS or CCS engine, the GuC will
attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
impacting other clients (since some shared modules will be reset).  If
other engines are executing non-preemptable workloads, the impact is
unavoidable and some work may be lost.

Bspec: 52549
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Tvrtko Ursulin Sept. 8, 2021, 10:07 a.m. UTC | #1
On 07/09/2021 18:19, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
> 
> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset).  If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.

Since this talks about engine reset, should this patch add a warning if
the same is attempted by i915 on a GuC platform - to document that it is
not implemented/supported? Or perhaps doing that later in the series, or
in a future series, works better.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>   		[VECS1] = GEN11_GRDOM_VECS2,
>   		[VECS2] = GEN11_GRDOM_VECS3,
>   		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>   	};
>   	struct intel_engine_cs *engine;
>   	intel_engine_mask_t tmp;
>
Daniel Vetter Sept. 8, 2021, 4:46 p.m. UTC | #2
On Tue, Sep 07, 2021 at 10:19:10AM -0700, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
> 
> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset).  If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.
> 
> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

Do we have igts validating all of this properly?

Specifically, that the reset stats are incremented correctly for guilty
and victimized contexts respectively.

If such tests don't exist yet, they need to be added.

You also need a patch set here that fixes up the igts which make wrong
assumptions about context isolation.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>  		[VECS1] = GEN11_GRDOM_VECS2,
>  		[VECS2] = GEN11_GRDOM_VECS3,
>  		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>  	};
>  	struct intel_engine_cs *engine;
>  	intel_engine_mask_t tmp;
> -- 
> 2.25.4
>
Matt Roper Sept. 8, 2021, 8:23 p.m. UTC | #3
On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
> 
> On 07/09/2021 18:19, Matt Roper wrote:
> > The reset domain is shared between render and all compute engines,
> > so resetting one will affect the others.
> > 
> > Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> > attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> > impacting other clients (since some shared modules will be reset).  If
> > other engines are executing non-preemptable workloads, the impact is
> > unavoidable and some work may be lost.
> 
> Since here it talks about engine reset, should this patch add warning if
> same is attempted by i915 on a GuC platform - to document it is not

Did you mean "on a *non* GuC platform" here?  We aren't going to have
compute engine support on any platforms where GuC submission isn't the
default operating model, so the only way to get compute engines +
execlist submission is to force an override via module parameters (e.g.,
enable_guc=0).  Doing so will taint the kernel, so I think the current
consensus from offline discussion is that the user has already put
themselves into a configuration where it's easier than usual to shoot
themselves in the foot; it's not too much different than the kind of
trouble a user could get themselves into if they loaded the driver with
hangcheck disabled or something.


Matt

> implemented/supported? Or perhaps later in the series, or future series
> works better.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Regards,
> 
> Tvrtko
> 
> > Bspec: 52549
> > Original-patch-by: Michel Thierry
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 91200c43951f..30598c1d070c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
> >   		[VECS1] = GEN11_GRDOM_VECS2,
> >   		[VECS2] = GEN11_GRDOM_VECS3,
> >   		[VECS3] = GEN11_GRDOM_VECS4,
> > +		[CCS0] = GEN11_GRDOM_RENDER,
> > +		[CCS1] = GEN11_GRDOM_RENDER,
> > +		[CCS2] = GEN11_GRDOM_RENDER,
> > +		[CCS3] = GEN11_GRDOM_RENDER,
> >   	};
> >   	struct intel_engine_cs *engine;
> >   	intel_engine_mask_t tmp;
> >
Tvrtko Ursulin Sept. 9, 2021, 8:11 a.m. UTC | #4
On 08/09/2021 21:23, Matt Roper wrote:
> On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
>>
>> On 07/09/2021 18:19, Matt Roper wrote:
>>> The reset domain is shared between render and all compute engines,
>>> so resetting one will affect the others.
>>>
>>> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
>>> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
>>> impacting other clients (since some shared modules will be reset).  If
>>> other engines are executing non-preemptable workloads, the impact is
>>> unavoidable and some work may be lost.
>>
>> Since here it talks about engine reset, should this patch add warning if
>> same is attempted by i915 on a GuC platform - to document it is not
> 
> Did you mean "on a *non* GuC platform" here?  We aren't going to have
> compute engine support on any platforms where GuC submission isn't the
> default operating model, so the only way to get compute engines +
> execlist submission is to force an override via module parameters (e.g.,
> enable_guc=0).  Doing so will taint the kernel, so I think the current
> consensus from offline discussion is that the user has already put
> themselves into a configuration where it's easier than usual to shoot
> themselves in the foot; it's not too much different than the kind of
> trouble a user could get themselves into if they loaded the driver with
> hangcheck disabled or something.

Yes I meant non GuC. :)

Okay-ish, although I think an explicit warning would still be better,
because it is one thing to taint and another to actively allow something
which we know cannot work.

Unless we could hide the CCS engines until GuC gets loaded, which would
make i915.enable_guc=0 safe. Hm.. it should actually be doable to skip
intel_engine_add_user in the engine init phase and add the CCS ones after
GuC has been loaded. Would that make sense?

Regards,

Tvrtko
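
For illustration only, the deferral idea above could look roughly like the standalone model below. It is not i915 code: struct engine, its fields and expose_engines() are hypothetical names made up for this sketch, and it does not call intel_engine_add_user. It only shows the ordering, with compute engines staying hidden until GuC is reported as loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an engine; not the i915 intel_engine_cs. */
struct engine {
	const char *name;
	bool is_compute;	/* true for CCS-style engines */
	bool exposed;		/* already visible to userspace */
};

/*
 * Expose every engine that is allowed to be visible right now.
 * Compute engines are skipped until guc_loaded is true, so a first
 * call during init hides them and a second call after GuC load adds
 * them. Returns how many engines this call newly exposed.
 */
size_t expose_engines(struct engine *engines, size_t count, bool guc_loaded)
{
	size_t newly_exposed = 0;

	assert(engines != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		struct engine *e = &engines[i];

		if (e->exposed)
			continue;	/* nothing to do on a repeat call */
		if (e->is_compute && !guc_loaded)
			continue;	/* defer until GuC is up */

		e->exposed = true;
		newly_exposed++;
	}

	return newly_exposed;
}
```

In this model, a configuration where GuC never loads (for example with i915.enable_guc=0) simply never makes the second call, so the compute engines are never visible.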

>> implemented/supported? Or perhaps later in the series, or future series
>> works better.
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Bspec: 52549
>>> Original-patch-by: Michel Thierry
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> index 91200c43951f..30598c1d070c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>>>    		[VECS1] = GEN11_GRDOM_VECS2,
>>>    		[VECS2] = GEN11_GRDOM_VECS3,
>>>    		[VECS3] = GEN11_GRDOM_VECS4,
>>> +		[CCS0] = GEN11_GRDOM_RENDER,
>>> +		[CCS1] = GEN11_GRDOM_RENDER,
>>> +		[CCS2] = GEN11_GRDOM_RENDER,
>>> +		[CCS3] = GEN11_GRDOM_RENDER,
>>>    	};
>>>    	struct intel_engine_cs *engine;
>>>    	intel_engine_mask_t tmp;
>>>
>

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 91200c43951f..30598c1d070c 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -507,6 +507,10 @@  static int gen11_reset_engines(struct intel_gt *gt,
 		[VECS1] = GEN11_GRDOM_VECS2,
 		[VECS2] = GEN11_GRDOM_VECS3,
 		[VECS3] = GEN11_GRDOM_VECS4,
+		[CCS0] = GEN11_GRDOM_RENDER,
+		[CCS1] = GEN11_GRDOM_RENDER,
+		[CCS2] = GEN11_GRDOM_RENDER,
+		[CCS3] = GEN11_GRDOM_RENDER,
 	};
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;
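
A minimal standalone sketch of what the table change implies may help when reading the hunk above. It is not the driver code: the engine ids, the GRDOM_* bit values and the helper names below are assumptions chosen for illustration, and the RCS0 entry is assumed to map to the render domain. Only the VECS and CCS rows mirror the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative engine ids; the numbering is an assumption, not i915's. */
enum engine_id {
	RCS0, VECS1, VECS2, VECS3, CCS0, CCS1, CCS2, CCS3, NUM_ENGINES
};

/* Illustrative reset-domain bits; the real register values differ. */
#define GRDOM_RENDER	(1u << 0)
#define GRDOM_VECS2	(1u << 1)
#define GRDOM_VECS3	(1u << 2)
#define GRDOM_VECS4	(1u << 3)

/* Per-engine reset domain; every CCS entry points at the render domain. */
static const uint32_t engine_grdom[NUM_ENGINES] = {
	[RCS0]  = GRDOM_RENDER,	/* assumed, not part of the hunk */
	[VECS1] = GRDOM_VECS2,
	[VECS2] = GRDOM_VECS3,
	[VECS3] = GRDOM_VECS4,
	[CCS0]  = GRDOM_RENDER,
	[CCS1]  = GRDOM_RENDER,
	[CCS2]  = GRDOM_RENDER,
	[CCS3]  = GRDOM_RENDER,
};

/* Single-bit engine mask for one engine id. */
uint32_t engine_bit(enum engine_id id)
{
	assert(id < NUM_ENGINES);
	return 1u << id;
}

/* OR together the reset domains of every engine selected in engine_mask. */
uint32_t reset_domains_for(uint32_t engine_mask)
{
	uint32_t domains = 0;

	for (unsigned int id = 0; id < NUM_ENGINES; id++)
		if (engine_mask & (1u << id))
			domains |= engine_grdom[id];

	return domains;
}

/* Mask of all engines that sit in any of the given reset domains. */
uint32_t engines_hit_by(uint32_t domains)
{
	uint32_t hit = 0;

	for (unsigned int id = 0; id < NUM_ENGINES; id++)
		if (engine_grdom[id] & domains)
			hit |= 1u << id;

	return hit;
}
```

With this table, asking for a reset of CCS1 alone yields the render domain, and engines_hit_by() on that domain returns RCS0 plus all four CCS engines, which is the shared-impact behaviour the commit message describes.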