Message ID | 20210907171916.2548047-3-matthew.d.roper@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | i915: Introduce Xe_HP compute engines |

On 07/09/2021 18:19, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
>
> Note: Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset). If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.

Since here it talks about engine reset, should this patch add warning if
same is attempted by i915 on a GuC platform - to document it is not
implemented/supported? Or perhaps later in the series, or future series
works better.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>  		[VECS1] = GEN11_GRDOM_VECS2,
>  		[VECS2] = GEN11_GRDOM_VECS3,
>  		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>  	};
>  	struct intel_engine_cs *engine;
>  	intel_engine_mask_t tmp;
>

On Tue, Sep 07, 2021 at 10:19:10AM -0700, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
>
> Note: Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset). If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.
>
> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

Do we have igts validating this all properly? Specifically that the
reset stats are incremented correctly for guilty respectively victimized
contexts. This is necessary if it doesn't exist yet.

Also you need a patch set here that fixes up the igts which have wrong
assumptions about context isolation.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>  		[VECS1] = GEN11_GRDOM_VECS2,
>  		[VECS2] = GEN11_GRDOM_VECS3,
>  		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>  	};
>  	struct intel_engine_cs *engine;
>  	intel_engine_mask_t tmp;
> --
> 2.25.4
>

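To make the guilty/victim accounting that Daniel asks about concrete, here is a minimal, self-contained C model. It is not i915 or IGT code: the engine list, `struct ctx_stats`, `in_render_domain()` and `account_render_domain_reset()` are hypothetical names chosen for illustration. It assumes that any context still running on another engine of the shared render/compute domain when the reset happens is counted as a victim, and that engines which were successfully preempted to idle are represented by a NULL slot.

```c
#include <assert.h>

/* Hypothetical engine list for illustration; not the i915 enum. */
enum { RCS0, CCS0, CCS1, VECS0, NUM_ENGINES };

/* Per-context counters a test would compare before and after a reset. */
struct ctx_stats {
	unsigned int guilty; /* context was running on the hung engine */
	unsigned int victim; /* context lost work only because of the shared reset */
};

/* Render and compute engines share one reset domain in this model. */
static int in_render_domain(int engine)
{
	return engine == RCS0 || engine == CCS0 || engine == CCS1;
}

/*
 * running[e] points at the context executing on engine e, or is NULL if
 * the engine is idle (for example after a successful preempt-to-idle).
 */
static void account_render_domain_reset(struct ctx_stats *running[NUM_ENGINES],
					int hung_engine)
{
	assert(hung_engine >= 0 && hung_engine < NUM_ENGINES);
	assert(in_render_domain(hung_engine));

	for (int e = 0; e < NUM_ENGINES; e++) {
		if (!in_render_domain(e) || !running[e])
			continue; /* other domains and idle engines are untouched */
		if (e == hung_engine)
			running[e]->guilty++;
		else
			running[e]->victim++;
	}
}
```

A test along these lines would hang one compute engine while a non-preemptable workload runs on the render engine, then check that only the hung context is marked guilty, the render context is marked as a victim, and a context on an unrelated engine (here `VECS0`) sees no change.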
On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
>
> On 07/09/2021 18:19, Matt Roper wrote:
> > The reset domain is shared between render and all compute engines,
> > so resetting one will affect the others.
> >
> > Note: Before performing a reset on an RCS or CCS engine, the GuC will
> > attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> > impacting other clients (since some shared modules will be reset). If
> > other engines are executing non-preemptable workloads, the impact is
> > unavoidable and some work may be lost.
>
> Since here it talks about engine reset, should this patch add warning if
> same is attempted by i915 on a GuC platform - to document it is not

Did you mean "on a *non* GuC platform" here? We aren't going to have
compute engine support on any platforms where GuC submission isn't the
default operating model, so the only way to get compute engines +
execlist submission is to force an override via module parameters (e.g.,
enable_guc=0). Doing so will taint the kernel, so I think the current
consensus from offline discussion is that the user has already put
themselves into a configuration where it's easier than usual to shoot
themselves in the foot; it's not too much different than the kind of
trouble a user could get themselves into if they loaded the driver with
hangcheck disabled or something.


Matt

> implemented/supported? Or perhaps later in the series, or future series
> works better.
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Regards,
>
> Tvrtko
>
> > Bspec: 52549
> > Original-patch-by: Michel Thierry
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 91200c43951f..30598c1d070c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
> >  		[VECS1] = GEN11_GRDOM_VECS2,
> >  		[VECS2] = GEN11_GRDOM_VECS3,
> >  		[VECS3] = GEN11_GRDOM_VECS4,
> > +		[CCS0] = GEN11_GRDOM_RENDER,
> > +		[CCS1] = GEN11_GRDOM_RENDER,
> > +		[CCS2] = GEN11_GRDOM_RENDER,
> > +		[CCS3] = GEN11_GRDOM_RENDER,
> >  	};
> >  	struct intel_engine_cs *engine;
> >  	intel_engine_mask_t tmp;
> >

On 08/09/2021 21:23, Matt Roper wrote:
> On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
>>
>> On 07/09/2021 18:19, Matt Roper wrote:
>>> The reset domain is shared between render and all compute engines,
>>> so resetting one will affect the others.
>>>
>>> Note: Before performing a reset on an RCS or CCS engine, the GuC will
>>> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
>>> impacting other clients (since some shared modules will be reset). If
>>> other engines are executing non-preemptable workloads, the impact is
>>> unavoidable and some work may be lost.
>>
>> Since here it talks about engine reset, should this patch add warning if
>> same is attempted by i915 on a GuC platform - to document it is not
>
> Did you mean "on a *non* GuC platform" here? We aren't going to have
> compute engine support on any platforms where GuC submission isn't the
> default operating model, so the only way to get compute engines +
> execlist submission is to force an override via module parameters (e.g.,
> enable_guc=0). Doing so will taint the kernel, so I think the current
> consensus from offline discussion is that the user has already put
> themselves into a configuration where it's easier than usual to shoot
> themselves in the foot; it's not too much different than the kind of
> trouble a user could get themselves into if they loaded the driver with
> hangcheck disabled or something.

Yes I meant non GuC. :)

Okay..ish, although I think an explicit warn would still be better.
Because it is one thing to taint and another to actively allow something
which we know cannot work.

Unless we could hide the CCS engine until GuC gets loaded, which would
make i915.enable_guc=0 safe. Hm.. should be doable actually to skip
intel_engine_add_user in the engine init phase and do the CCS ones after
GuC has been loaded. Would that make sense?

Regards,

Tvrtko

>> implemented/supported? Or perhaps later in the series, or future series
>> works better.
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Bspec: 52549
>>> Original-patch-by: Michel Thierry
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
>>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> index 91200c43951f..30598c1d070c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>>>  		[VECS1] = GEN11_GRDOM_VECS2,
>>>  		[VECS2] = GEN11_GRDOM_VECS3,
>>>  		[VECS3] = GEN11_GRDOM_VECS4,
>>> +		[CCS0] = GEN11_GRDOM_RENDER,
>>> +		[CCS1] = GEN11_GRDOM_RENDER,
>>> +		[CCS2] = GEN11_GRDOM_RENDER,
>>> +		[CCS3] = GEN11_GRDOM_RENDER,
>>>  	};
>>>  	struct intel_engine_cs *engine;
>>>  	intel_engine_mask_t tmp;
>>>
>

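The explicit warning Tvrtko argues for could look roughly like the sketch below. This is a standalone illustration, not a proposed i915 change: `struct gt_config`, its two flags, and both helper functions are hypothetical, and the kernel would use its own warning and debug macros rather than `fprintf`. The only assumption taken from the thread is the condition itself: compute engines are present, but GuC submission has been overridden (e.g., `enable_guc=0`), so nothing preempts the other RCS/CCS engines to idle before the shared domain is reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of the two facts the check depends on. */
struct gt_config {
	bool uses_guc_submission; /* false when forced to execlists */
	bool has_compute_engines; /* any CCS engine exposed */
};

/* Engine reset is known not to work when CCS engines run without GuC. */
static bool engine_reset_is_unsupported(const struct gt_config *cfg)
{
	assert(cfg != NULL);
	return cfg->has_compute_engines && !cfg->uses_guc_submission;
}

/* Emit the warning once per call site; returns true if it warned. */
static bool warn_if_engine_reset_unsupported(const struct gt_config *cfg)
{
	if (!engine_reset_is_unsupported(cfg))
		return false;

	fprintf(stderr,
		"engine reset with compute engines is not supported without GuC submission\n");
	return true;
}
```

The alternative raised in the same reply, registering the CCS engines with userspace only after the GuC has loaded, would make this check unnecessary, because the unsupported combination could then never be reached.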
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 91200c43951f..30598c1d070c 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
 		[VECS1] = GEN11_GRDOM_VECS2,
 		[VECS2] = GEN11_GRDOM_VECS3,
 		[VECS3] = GEN11_GRDOM_VECS4,
+		[CCS0] = GEN11_GRDOM_RENDER,
+		[CCS1] = GEN11_GRDOM_RENDER,
+		[CCS2] = GEN11_GRDOM_RENDER,
+		[CCS3] = GEN11_GRDOM_RENDER,
 	};
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;
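
The practical effect of pointing all four CCS entries at the render domain can be shown with a small standalone model. The engine ids, the `GRDOM_*` bit values, and both helper functions below are made up for illustration and do not match the i915 definitions; only the shape of the lookup table mirrors the patch. The sketch computes which reset domains a request touches and, from that, which engines are actually disturbed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical engine ids and domain bits; the real ones live in i915. */
enum engine_id { RCS0, VECS0, CCS0, CCS1, CCS2, CCS3, NUM_ENGINES };

#define GRDOM_RENDER (1u << 1)
#define GRDOM_VECS0  (1u << 2)

/* Same shape as the table in the patch: every CCS maps to the render domain. */
static const uint32_t reset_domain[NUM_ENGINES] = {
	[RCS0]  = GRDOM_RENDER,
	[VECS0] = GRDOM_VECS0,
	[CCS0]  = GRDOM_RENDER,
	[CCS1]  = GRDOM_RENDER,
	[CCS2]  = GRDOM_RENDER,
	[CCS3]  = GRDOM_RENDER,
};

/* OR together the reset domains of every engine set in engine_mask. */
static uint32_t reset_domains_for(uint32_t engine_mask)
{
	uint32_t domains = 0;

	assert((engine_mask >> NUM_ENGINES) == 0); /* no unknown engines */
	for (int id = 0; id < NUM_ENGINES; id++)
		if (engine_mask & (1u << id))
			domains |= reset_domain[id];
	return domains;
}

/* Every engine whose domain is being reset is affected, requested or not. */
static uint32_t engines_hit_by_reset(uint32_t engine_mask)
{
	uint32_t domains = reset_domains_for(engine_mask);
	uint32_t hit = 0;

	for (int id = 0; id < NUM_ENGINES; id++)
		if (reset_domain[id] & domains)
			hit |= 1u << id;
	return hit;
}
```

In this model, asking to reset only `CCS1` returns a set containing `RCS0` and all four CCS engines, while resetting `VECS0` affects nothing else. That is the behaviour the commit message describes as "resetting one will affect the others".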