diff mbox series

drm/i915/guc: Fix detection of GuC submission in use

Message ID 20190905111631.23441-1-janusz.krzysztofik@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/guc: Fix detection of GuC submission in use | expand

Commit Message

Janusz Krzysztofik Sept. 5, 2019, 11:16 a.m. UTC
The driver always assumes active GuC submission mode if it is
supported.  That's not true if GuC initialization fails for some
reason.  That may lead to kernel panics, caused e.g. by execlists
fallback submission mode incorrectly detecting GuC submission in use.

Fix it by also checking for GuC enabled status.

Fixes: 356c484822e6 ("drm/i915/uc: Add explicit DISABLED state for firmware")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Michal Wajdeczko Sept. 5, 2019, 12:08 p.m. UTC | #1
On Thu, 05 Sep 2019 13:16:31 +0200, Janusz Krzysztofik  
<janusz.krzysztofik@linux.intel.com> wrote:

> The driver always assumes active GuC submission mode if it is
> supported.  That's not true if GuC initialization fails for some
> reason.  That may lead to kernel panics, caused e.g. by execlists
> fallback submission mode incorrectly detecting GuC submission in use.
>
> Fix it by also checking for GuC enabled status.
>
> Fixes: 356c484822e6 ("drm/i915/uc: Add explicit DISABLED state for  
> firmware")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h  
> b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index 527995c21196..b28bab64a280 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -51,7 +51,8 @@ static inline bool  
> intel_uc_supports_guc_submission(struct intel_uc *uc)
> static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
>  {
> -	return intel_guc_is_submission_supported(&uc->guc);
> +	return intel_guc_is_enabled(&uc->guc) &&
> +	       intel_guc_is_submission_supported(&uc->guc);

This wont fix your original problem (that btw is not possible to
repro on drm-tip) as after any GuC initialization failure we still
treat GuC as "enabled":

intel_guc_is_supported => H/W support (static)
intel_guc_is_enabled => aka not disabled by the user (config)
intel_guc_is_running => no major fw failure (runtime)

Note that we even s/intel_guc_is_enabled/intel_guc_is_running
won't help as GuC may be running but we may fail to correctly
initialize GuC submission.

Correct fix to original problem must be aligned with new GuC
submission model (coming soon) and it may look as this:

+static inline bool intel_guc_is_submission_active(struct intel_guc *guc)
+{
+	GEM_BUG_ON(guc->submission_active && !intel_guc_is_running(guc));
+	return guc->submission_active;
+}

and then

  static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
  {
-	return intel_guc_is_submission_supported(&uc->guc);
+	return intel_guc_is_submission_active(&uc->guc);
  }

We may need to revisit all uses/supports/ macros to better
reflect configuration vs runtime differences.

Thanks,
Michal
Janusz Krzysztofik Sept. 5, 2019, 12:34 p.m. UTC | #2
Hi MichaƂ,

On Thursday, September 5, 2019 2:08:12 PM CEST Michal Wajdeczko wrote:
> On Thu, 05 Sep 2019 13:16:31 +0200, Janusz Krzysztofik  
> <janusz.krzysztofik@linux.intel.com> wrote:
> 
> > The driver always assumes active GuC submission mode if it is
> > supported.  That's not true if GuC initialization fails for some
> > reason.  That may lead to kernel panics, caused e.g. by execlists
> > fallback submission mode incorrectly detecting GuC submission in use.
> >
> > Fix it by also checking for GuC enabled status.
> >
> > Fixes: 356c484822e6 ("drm/i915/uc: Add explicit DISABLED state for  
> > firmware")
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_uc.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h  
> > b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index 527995c21196..b28bab64a280 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -51,7 +51,8 @@ static inline bool  
> > intel_uc_supports_guc_submission(struct intel_uc *uc)
> > static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
> >  {
> > -	return intel_guc_is_submission_supported(&uc->guc);
> > +	return intel_guc_is_enabled(&uc->guc) &&
> > +	       intel_guc_is_submission_supported(&uc->guc);
> 
> This wont fix your original problem (that btw is not possible to
> repro on drm-tip)

I'm not sure how you force GuC initialization to fail, mine just didn't have 
new firmware available.  On module load, the driver was starting up in 
execlists submission mode and BUG_ON( was raised from process_csb().  Running 
on a simulator, I was using current internal tree, based on current drm-tip.

> as after any GuC initialization failure we still
> treat GuC as "enabled":

My bad, I initially used intel_guc_is_running() but that interfered badly with 
module unload so I switched to intel_guc_is_enabled() and apparently didn't 
re-test if this still fixes the original issue.

> intel_guc_is_supported => H/W support (static)
> intel_guc_is_enabled => aka not disabled by the user (config)
> intel_guc_is_running => no major fw failure (runtime)
> 
> Note that we even s/intel_guc_is_enabled/intel_guc_is_running
> won't help as GuC may be running but we may fail to correctly
> initialize GuC submission.
> 
> Correct fix to original problem must be aligned with new GuC
> submission model (coming soon) and it may look as this:
> 
> +static inline bool intel_guc_is_submission_active(struct intel_guc *guc)
> +{
> +	GEM_BUG_ON(guc->submission_active && !intel_guc_is_running(guc));
> +	return guc->submission_active;
> +}
> 
> and then
> 
>   static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
>   {
> -	return intel_guc_is_submission_supported(&uc->guc);
> +	return intel_guc_is_submission_active(&uc->guc);
>   }
> 
> We may need to revisit all uses/supports/ macros to better
> reflect configuration vs runtime differences.

Definitely, or we may get in troubles like the one I experienced on module 
unload.  And that can be done in advance, I believe.

As long as the unload issue is resolved by not using 
intel_uc_uses_guc_submission() where it occurred inappropriate, using 
(intel_guc_is_running() && intel_guc_is_submission_supported()) seems a valid 
fix to me, easy to migrate to intel_guc_is_submission_active() as soon as 
available.  I'll revert back to intel_guc_is_running(), fix the module unload 
issue and resubmit to trybot, maybe it can discover more issues with that.

Thanks,
Janusz

> 
> Thanks,
> Michal
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 527995c21196..b28bab64a280 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -51,7 +51,8 @@  static inline bool intel_uc_supports_guc_submission(struct intel_uc *uc)
 
 static inline bool intel_uc_uses_guc_submission(struct intel_uc *uc)
 {
-	return intel_guc_is_submission_supported(&uc->guc);
+	return intel_guc_is_enabled(&uc->guc) &&
+	       intel_guc_is_submission_supported(&uc->guc);
 }
 
 static inline bool intel_uc_supports_huc(struct intel_uc *uc)