Message ID | 20240820215952.2290807-1-daniele.ceraolospurio@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915: Do not attempt to load the GSC multiple times |
-----Original Message-----
From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Daniele Ceraolo Spurio
Sent: Tuesday, August 20, 2024 3:00 PM
To: intel-gfx@lists.freedesktop.org
Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>; Harrison, John C <john.c.harrison@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>
Subject: [PATCH] drm/i915: Do not attempt to load the GSC multiple times
>
> If the GSC FW fails to load the GSC HW hangs permanently; the only ways
> to recover it are FLR or D3cold entry, with the former only being
> supported on driver unload and the latter only on DGFX, for which we
> don't need to load the GSC. Therefore, if GSC fails to load there is no
> need to try again because the HW is stuck in the error state and the
> submission to load the FW would just hang the GSCCS.
>
> Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
> hangs to full GT resets, which would trigger a new attempt to load the
> GSC FW in the post-reset HW re-init; this issue is also fixed by not
> attempting to load the GSC FW after an error.
>
> Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

LGTM.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 2 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h  | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> index 453d855dd1de..3d3191deb0ab 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> @@ -302,7 +302,7 @@ void intel_gsc_uc_load_start(struct intel_gsc_uc *gsc)
>  {
>  	struct intel_gt *gt = gsc_uc_to_gt(gsc);
>
> -	if (!intel_uc_fw_is_loadable(&gsc->fw))
> +	if (!intel_uc_fw_is_loadable(&gsc->fw) || intel_uc_fw_is_in_error(&gsc->fw))
>  		return;
>
>  	if (intel_gsc_uc_fw_init_done(gsc))
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
> index 9a431726c8d5..ac7b3aad2222 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
> @@ -258,6 +258,11 @@ static inline bool intel_uc_fw_is_running(struct intel_uc_fw *uc_fw)
>  	return __intel_uc_fw_status(uc_fw) == INTEL_UC_FIRMWARE_RUNNING;
>  }
>
> +static inline bool intel_uc_fw_is_in_error(struct intel_uc_fw *uc_fw)
> +{
> +	return intel_uc_fw_status_to_error(__intel_uc_fw_status(uc_fw)) != 0;
> +}
> +
>  static inline bool intel_uc_fw_is_overridden(const struct intel_uc_fw *uc_fw)
>  {
>  	return uc_fw->user_overridden;
> --
> 2.43.0
>
On 8/20/2024 3:28 PM, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Daniele Ceraolo Spurio
> Sent: Tuesday, August 20, 2024 3:00 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>; Harrison, John C <john.c.harrison@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Subject: [PATCH] drm/i915: Do not attempt to load the GSC multiple times
>> If the GSC FW fails to load the GSC HW hangs permanently; the only ways
>> to recover it are FLR or D3cold entry, with the former only being
>> supported on driver unload and the latter only on DGFX, for which we
>> don't need to load the GSC. Therefore, if GSC fails to load there is no
>> need to try again because the HW is stuck in the error state and the
>> submission to load the FW would just hang the GSCCS.
>>
>> Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
>> hangs to full GT resets, which would trigger a new attempt to load the
>> GSC FW in the post-reset HW re-init; this issue is also fixed by not
>> attempting to load the GSC FW after an error.
>>
>> Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> LGTM.
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> -Jonathan Cavitt

Thanks! I was also missing:

Cc: <stable@vger.kernel.org> # v6.3+

I'll add it when merging if CI is good.

Daniele
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
index 453d855dd1de..3d3191deb0ab 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
@@ -302,7 +302,7 @@ void intel_gsc_uc_load_start(struct intel_gsc_uc *gsc)
 {
 	struct intel_gt *gt = gsc_uc_to_gt(gsc);
 
-	if (!intel_uc_fw_is_loadable(&gsc->fw))
+	if (!intel_uc_fw_is_loadable(&gsc->fw) || intel_uc_fw_is_in_error(&gsc->fw))
 		return;
 
 	if (intel_gsc_uc_fw_init_done(gsc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
index 9a431726c8d5..ac7b3aad2222 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
@@ -258,6 +258,11 @@ static inline bool intel_uc_fw_is_running(struct intel_uc_fw *uc_fw)
 	return __intel_uc_fw_status(uc_fw) == INTEL_UC_FIRMWARE_RUNNING;
 }
 
+static inline bool intel_uc_fw_is_in_error(struct intel_uc_fw *uc_fw)
+{
+	return intel_uc_fw_status_to_error(__intel_uc_fw_status(uc_fw)) != 0;
+}
+
 static inline bool intel_uc_fw_is_overridden(const struct intel_uc_fw *uc_fw)
 {
 	return uc_fw->user_overridden;
If the GSC FW fails to load the GSC HW hangs permanently; the only ways
to recover it are FLR or D3cold entry, with the former only being
supported on driver unload and the latter only on DGFX, for which we
don't need to load the GSC. Therefore, if GSC fails to load there is no
need to try again because the HW is stuck in the error state and the
submission to load the FW would just hang the GSCCS.

Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
hangs to full GT resets, which would trigger a new attempt to load the
GSC FW in the post-reset HW re-init; this issue is also fixed by not
attempting to load the GSC FW after an error.

Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h  | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)
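
For readers who want to see the retry loop in isolation, here is a rough userspace sketch of the behaviour the commit message describes. It is not driver code. Every name in it (fake_fw, fake_gsc, fw_status, gsc_load_start, submissions_after_resets) is hypothetical, and the status enum is a simplified assumption. The only property it relies on is that a "loadable" check written as "status is at least LOADABLE" still passes once the status has moved on to a load-failure state, so each post-reset re-init would submit another load unless the error state is checked as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified status values; only their ordering matters here. */
enum fw_status {
	FW_UNINITIALIZED,
	FW_LOADABLE,
	FW_LOAD_FAIL,	/* assumed to sort after FW_LOADABLE */
	FW_RUNNING,
};

struct fake_fw {
	enum fw_status status;
};

struct fake_gsc {
	struct fake_fw fw;
	bool hw_accepts_load;	/* false models HW stuck in the error state */
	int submissions;	/* number of load submissions issued */
};

static bool fw_is_loadable(const struct fake_fw *fw)
{
	return fw->status >= FW_LOADABLE;
}

static bool fw_is_in_error(const struct fake_fw *fw)
{
	return fw->status == FW_LOAD_FAIL;
}

/*
 * Model of the load entry point. With check_error == false it behaves like
 * the pre-patch guard; with check_error == true it also bails out once a
 * previous load has failed. Returns true if a load was submitted.
 */
static bool gsc_load_start(struct fake_gsc *gsc, bool check_error)
{
	assert(gsc != NULL);

	if (!fw_is_loadable(&gsc->fw))
		return false;

	if (check_error && fw_is_in_error(&gsc->fw))
		return false;

	if (gsc->fw.status == FW_RUNNING)
		return false;

	gsc->submissions++;
	gsc->fw.status = gsc->hw_accepts_load ? FW_RUNNING : FW_LOAD_FAIL;
	return true;
}

/* One initial load attempt plus one attempt per simulated post-reset re-init. */
static int submissions_after_resets(bool hw_accepts_load, int resets,
				    bool check_error)
{
	struct fake_gsc gsc = {
		.fw = { .status = FW_LOADABLE },
		.hw_accepts_load = hw_accepts_load,
		.submissions = 0,
	};

	for (int i = 0; i <= resets; i++)
		gsc_load_start(&gsc, check_error);

	return gsc.submissions;
}
```

In this model, calling submissions_after_resets(false, 3, false) issues a load on every re-init, while passing true for check_error stops after the first failed attempt. A successful load is submitted once in both modes.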