diff mbox series

drm/i915: Do not attempt to load the GSC multiple times

Message ID 20240820215952.2290807-1-daniele.ceraolospurio@intel.com (mailing list archive)
State New
Headers show
Series drm/i915: Do not attempt to load the GSC multiple times | expand

Commit Message

Daniele Ceraolo Spurio Aug. 20, 2024, 9:59 p.m. UTC
If the GSC FW fails to load the GSC HW hangs permanently; the only ways
to recover it are FLR or D3cold entry, with the former only being
supported on driver unload and the latter only on DGFX, for which we
don't need to load the GSC. Therefore, if GSC fails to load there is no
need to try again because the HW is stuck in the error state and the
submission to load the FW would just hang the GSCCS.

Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
hangs to full GT resets, which would trigger a new attempt to load the
GSC FW in the post-reset HW re-init; this issue is also fixed by not
attempting to load the GSC FW after an error.

Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h  | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Cavitt, Jonathan Aug. 20, 2024, 10:28 p.m. UTC | #1
-----Original Message-----
From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Daniele Ceraolo Spurio
Sent: Tuesday, August 20, 2024 3:00 PM
To: intel-gfx@lists.freedesktop.org
Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>; Harrison, John C <john.c.harrison@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>
Subject: [PATCH] drm/i915: Do not attempt to load the GSC multiple times
> 
> If the GSC FW fails to load the GSC HW hangs permanently; the only ways
> to recover it are FLR or D3cold entry, with the former only being
> supported on driver unload and the latter only on DGFX, for which we
> don't need to load the GSC. Therefore, if GSC fails to load there is no
> need to try again because the HW is stuck in the error state and the
> submission to load the FW would just hang the GSCCS.
> 
> Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
> hangs to full GT resets, which would trigger a new attempt to load the
> GSC FW in the post-reset HW re-init; this issue is also fixed by not
> attempting to load the GSC FW after an error.
> 
> Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

LGTM.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 2 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h  | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> index 453d855dd1de..3d3191deb0ab 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> @@ -302,7 +302,7 @@ void intel_gsc_uc_load_start(struct intel_gsc_uc *gsc)
>  {
>  	struct intel_gt *gt = gsc_uc_to_gt(gsc);
>  
> -	if (!intel_uc_fw_is_loadable(&gsc->fw))
> +	if (!intel_uc_fw_is_loadable(&gsc->fw) || intel_uc_fw_is_in_error(&gsc->fw))
>  		return;
>  
>  	if (intel_gsc_uc_fw_init_done(gsc))
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
> index 9a431726c8d5..ac7b3aad2222 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
> @@ -258,6 +258,11 @@ static inline bool intel_uc_fw_is_running(struct intel_uc_fw *uc_fw)
>  	return __intel_uc_fw_status(uc_fw) == INTEL_UC_FIRMWARE_RUNNING;
>  }
>  
> +static inline bool intel_uc_fw_is_in_error(struct intel_uc_fw *uc_fw)
> +{
> +	return intel_uc_fw_status_to_error(__intel_uc_fw_status(uc_fw)) != 0;
> +}
> +
>  static inline bool intel_uc_fw_is_overridden(const struct intel_uc_fw *uc_fw)
>  {
>  	return uc_fw->user_overridden;
> -- 
> 2.43.0
> 
>
Daniele Ceraolo Spurio Aug. 21, 2024, 12:02 a.m. UTC | #2
On 8/20/2024 3:28 PM, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Daniele Ceraolo Spurio
> Sent: Tuesday, August 20, 2024 3:00 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>; Harrison, John C <john.c.harrison@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Subject: [PATCH] drm/i915: Do not attempt to load the GSC multiple times
>> If the GSC FW fails to load the GSC HW hangs permanently; the only ways
>> to recover it are FLR or D3cold entry, with the former only being
>> supported on driver unload and the latter only on DGFX, for which we
>> don't need to load the GSC. Therefore, if GSC fails to load there is no
>> need to try again because the HW is stuck in the error state and the
>> submission to load the FW would just hang the GSCCS.
>>
>> Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
>> hangs to full GT resets, which would trigger a new attempt to load the
>> GSC FW in the post-reset HW re-init; this issue is also fixed by not
>> attempting to load the GSC FW after an error.
>>
>> Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> LGTM.
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> -Jonathan Cavitt

Thanks!

I was also missing:
Cc: <stable@vger.kernel.org> # v6.3+

I'll add it when merging if CI is good.

Daniele

>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 2 +-
>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h  | 5 +++++
>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
>> index 453d855dd1de..3d3191deb0ab 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
>> @@ -302,7 +302,7 @@ void intel_gsc_uc_load_start(struct intel_gsc_uc *gsc)
>>   {
>>   	struct intel_gt *gt = gsc_uc_to_gt(gsc);
>>   
>> -	if (!intel_uc_fw_is_loadable(&gsc->fw))
>> +	if (!intel_uc_fw_is_loadable(&gsc->fw) || intel_uc_fw_is_in_error(&gsc->fw))
>>   		return;
>>   
>>   	if (intel_gsc_uc_fw_init_done(gsc))
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
>> index 9a431726c8d5..ac7b3aad2222 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
>> @@ -258,6 +258,11 @@ static inline bool intel_uc_fw_is_running(struct intel_uc_fw *uc_fw)
>>   	return __intel_uc_fw_status(uc_fw) == INTEL_UC_FIRMWARE_RUNNING;
>>   }
>>   
>> +static inline bool intel_uc_fw_is_in_error(struct intel_uc_fw *uc_fw)
>> +{
>> +	return intel_uc_fw_status_to_error(__intel_uc_fw_status(uc_fw)) != 0;
>> +}
>> +
>>   static inline bool intel_uc_fw_is_overridden(const struct intel_uc_fw *uc_fw)
>>   {
>>   	return uc_fw->user_overridden;
>> -- 
>> 2.43.0
>>
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
index 453d855dd1de..3d3191deb0ab 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
@@ -302,7 +302,7 @@  void intel_gsc_uc_load_start(struct intel_gsc_uc *gsc)
 {
 	struct intel_gt *gt = gsc_uc_to_gt(gsc);
 
-	if (!intel_uc_fw_is_loadable(&gsc->fw))
+	if (!intel_uc_fw_is_loadable(&gsc->fw) || intel_uc_fw_is_in_error(&gsc->fw))
 		return;
 
 	if (intel_gsc_uc_fw_init_done(gsc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
index 9a431726c8d5..ac7b3aad2222 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
@@ -258,6 +258,11 @@  static inline bool intel_uc_fw_is_running(struct intel_uc_fw *uc_fw)
 	return __intel_uc_fw_status(uc_fw) == INTEL_UC_FIRMWARE_RUNNING;
 }
 
+static inline bool intel_uc_fw_is_in_error(struct intel_uc_fw *uc_fw)
+{
+	return intel_uc_fw_status_to_error(__intel_uc_fw_status(uc_fw)) != 0;
+}
+
 static inline bool intel_uc_fw_is_overridden(const struct intel_uc_fw *uc_fw)
 {
 	return uc_fw->user_overridden;