drm/i915/huc: Add more errors for I915_PARAM_HUC_STATUS
diff mbox series

Message ID 20200330113302.1670-1-michal.wajdeczko@intel.com
State New
Headers show
Series
  • drm/i915/huc: Add more errors for I915_PARAM_HUC_STATUS
Related show

Commit Message

Michal Wajdeczko March 30, 2020, 11:33 a.m. UTC
There might be many reasons why we failed to successfully
load and authenticate HuC firmware, but today we only use
single error in case of no HuC hardware. Add some more
error codes for most common cases (disabled, not installed,
corrupted or mismatched firmware).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Cc: Robert M. Fosha <robert.m.fosha@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_huc.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

Comments

Chris Wilson March 30, 2020, 12:28 p.m. UTC | #1
Quoting Michal Wajdeczko (2020-03-30 12:33:02)
> There might be many reasons why we failed to successfully
> load and authenticate HuC firmware, but today we only use
> single error in case of no HuC hardware. Add some more
> error codes for most common cases (disabled, not installed,
> corrupted or mismatched firmware).
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Robert M. Fosha <robert.m.fosha@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index d6097b46600c..1e8073ec343f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -200,9 +200,13 @@ int intel_huc_auth(struct intel_huc *huc)
>   * This function reads status register to verify if HuC
>   * firmware was successfully loaded.
>   *
> - * Returns: 1 if HuC firmware is loaded and verified,
> - * 0 if HuC firmware is not loaded and -ENODEV if HuC
> - * is not present on this platform.
> + * Returns:
> + *  * -ENODEV if HuC is not present on this platform,
> + *  * -EOPNOTSUPP if HuC firmware is disabled,
> + *  * -ENOPKG if HuC firmware was not installed,
> + *  * -ENOEXEC if HuC firmware is invalid or mismatched,
> + *  * 0 if HuC firmware is not running,
> + *  * 1 if HuC firmware is authenticated and running.
>   */
>  int intel_huc_check_status(struct intel_huc *huc)
>  {
> @@ -210,8 +214,18 @@ int intel_huc_check_status(struct intel_huc *huc)
>         intel_wakeref_t wakeref;
>         u32 status = 0;
>  
> -       if (!intel_huc_is_supported(huc))
> +       switch (__intel_uc_fw_status(&huc->fw)) {
> +       case INTEL_UC_FIRMWARE_NOT_SUPPORTED:
>                 return -ENODEV;

No HW support.

> +       case INTEL_UC_FIRMWARE_DISABLED:
> +               return -EOPNOTSUPP;

Override by user [sysadmin]

> +       case INTEL_UC_FIRMWARE_MISSING:
> +               return -ENOPKG;

FILENOTFOUND.

> +       case INTEL_UC_FIRMWARE_ERROR:
> +               return -ENOEXEC;

File corruption.

There's nothing else between us loading the fw and the huc rejecting
it?

FIRMWARE_FAIL? That's set as the opposite of FIRMWARE_TRANSFERRED in
that we failed to upload the image to the HW. The firmware itself hasn't
had a chance to run.

case INTEL_UC_FIRMWARE_FAIL:
	return -ENXIO;

Or is that being overridden to FIRMWARE_ERROR?

Other than the question of whether there's one more step before the fw
is being run [and then able to set HUC_STATUS as it determines for
itself],

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Michal Wajdeczko March 30, 2020, 2:02 p.m. UTC | #2
On 30.03.2020 14:28, Chris Wilson wrote:
> Quoting Michal Wajdeczko (2020-03-30 12:33:02)
>> There might be many reasons why we failed to successfully
>> load and authenticate HuC firmware, but today we only use
>> single error in case of no HuC hardware. Add some more
>> error codes for most common cases (disabled, not installed,
>> corrupted or mismatched firmware).
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tony Ye <tony.ye@intel.com>
>> Cc: Robert M. Fosha <robert.m.fosha@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 22 ++++++++++++++++++----
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> index d6097b46600c..1e8073ec343f 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> @@ -200,9 +200,13 @@ int intel_huc_auth(struct intel_huc *huc)
>>   * This function reads status register to verify if HuC
>>   * firmware was successfully loaded.
>>   *
>> - * Returns: 1 if HuC firmware is loaded and verified,
>> - * 0 if HuC firmware is not loaded and -ENODEV if HuC
>> - * is not present on this platform.
>> + * Returns:
>> + *  * -ENODEV if HuC is not present on this platform,
>> + *  * -EOPNOTSUPP if HuC firmware is disabled,
>> + *  * -ENOPKG if HuC firmware was not installed,
>> + *  * -ENOEXEC if HuC firmware is invalid or mismatched,
>> + *  * 0 if HuC firmware is not running,
>> + *  * 1 if HuC firmware is authenticated and running.
>>   */
>>  int intel_huc_check_status(struct intel_huc *huc)
>>  {
>> @@ -210,8 +214,18 @@ int intel_huc_check_status(struct intel_huc *huc)
>>         intel_wakeref_t wakeref;
>>         u32 status = 0;
>>  
>> -       if (!intel_huc_is_supported(huc))
>> +       switch (__intel_uc_fw_status(&huc->fw)) {
>> +       case INTEL_UC_FIRMWARE_NOT_SUPPORTED:
>>                 return -ENODEV;
> 
> No HW support.
> 
>> +       case INTEL_UC_FIRMWARE_DISABLED:
>> +               return -EOPNOTSUPP;
> 
> Override by user [sysadmin]
> 
>> +       case INTEL_UC_FIRMWARE_MISSING:
>> +               return -ENOPKG;
> 
> FILENOTFOUND.
> 
>> +       case INTEL_UC_FIRMWARE_ERROR:
>> +               return -ENOEXEC;
> 
> File corruption.
> 
> There's nothing else between us loading the fw and the huc rejecting
> it?
> 
> FIRMWARE_FAIL? That's set as the opposite of FIRMWARE_TRANSFERRED in
> that we failed to upload the image to the HW. The firmware itself hasn't
> had a chance to run.
> 
> case INTEL_UC_FIRMWARE_FAIL:
> 	return -ENXIO;
> 
> Or is that being overridden to FIRMWARE_ERROR?

No, it's not overridden by FIRMWARE_ERROR (since we use FIRMWARE_ERROR
as final state, while with FIRMWARE_FAIL there is a chance for recovery
during reset)

Also note that FIRMWARE_FAIL case is covered by the register check that
we have below, which provides HuC runtime status.

And if we decide to use FIRMWARE_FAIL to report -ENXIO, then it is
unlikely that we will ever report 0 again for any other fw error that
could prevent fw from successful load (now recall your and Joonas
position that this param shall stay as reflection of register read).

Michal

ps. on other hand, if we trust our uc_fw_status() then we can drop that
register read, finally decouple GET_PARAM from MMIO_READ and fully rely
on cached status:

case INTEL_UC_FIRMWARE_RUNNING:
	return 1;
default:
	return 0;

see [1] for my earlier attempt, before uc_fw.status was added

[1] https://patchwork.freedesktop.org/patch/306179/?series=60928&rev=1

> 
> Other than the question of whether there's one more step before the fw
> is being run [and then able to set HUC_STATUS as it determines for
> itself],
> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>
Chris Wilson March 30, 2020, 2:12 p.m. UTC | #3
Quoting Michal Wajdeczko (2020-03-30 15:02:53)
> 
> 
> On 30.03.2020 14:28, Chris Wilson wrote:
> > There's nothing else between us loading the fw and the huc rejecting
> > it?
> > 
> > FIRMWARE_FAIL? That's set as the opposite of FIRMWARE_TRANSFERRED in
> > that we failed to upload the image to the HW. The firmware itself hasn't
> > had a chance to run.
> > 
> > case INTEL_UC_FIRMWARE_FAIL:
> >       return -ENXIO;
> > 
> > Or is that being overridden to FIRMWARE_ERROR?
> 
> No, it's not overridden by FIRMWARE_ERROR (since we use FIRMWARE_ERROR
> as final state, while with FIRMWARE_FAIL there is a chance for recovery
> during reset)
> 
> Also note that FIRMWARE_FAIL case is covered by the register check that
> we have below, which provides HuC runtime status.

Yes, if it only reports on the auth failure.

> And if we decide to use FIRMWARE_FAIL to report -ENXIO, then it is
> unlikely that we will ever report 0 again for any other fw error that
> could prevent fw from successful load (now recall your and Joonas
> position that this param shall stay as reflection of register read).
> 
> Michal
> 
> ps. on other hand, if we trust our uc_fw_status() then we can drop that
> register read, finally decouple GET_PARAM from MMIO_READ and fully rely
> on cached status:

imo, that register read is the icing on the cake. We can tell whether
the FW got to the HW, but we can't tell if the HW was truly happy with
the FW without asking it.

I look at it as exposing an interface for the final capability bits to
userspace that the kernel does not have to understand, that go above and
beyond the kernel loading the firmware and confirming execution.
-Chris
Michal Wajdeczko March 30, 2020, 2:41 p.m. UTC | #4
On 30.03.2020 16:12, Chris Wilson wrote:
> Quoting Michal Wajdeczko (2020-03-30 15:02:53)
>>
>>
>> On 30.03.2020 14:28, Chris Wilson wrote:
>>> There's nothing else between us loading the fw and the huc rejecting
>>> it?
>>>
>>> FIRMWARE_FAIL? That's set as the opposite of FIRMWARE_TRANSFERRED in
>>> that we failed to upload the image to the HW. The firmware itself hasn't
>>> had a chance to run.
>>>
>>> case INTEL_UC_FIRMWARE_FAIL:
>>>       return -ENXIO;
>>>
>>> Or is that being overridden to FIRMWARE_ERROR?
>>
>> No, it's not overridden by FIRMWARE_ERROR (since we use FIRMWARE_ERROR
>> as final state, while with FIRMWARE_FAIL there is a chance for recovery
>> during reset)
>>
>> Also note that FIRMWARE_FAIL case is covered by the register check that
>> we have below, which provides HuC runtime status.
> 
> Yes, if it only reports on the auth failure.
> 
>> And if we decide to use FIRMWARE_FAIL to report -ENXIO, then it is
>> unlikely that we will ever report 0 again for any other fw error that
>> could prevent fw from successful load (now recall your and Joonas
>> position that this param shall stay as reflection of register read).
>>
>> Michal
>>
>> ps. on other hand, if we trust our uc_fw_status() then we can drop that
>> register read, finally decouple GET_PARAM from MMIO_READ and fully rely
>> on cached status:
> 
> imo, that register read is the icing on the cake. We can tell whether
> the FW got to the HW, but we can't tell if the HW was truly happy with
> the FW without asking it.
> 
> I look at it as exposing an interface for the final capability bits to
> userspace that the kernel does not have to understand, that go above and
> beyond the kernel loading the firmware and confirming execution.

note that kernel already asked HW in intel_huc_auth() for FW status and
based on that info changed our cached fw status to RUNNING if and only
if HW was happy with that FW (and that shall not change until reset)

Michal

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index d6097b46600c..1e8073ec343f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -200,9 +200,13 @@  int intel_huc_auth(struct intel_huc *huc)
  * This function reads status register to verify if HuC
  * firmware was successfully loaded.
  *
- * Returns: 1 if HuC firmware is loaded and verified,
- * 0 if HuC firmware is not loaded and -ENODEV if HuC
- * is not present on this platform.
+ * Returns:
+ *  * -ENODEV if HuC is not present on this platform,
+ *  * -EOPNOTSUPP if HuC firmware is disabled,
+ *  * -ENOPKG if HuC firmware was not installed,
+ *  * -ENOEXEC if HuC firmware is invalid or mismatched,
+ *  * 0 if HuC firmware is not running,
+ *  * 1 if HuC firmware is authenticated and running.
  */
 int intel_huc_check_status(struct intel_huc *huc)
 {
@@ -210,8 +214,18 @@  int intel_huc_check_status(struct intel_huc *huc)
 	intel_wakeref_t wakeref;
 	u32 status = 0;
 
-	if (!intel_huc_is_supported(huc))
+	switch (__intel_uc_fw_status(&huc->fw)) {
+	case INTEL_UC_FIRMWARE_NOT_SUPPORTED:
 		return -ENODEV;
+	case INTEL_UC_FIRMWARE_DISABLED:
+		return -EOPNOTSUPP;
+	case INTEL_UC_FIRMWARE_MISSING:
+		return -ENOPKG;
+	case INTEL_UC_FIRMWARE_ERROR:
+		return -ENOEXEC;
+	default:
+		break;
+	}
 
 	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
 		status = intel_uncore_read(gt->uncore, huc->status.reg);