diff mbox series

drm/i915/huc: Fix error reported by I915_PARAM_HUC_STATUS

Message ID 20200122194825.101240-1-michal.wajdeczko@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/huc: Fix error reported by I915_PARAM_HUC_STATUS | expand

Commit Message

Michal Wajdeczko Jan. 22, 2020, 7:48 p.m. UTC
From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
to indicate lack of HuC hardware and we started to use this error
also for all other cases when HuC was not in use or supported.

Fix that by relying again on HAS_GT_UC macro, since currently
used function intel_huc_is_supported() is based on HuC firmware
support which could be unsupported also due to force disabled
GuC firmware.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_huc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniele Ceraolo Spurio Jan. 22, 2020, 11:52 p.m. UTC | #1
On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
> in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
> to indicate lack of HuC hardware and we started to use this error
> also for all other cases when HuC was not in use or supported.
> 
> Fix that by relying again on HAS_GT_UC macro, since currently
> used function intel_huc_is_supported() is based on HuC firmware
> support which could be unsupported also due to force disabled
> GuC firmware.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Tony Ye <tony.ye@intel.com>

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Do we need a fixes tag?

Daniele

> ---
>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index 32a069841c14..ad4d9e16a24e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -209,7 +209,7 @@ int intel_huc_check_status(struct intel_huc *huc)
>   	intel_wakeref_t wakeref;
>   	u32 status = 0;
>   
> -	if (!intel_huc_is_supported(huc))
> +	if (!HAS_GT_UC(gt->i915))
>   		return -ENODEV;
>   
>   	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
>
Michal Wajdeczko Jan. 23, 2020, 2:57 p.m. UTC | #2
On Thu, 23 Jan 2020 03:43:04 +0100, Patchwork  
<patchwork@emeril.freedesktop.org> wrote:

> == Series Details ==
>
> Series: drm/i915/huc: Fix error reported by I915_PARAM_HUC_STATUS
> URL   : https://patchwork.freedesktop.org/series/72419/
> State : failure
>
> == Summary ==
>
> CI Bug Log - changes from CI_DRM_7799 -> Patchwork_16219
> ====================================================
>
> Summary
> -------
>
>   **FAILURE**
>
>   Serious unknown changes coming with Patchwork_16219 absolutely need to  
> be
>   verified manually.
>  If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_16219, please notify your bug team to allow  
> them
>   to document this new failure mode, which will reduce false positives  
> in CI.
>
>   External URL:  
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16219/index.html
>
> Possible new issues
> -------------------
>
>   Here are the unknown changes that may have been introduced in  
> Patchwork_16219:
>
> ### IGT changes ###
>
> #### Possible regressions ####
>
>   * igt@gem_exec_parallel@contexts:
>     - fi-byt-j1900:       NOTRUN -> [FAIL][1]
>    [1]:  
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16219/fi-byt-j1900/igt@gem_exec_parallel@contexts.html

no HuC here, unrelated
Chris Wilson Jan. 23, 2020, 3:02 p.m. UTC | #3
Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
> 
> 
> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
> > to indicate lack of HuC hardware and we started to use this error
> > also for all other cases when HuC was not in use or supported.
> > 
> > Fix that by relying again on HAS_GT_UC macro, since currently
> > used function intel_huc_is_supported() is based on HuC firmware
> > support which could be unsupported also due to force disabled
> > GuC firmware.
> > 
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Tony Ye <tony.ye@intel.com>
> 
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Once upon a time did you (Michal) not argue we should indicate the lack
of firmware in the error code? Something like

if (!HAS_GT_UC(gt->i915))
	return -ENODEV;

if (!intel_huc_is_supported(huc))
	return -ENOEXEC;
Michal Wajdeczko Jan. 23, 2020, 3:38 p.m. UTC | #4
On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>
>>
>> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
>> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>> > to indicate lack of HuC hardware and we started to use this error
>> > also for all other cases when HuC was not in use or supported.
>> >
>> > Fix that by relying again on HAS_GT_UC macro, since currently
>> > used function intel_huc_is_supported() is based on HuC firmware
>> > support which could be unsupported also due to force disabled
>> > GuC firmware.
>> >
>> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> > Cc: Tony Ye <tony.ye@intel.com>
>>
>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>
> Once upon a time did you (Michal) not argue we should indicate the lack
> of firmware in the error code? Something like
>
> if (!HAS_GT_UC(gt->i915))
> 	return -ENODEV;
>
> if (!intel_huc_is_supported(huc))
> 	return -ENOEXEC;

Yes, we discussed this here [1] together with [2] but we didn't
conclude our discussion due to different opinions on how represent
some states, in particular "manually disabled" state.

In this patch I just wanted to restore old notation.

But we can start new discussion, here is summary:

------------------+----------+----------+----------
  HuC state        | today*   | option A | option B
------------------+----------+----------+----------
no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
HuC fw fail       |   0      | -EACCES  |    0
HuC authenticated |   1      |     1    |    1
------------------+----------+----------+----------

Note that all above should be compatible with media driver,
which explicitly looks for no error and value 1

Michal

[1] https://patchwork.freedesktop.org/patch/306419/?series=61001&rev=1
[2] https://patchwork.freedesktop.org/series/60800/#rev1
Chris Wilson Jan. 23, 2020, 3:51 p.m. UTC | #5
Quoting Michal Wajdeczko (2020-01-23 15:38:52)
> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
> >>
> >>
> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
> >> > to indicate lack of HuC hardware and we started to use this error
> >> > also for all other cases when HuC was not in use or supported.
> >> >
> >> > Fix that by relying again on HAS_GT_UC macro, since currently
> >> > used function intel_huc_is_supported() is based on HuC firmware
> >> > support which could be unsupported also due to force disabled
> >> > GuC firmware.
> >> >
> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> > Cc: Tony Ye <tony.ye@intel.com>
> >>
> >> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >
> > Once upon a time did you (Michal) not argue we should indicate the lack
> > of firmware in the error code? Something like
> >
> > if (!HAS_GT_UC(gt->i915))
> >       return -ENODEV;
> >
> > if (!intel_huc_is_supported(huc))
> >       return -ENOEXEC;
> 
> Yes, we discussed this here [1] together with [2] but we didn't
> conclude our discussion due to different opinions on how represent
> some states, in particular "manually disabled" state.
> 
> In this patch I just wanted to restore old notation.
> 
> But we can start new discussion, here is summary:
> 
> ------------------+----------+----------+----------
>   HuC state        | today*   | option A | option B
> ------------------+----------+----------+----------
> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
> HuC fw fail       |   0      | -EACCES  |    0
> HuC authenticated |   1      |     1    |    1
> ------------------+----------+----------+----------

By fw fail, you mean we loaded the firmware (to our knowledge)
correctly, but HUC_STATUS is not reported as valid?

If so, I support option B. I like the idea of saying
"no HuC" (machine too old)
"no firmware" (user action, or lack thereof)
0 (fw unhappy)
1 (fw reports success)

In between states for failures in fw loading? Not so sure. But I can see
the nicety in distinguishing between lack of firmware and some random
failure in loading the firmware (the former being user action required
to rectify, command line parameter whatever and the latter being the
firmware file is either invalid or a stray neutrino prevented loading).

Imo the error messages should be about why we cannot probe/trust the
HUC_STATUS register. If everything is setup correctly then the returned
value should be from reading the register. I dislike only returning 1 if
supported, and converting a valid read of 0 into another error.

So Option B :)

> Note that all above should be compatible with media driver,
> which explicitly looks for no error and value 1

Cool.
-Chris
Ye, Tony Jan. 23, 2020, 6:12 p.m. UTC | #6
Acked-by: Tony Ye <tony.ye@intel.com>

On 1/22/2020 11:48 AM, Michal Wajdeczko wrote:
>  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
> in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
> to indicate lack of HuC hardware and we started to use this error
> also for all other cases when HuC was not in use or supported.
> 
> Fix that by relying again on HAS_GT_UC macro, since currently
> used function intel_huc_is_supported() is based on HuC firmware
> support which could be unsupported also due to force disabled
> GuC firmware.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Tony Ye <tony.ye@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index 32a069841c14..ad4d9e16a24e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -209,7 +209,7 @@ int intel_huc_check_status(struct intel_huc *huc)
>   	intel_wakeref_t wakeref;
>   	u32 status = 0;
>   
> -	if (!intel_huc_is_supported(huc))
> +	if (!HAS_GT_UC(gt->i915))
>   		return -ENODEV;
>   
>   	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
>
Ye, Tony Jan. 23, 2020, 6:26 p.m. UTC | #7
On 1/23/2020 7:38 AM, Michal Wajdeczko wrote:
> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson 
> <chris@chris-wilson.co.uk> wrote:
> 
>> Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>
>>>
>>> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
>>> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>>> > to indicate lack of HuC hardware and we started to use this error
>>> > also for all other cases when HuC was not in use or supported.
>>> >
>>> > Fix that by relying again on HAS_GT_UC macro, since currently
>>> > used function intel_huc_is_supported() is based on HuC firmware
>>> > support which could be unsupported also due to force disabled
>>> > GuC firmware.
>>> >
>>> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> > Cc: Tony Ye <tony.ye@intel.com>
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>
>> Once upon a time did you (Michal) not argue we should indicate the lack
>> of firmware in the error code? Something like
>>
>> if (!HAS_GT_UC(gt->i915))
>>     return -ENODEV;
>>
>> if (!intel_huc_is_supported(huc))
>>     return -ENOEXEC;
> 
> Yes, we discussed this here [1] together with [2] but we didn't
> conclude our discussion due to different opinions on how represent
> some states, in particular "manually disabled" state.
> 
> In this patch I just wanted to restore old notation.
> 
> But we can start new discussion, here is summary:
> 
> ------------------+----------+----------+----------
>   HuC state        | today*   | option A | option B
> ------------------+----------+----------+----------
> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
> HuC fw fail       |   0      | -EACCES  |    0

What is the difference of HuC fw error and HuC fw fail here?

Regards,
Tony

> HuC authenticated |   1      |     1    |    1
> ------------------+----------+----------+----------
> 
> Note that all above should be compatible with media driver,
> which explicitly looks for no error and value 1
> 
> Michal
> 
> [1] https://patchwork.freedesktop.org/patch/306419/?series=61001&rev=1
> [2] https://patchwork.freedesktop.org/series/60800/#rev1
Michal Wajdeczko Jan. 23, 2020, 6:43 p.m. UTC | #8
On Thu, 23 Jan 2020 19:26:58 +0100, Ye, Tony <tony.ye@intel.com> wrote:

>
>
> On 1/23/2020 7:38 AM, Michal Wajdeczko wrote:
>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson  
>> <chris@chris-wilson.co.uk> wrote:
>>
>>> Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>>
>>>>
>>>> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>>> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
>>>> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>>>> > to indicate lack of HuC hardware and we started to use this error
>>>> > also for all other cases when HuC was not in use or supported.
>>>> >
>>>> > Fix that by relying again on HAS_GT_UC macro, since currently
>>>> > used function intel_huc_is_supported() is based on HuC firmware
>>>> > support which could be unsupported also due to force disabled
>>>> > GuC firmware.
>>>> >
>>>> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> > Cc: Tony Ye <tony.ye@intel.com>
>>>>
>>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>
>>> Once upon a time did you (Michal) not argue we should indicate the lack
>>> of firmware in the error code? Something like
>>>
>>> if (!HAS_GT_UC(gt->i915))
>>>     return -ENODEV;
>>>
>>> if (!intel_huc_is_supported(huc))
>>>     return -ENOEXEC;
>>  Yes, we discussed this here [1] together with [2] but we didn't
>> conclude our discussion due to different opinions on how represent
>> some states, in particular "manually disabled" state.
>>  In this patch I just wanted to restore old notation.
>>  But we can start new discussion, here is summary:
>>  ------------------+----------+----------+----------
>>   HuC state        | today*   | option A | option B
>> ------------------+----------+----------+----------
>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>> HuC fw fail       |   0      | -EACCES  |    0
>
> What is the difference of HuC fw error and HuC fw fail here?

see corresponding internal fw status codes:

	INTEL_UC_FIRMWARE_ERROR, /* invalid format or version */
	INTEL_UC_FIRMWARE_FAIL, /* failed to xfer or init/auth the fw */

>
> Regards,
> Tony
>
>> HuC authenticated |   1      |     1    |    1
>> ------------------+----------+----------+----------
>>  Note that all above should be compatible with media driver,
>> which explicitly looks for no error and value 1
>>  Michal
>>  [1] https://patchwork.freedesktop.org/patch/306419/?series=61001&rev=1
>> [2] https://patchwork.freedesktop.org/series/60800/#rev1
Ye, Tony Jan. 23, 2020, 7:50 p.m. UTC | #9
On 1/23/2020 7:38 AM, Michal Wajdeczko wrote:
> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson 
> <chris@chris-wilson.co.uk> wrote:
> 
>> Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>
>>>
>>> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
>>> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>>> > to indicate lack of HuC hardware and we started to use this error
>>> > also for all other cases when HuC was not in use or supported.
>>> >
>>> > Fix that by relying again on HAS_GT_UC macro, since currently
>>> > used function intel_huc_is_supported() is based on HuC firmware
>>> > support which could be unsupported also due to force disabled
>>> > GuC firmware.
>>> >
>>> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> > Cc: Tony Ye <tony.ye@intel.com>
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>
>> Once upon a time did you (Michal) not argue we should indicate the lack
>> of firmware in the error code? Something like
>>
>> if (!HAS_GT_UC(gt->i915))
>>     return -ENODEV;
>>
>> if (!intel_huc_is_supported(huc))
>>     return -ENOEXEC;
> 
> Yes, we discussed this here [1] together with [2] but we didn't
> conclude our discussion due to different opinions on how represent
> some states, in particular "manually disabled" state.
> 
> In this patch I just wanted to restore old notation.
> 
> But we can start new discussion, here is summary:
> 
> ------------------+----------+----------+----------
>   HuC state        | today*   | option A | option B
> ------------------+----------+----------+----------
> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
> HuC fw fail       |   0      | -EACCES  |    0
> HuC authenticated |   1      |     1    |    1
> ------------------+----------+----------+----------
> 
> Note that all above should be compatible with media driver,
> which explicitly looks for no error and value 1

 From user space perspective, e.g. the new igt huc_copy, option B looks 
like this:

    pg.param = I915_PARAM_HUC_STATUS;
    pg.value = &val;
    ret = ioctl(fd, DRM_IOCTL_I915_GETPARAM, &pg);

------------------+----------+----------+------------+----------
   HuC state       | ret      | pg.value |  errno     | huc_copy
------------------+----------+----------+------------+----------
no HuC hardware   |  -1      |     0    | -ENODEV    | SKIP
GuC fw disabled   |  -1      |     0    | -EOPNOTSUPP| SKIP
HuC fw disabled   |  -1      |     0    | -EOPNOTSUPP| SKIP
HuC fw missing    |  -1      |     0    | -ENOEXEC   | FAIL
HuC fw error      |  -1      |     0    | -ENOEXEC   | FAIL
HuC fw fail       |   0      |     0    |    0       | FAIL
HuC authenticated |   0      |     1    |    0       | continue
------------------+----------+----------+------------+----------

It can distinguish the SKIP and FAIL conditions. But looks not elegant 
enough.
The pg.value is wasted as it is not pushed back to user space when 
intel_huc_check_status() < 0.

	case I915_PARAM_HUC_STATUS:
		value = intel_huc_check_status(&i915->gt.uc.huc);
		if (value < 0)
			return value;
		break;

Regards,
Tony
> 
> Michal
> 
> [1] https://patchwork.freedesktop.org/patch/306419/?series=61001&rev=1
> [2] https://patchwork.freedesktop.org/series/60800/#rev1
Michal Wajdeczko Jan. 26, 2020, 5:41 p.m. UTC | #10
On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Michal Wajdeczko (2020-01-23 15:38:52)
>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
>> <chris@chris-wilson.co.uk> wrote:
>>
>> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>> >>
>> >>
>> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
>> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>> >> > to indicate lack of HuC hardware and we started to use this error
>> >> > also for all other cases when HuC was not in use or supported.
>> >> >
>> >> > Fix that by relying again on HAS_GT_UC macro, since currently
>> >> > used function intel_huc_is_supported() is based on HuC firmware
>> >> > support which could be unsupported also due to force disabled
>> >> > GuC firmware.
>> >> >
>> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> >> > Cc: Tony Ye <tony.ye@intel.com>
>> >>
>> >> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> >
>> > Once upon a time did you (Michal) not argue we should indicate the  
>> lack
>> > of firmware in the error code? Something like
>> >
>> > if (!HAS_GT_UC(gt->i915))
>> >       return -ENODEV;
>> >
>> > if (!intel_huc_is_supported(huc))
>> >       return -ENOEXEC;
>>
>> Yes, we discussed this here [1] together with [2] but we didn't
>> conclude our discussion due to different opinions on how represent
>> some states, in particular "manually disabled" state.
>>
>> In this patch I just wanted to restore old notation.
>>
>> But we can start new discussion, here is summary:
>>
>> ------------------+----------+----------+----------
>>   HuC state        | today*   | option A | option B
>> ------------------+----------+----------+----------
>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>> HuC fw fail       |   0      | -EACCES  |    0
>> HuC authenticated |   1      |     1    |    1
>> ------------------+----------+----------+----------
>
> By fw fail, you mean we loaded the firmware (to our knowledge)
> correctly, but HUC_STATUS is not reported as valid?
>
> If so, I support option B. I like the idea of saying
> "no HuC" (machine too old)
> "no firmware" (user action, or lack thereof)
> 0 (fw unhappy)
> 1 (fw reports success)
>
> In between states for failures in fw loading? Not so sure. But I can see
> the nicety in distinguishing between lack of firmware and some random
> failure in loading the firmware (the former being user action required
> to rectify, command line parameter whatever and the latter being the
> firmware file is either invalid or a stray neutrino prevented loading).
>
> Imo the error messages should be about why we cannot probe/trust the
> HUC_STATUS register. If everything is setup correctly then the returned
> value should be from reading the register. I dislike only returning 1 if
> supported, and converting a valid read of 0 into another error.
>
> So Option B :)

But I'm not sure that option B is consistent in error reporting, as
"fw unhappy" is definitely an serious error but is represented as plain
non-error "0" status, while "fw disabled" (user action) is treated as error

------------------+----------
   HuC state       | option B
------------------+----------
no HuC hardware   | -ENODEV
GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
HuC fw missing    | -ENOEXEC
HuC fw error      | -ENOEXEC
HuC fw fail       |    0        -> unlikely, but still fw/hw error
HuC authenticated |    1
------------------+----------

On other hand, option A treats all error conditions as errors, leaving
status codes only for normal operations: disabled(0)/authenticated(1):

------------------+----------
   HuC state       | option A
------------------+----------
no HuC hardware   | -ENODEV  -> you shouldn't ask
GuC fw disabled   |     0    -> user decision, not an error
HuC fw disabled   |     0    -> user decision, not an error
HuC fw missing    | -ENOPKG  -> fw not installed correctly
HuC fw error      | -ENOEXEC -> bad/wrong fw
HuC fw fail       | -EACCES  -> fw/hw error
HuC authenticated |     1
------------------+----------

But since I'm not an active HuC user, will leave final decision to others.

/Michal


>
>> Note that all above should be compatible with media driver,
>> which explicitly looks for no error and value 1
>
> Cool.
> -Chris
Ye, Tony Feb. 5, 2020, 12:43 a.m. UTC | #11
On 1/27/2020 1:41 AM, Michal Wajdeczko wrote:
> On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson 
> <chris@chris-wilson.co.uk> wrote:
> 
>> Quoting Michal Wajdeczko (2020-01-23 15:38:52)
>>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
>>> <chris@chris-wilson.co.uk> wrote:
>>>
>>> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>> >>
>>> >>
>>> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915
>>> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>>> >> > to indicate lack of HuC hardware and we started to use this error
>>> >> > also for all other cases when HuC was not in use or supported.
>>> >> >
>>> >> > Fix that by relying again on HAS_GT_UC macro, since currently
>>> >> > used function intel_huc_is_supported() is based on HuC firmware
>>> >> > support which could be unsupported also due to force disabled
>>> >> > GuC firmware.
>>> >> >
>>> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> >> > Cc: Tony Ye <tony.ye@intel.com>
>>> >>
>>> >> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> >
>>> > Once upon a time did you (Michal) not argue we should indicate the 
>>> lack
>>> > of firmware in the error code? Something like
>>> >
>>> > if (!HAS_GT_UC(gt->i915))
>>> >       return -ENODEV;
>>> >
>>> > if (!intel_huc_is_supported(huc))
>>> >       return -ENOEXEC;
>>>
>>> Yes, we discussed this here [1] together with [2] but we didn't
>>> conclude our discussion due to different opinions on how represent
>>> some states, in particular "manually disabled" state.
>>>
>>> In this patch I just wanted to restore old notation.
>>>
>>> But we can start new discussion, here is summary:
>>>
>>> ------------------+----------+----------+----------
>>>   HuC state        | today*   | option A | option B
>>> ------------------+----------+----------+----------
>>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>>> HuC fw fail       |   0      | -EACCES  |    0
>>> HuC authenticated |   1      |     1    |    1
>>> ------------------+----------+----------+----------
>>
>> By fw fail, you mean we loaded the firmware (to our knowledge)
>> correctly, but HUC_STATUS is not reported as valid?
>>
>> If so, I support option B. I like the idea of saying
>> "no HuC" (machine too old)
>> "no firmware" (user action, or lack thereof)
>> 0 (fw unhappy)
>> 1 (fw reports success)
>>
>> In between states for failures in fw loading? Not so sure. But I can see
>> the nicety in distinguishing between lack of firmware and some random
>> failure in loading the firmware (the former being user action required
>> to rectify, command line parameter whatever and the latter being the
>> firmware file is either invalid or a stray neutrino prevented loading).
>>
>> Imo the error messages should be about why we cannot probe/trust the
>> HUC_STATUS register. If everything is setup correctly then the returned
>> value should be from reading the register. I dislike only returning 1 if
>> supported, and converting a valid read of 0 into another error.
>>
>> So Option B :)
> 
> But I'm not sure that option B is consistent in error reporting, as
> "fw unhappy" is definitely an serious error but is represented as plain
> non-error "0" status, while "fw disabled" (user action) is treated as error
> 
> ------------------+----------
>    HuC state       | option B
> ------------------+----------
> no HuC hardware   | -ENODEV
> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
> HuC fw missing    | -ENOEXEC
> HuC fw error      | -ENOEXEC
> HuC fw fail       |    0        -> unlikely, but still fw/hw error
> HuC authenticated |    1
> ------------------+----------
> 
> On other hand, option A treats all error conditions as errors, leaving
> status codes only for normal operations: disabled(0)/authenticated(1):
> 
> ------------------+----------
>    HuC state       | option A
> ------------------+----------
> no HuC hardware   | -ENODEV  -> you shouldn't ask
> GuC fw disabled   |     0    -> user decision, not an error
> HuC fw disabled   |     0    -> user decision, not an error
> HuC fw missing    | -ENOPKG  -> fw not installed correctly
> HuC fw error      | -ENOEXEC -> bad/wrong fw
> HuC fw fail       | -EACCES  -> fw/hw error
> HuC authenticated |     1
> ------------------+----------

Vote for Option A.

Regards,
Tony

> 
> But since I'm not an active HuC user, will leave final decision to others.
> 
> /Michal
> 
> 
>>
>>> Note that all above should be compatible with media driver,
>>> which explicitly looks for no error and value 1
>>
>> Cool.
>> -Chris
Fosha, Robert M Feb. 11, 2020, 5:53 p.m. UTC | #12
On 2/4/20 4:43 PM, Ye, Tony wrote:
>
>
> On 1/27/2020 1:41 AM, Michal Wajdeczko wrote:
>> On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson 
>> <chris@chris-wilson.co.uk> wrote:
>>
>>> Quoting Michal Wajdeczko (2020-01-23 15:38:52)
>>>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
>>>> <chris@chris-wilson.co.uk> wrote:
>>>>
>>>> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>> >>
>>>> >>
>>>> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>>> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over 
>>>> i915
>>>> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only
>>>> >> > to indicate lack of HuC hardware and we started to use this error
>>>> >> > also for all other cases when HuC was not in use or supported.
>>>> >> >
>>>> >> > Fix that by relying again on HAS_GT_UC macro, since currently
>>>> >> > used function intel_huc_is_supported() is based on HuC firmware
>>>> >> > support which could be unsupported also due to force disabled
>>>> >> > GuC firmware.
>>>> >> >
>>>> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> >> > Cc: Tony Ye <tony.ye@intel.com>
>>>> >>
>>>> >> Reviewed-by: Daniele Ceraolo Spurio 
>>>> <daniele.ceraolospurio@intel.com>
>>>> >
>>>> > Once upon a time did you (Michal) not argue we should indicate 
>>>> the lack
>>>> > of firmware in the error code? Something like
>>>> >
>>>> > if (!HAS_GT_UC(gt->i915))
>>>> >       return -ENODEV;
>>>> >
>>>> > if (!intel_huc_is_supported(huc))
>>>> >       return -ENOEXEC;
>>>>
>>>> Yes, we discussed this here [1] together with [2] but we didn't
>>>> conclude our discussion due to different opinions on how represent
>>>> some states, in particular "manually disabled" state.
>>>>
>>>> In this patch I just wanted to restore old notation.
>>>>
>>>> But we can start new discussion, here is summary:
>>>>
>>>> ------------------+----------+----------+----------
>>>>   HuC state        | today*   | option A | option B
>>>> ------------------+----------+----------+----------
>>>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>>>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>>>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>>>> HuC fw fail       |   0      | -EACCES  |    0
>>>> HuC authenticated |   1      |     1    |    1
>>>> ------------------+----------+----------+----------
>>>
>>> By fw fail, you mean we loaded the firmware (to our knowledge)
>>> correctly, but HUC_STATUS is not reported as valid?
>>>
>>> If so, I support option B. I like the idea of saying
>>> "no HuC" (machine too old)
>>> "no firmware" (user action, or lack thereof)
>>> 0 (fw unhappy)
>>> 1 (fw reports success)
>>>
>>> In between states for failures in fw loading? Not so sure. But I can 
>>> see
>>> the nicety in distinguishing between lack of firmware and some random
>>> failure in loading the firmware (the former being user action required
>>> to rectify, command line parameter whatever and the latter being the
>>> firmware file is either invalid or a stray neutrino prevented loading).
>>>
>>> Imo the error messages should be about why we cannot probe/trust the
>>> HUC_STATUS register. If everything is setup correctly then the returned
>>> value should be from reading the register. I dislike only returning 
>>> 1 if
>>> supported, and converting a valid read of 0 into another error.
>>>
>>> So Option B :)
>>
>> But I'm not sure that option B is consistent in error reporting, as
>> "fw unhappy" is definitely an serious error but is represented as plain
>> non-error "0" status, while "fw disabled" (user action) is treated as 
>> error
>>
>> ------------------+----------
>>    HuC state       | option B
>> ------------------+----------
>> no HuC hardware   | -ENODEV
>> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>> HuC fw missing    | -ENOEXEC
>> HuC fw error      | -ENOEXEC
>> HuC fw fail       |    0        -> unlikely, but still fw/hw error
>> HuC authenticated |    1
>> ------------------+----------
>>
>> On other hand, option A treats all error conditions as errors, leaving
>> status codes only for normal operations: disabled(0)/authenticated(1):
>>
>> ------------------+----------
>>    HuC state       | option A
>> ------------------+----------
>> no HuC hardware   | -ENODEV  -> you shouldn't ask
>> GuC fw disabled   |     0    -> user decision, not an error
>> HuC fw disabled   |     0    -> user decision, not an error
>> HuC fw missing    | -ENOPKG  -> fw not installed correctly
>> HuC fw error      | -ENOEXEC -> bad/wrong fw
>> HuC fw fail       | -EACCES  -> fw/hw error
>> HuC authenticated |     1
>> ------------------+----------
>
> Vote for Option A.
>
> Regards,
> Tony
>

Are we ok to move forward on this? Michal, are you working on updating 
the patch?

-Rob

>>
>> But since I'm not an active HuC user, will leave final decision to 
>> others.
>>
>> /Michal
>>
>>
>>>
>>>> Note that all above should be compatible with media driver,
>>>> which explicitly looks for no error and value 1
>>>
>>> Cool.
>>> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Michal Wajdeczko Feb. 11, 2020, 7:57 p.m. UTC | #13
On Tue, 11 Feb 2020 18:53:05 +0100, Fosha, Robert M  
<robert.m.fosha@intel.com> wrote:

>
>
> On 2/4/20 4:43 PM, Ye, Tony wrote:
>>
>>
>> On 1/27/2020 1:41 AM, Michal Wajdeczko wrote:
>>> On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson  
>>> <chris@chris-wilson.co.uk> wrote:
>>>
>>>> Quoting Michal Wajdeczko (2020-01-23 15:38:52)
>>>>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
>>>>> <chris@chris-wilson.co.uk> wrote:
>>>>>
>>>>> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>>> >>
>>>>> >>
>>>>> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>>>> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over  
>>>>> i915
>>>>> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV  
>>>>> only
>>>>> >> > to indicate lack of HuC hardware and we started to use this  
>>>>> error
>>>>> >> > also for all other cases when HuC was not in use or supported.
>>>>> >> >
>>>>> >> > Fix that by relying again on HAS_GT_UC macro, since currently
>>>>> >> > used function intel_huc_is_supported() is based on HuC firmware
>>>>> >> > support which could be unsupported also due to force disabled
>>>>> >> > GuC firmware.
>>>>> >> >
>>>>> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>> >> > Cc: Tony Ye <tony.ye@intel.com>
>>>>> >>
>>>>> >> Reviewed-by: Daniele Ceraolo Spurio  
>>>>> <daniele.ceraolospurio@intel.com>
>>>>> >
>>>>> > Once upon a time did you (Michal) not argue we should indicate the  
>>>>> lack
>>>>> > of firmware in the error code? Something like
>>>>> >
>>>>> > if (!HAS_GT_UC(gt->i915))
>>>>> >       return -ENODEV;
>>>>> >
>>>>> > if (!intel_huc_is_supported(huc))
>>>>> >       return -ENOEXEC;
>>>>>
>>>>> Yes, we discussed this here [1] together with [2] but we didn't
>>>>> conclude our discussion due to different opinions on how represent
>>>>> some states, in particular "manually disabled" state.
>>>>>
>>>>> In this patch I just wanted to restore old notation.
>>>>>
>>>>> But we can start new discussion, here is summary:
>>>>>
>>>>> ------------------+----------+----------+----------
>>>>>   HuC state        | today*   | option A | option B
>>>>> ------------------+----------+----------+----------
>>>>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>>>>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>>>>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>>>>> HuC fw fail       |   0      | -EACCES  |    0
>>>>> HuC authenticated |   1      |     1    |    1
>>>>> ------------------+----------+----------+----------
>>>>
>>>> By fw fail, you mean we loaded the firmware (to our knowledge)
>>>> correctly, but HUC_STATUS is not reported as valid?
>>>>
>>>> If so, I support option B. I like the idea of saying
>>>> "no HuC" (machine too old)
>>>> "no firmware" (user action, or lack thereof)
>>>> 0 (fw unhappy)
>>>> 1 (fw reports success)
>>>>
>>>> In between states for failures in fw loading? Not so sure. But I can  
>>>> see
>>>> the nicety in distinguishing between lack of firmware and some random
>>>> failure in loading the firmware (the former being user action required
>>>> to rectify, command line parameter whatever and the latter being the
>>>> firmware file is either invalid or a stray neutrino prevented  
>>>> loading).
>>>>
>>>> Imo the error messages should be about why we cannot probe/trust the
>>>> HUC_STATUS register. If everything is setup correctly then the  
>>>> returned
>>>> value should be from reading the register. I dislike only returning 1  
>>>> if
>>>> supported, and converting a valid read of 0 into another error.
>>>>
>>>> So Option B :)
>>>
>>> But I'm not sure that option B is consistent in error reporting, as
>>> "fw unhappy" is definitely an serious error but is represented as plain
>>> non-error "0" status, while "fw disabled" (user action) is treated as  
>>> error
>>>
>>> ------------------+----------
>>>    HuC state       | option B
>>> ------------------+----------
>>> no HuC hardware   | -ENODEV
>>> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>> HuC fw missing    | -ENOEXEC
>>> HuC fw error      | -ENOEXEC
>>> HuC fw fail       |    0        -> unlikely, but still fw/hw error
>>> HuC authenticated |    1
>>> ------------------+----------
>>>
>>> On other hand, option A treats all error conditions as errors, leaving
>>> status codes only for normal operations: disabled(0)/authenticated(1):
>>>
>>> ------------------+----------
>>>    HuC state       | option A
>>> ------------------+----------
>>> no HuC hardware   | -ENODEV  -> you shouldn't ask
>>> GuC fw disabled   |     0    -> user decision, not an error
>>> HuC fw disabled   |     0    -> user decision, not an error
>>> HuC fw missing    | -ENOPKG  -> fw not installed correctly
>>> HuC fw error      | -ENOEXEC -> bad/wrong fw
>>> HuC fw fail       | -EACCES  -> fw/hw error
>>> HuC authenticated |     1
>>> ------------------+----------
>>
>> Vote for Option A.
>>
>> Regards,
>> Tony
>>
>
> Are we ok to move forward on this? Michal, are you working on updating  
> the patch?

I can update the patch, but I don't know which option to implement:
Tony votes for Option A, Chris for Option B, what's your choice?

Michal

>
> -Rob
>
>>>
>>> But since I'm not an active HuC user, will leave final decision to  
>>> others.
>>>
>>> /Michal
>>>
>>>
>>>>
>>>>> Note that all above should be compatible with media driver,
>>>>> which explicitly looks for no error and value 1
>>>>
>>>> Cool.
>>>> -Chris
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Fosha, Robert M Feb. 11, 2020, 9:54 p.m. UTC | #14
On 2/11/20 11:57 AM, Michal Wajdeczko wrote:
> On Tue, 11 Feb 2020 18:53:05 +0100, Fosha, Robert M 
> <robert.m.fosha@intel.com> wrote:
>
>>
>>
>> On 2/4/20 4:43 PM, Ye, Tony wrote:
>>>
>>>
>>> On 1/27/2020 1:41 AM, Michal Wajdeczko wrote:
>>>> On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson 
>>>> <chris@chris-wilson.co.uk> wrote:
>>>>
>>>>> Quoting Michal Wajdeczko (2020-01-23 15:38:52)
>>>>>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
>>>>>> <chris@chris-wilson.co.uk> wrote:
>>>>>>
>>>>>> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>>>> >>
>>>>>> >>
>>>>>> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>>>>> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt 
>>>>>> over i915
>>>>>> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV 
>>>>>> only
>>>>>> >> > to indicate lack of HuC hardware and we started to use this 
>>>>>> error
>>>>>> >> > also for all other cases when HuC was not in use or supported.
>>>>>> >> >
>>>>>> >> > Fix that by relying again on HAS_GT_UC macro, since currently
>>>>>> >> > used function intel_huc_is_supported() is based on HuC firmware
>>>>>> >> > support which could be unsupported also due to force disabled
>>>>>> >> > GuC firmware.
>>>>>> >> >
>>>>>> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>>> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>>> >> > Cc: Tony Ye <tony.ye@intel.com>
>>>>>> >>
>>>>>> >> Reviewed-by: Daniele Ceraolo Spurio 
>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>> >
>>>>>> > Once upon a time did you (Michal) not argue we should indicate 
>>>>>> the lack
>>>>>> > of firmware in the error code? Something like
>>>>>> >
>>>>>> > if (!HAS_GT_UC(gt->i915))
>>>>>> >       return -ENODEV;
>>>>>> >
>>>>>> > if (!intel_huc_is_supported(huc))
>>>>>> >       return -ENOEXEC;
>>>>>>
>>>>>> Yes, we discussed this here [1] together with [2] but we didn't
>>>>>> conclude our discussion due to different opinions on how represent
>>>>>> some states, in particular "manually disabled" state.
>>>>>>
>>>>>> In this patch I just wanted to restore old notation.
>>>>>>
>>>>>> But we can start new discussion, here is summary:
>>>>>>
>>>>>> ------------------+----------+----------+----------
>>>>>>   HuC state        | today*   | option A | option B
>>>>>> ------------------+----------+----------+----------
>>>>>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>>>>>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>>>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>>>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>>>>>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>>>>>> HuC fw fail       |   0      | -EACCES  |    0
>>>>>> HuC authenticated |   1      |     1    |    1
>>>>>> ------------------+----------+----------+----------
>>>>>
>>>>> By fw fail, you mean we loaded the firmware (to our knowledge)
>>>>> correctly, but HUC_STATUS is not reported as valid?
>>>>>
>>>>> If so, I support option B. I like the idea of saying
>>>>> "no HuC" (machine too old)
>>>>> "no firmware" (user action, or lack thereof)
>>>>> 0 (fw unhappy)
>>>>> 1 (fw reports success)
>>>>>
>>>>> In between states for failures in fw loading? Not so sure. But I 
>>>>> can see
>>>>> the nicety in distinguishing between lack of firmware and some random
>>>>> failure in loading the firmware (the former being user action 
>>>>> required
>>>>> to rectify, command line parameter whatever and the latter being the
>>>>> firmware file is either invalid or a stray neutrino prevented 
>>>>> loading).
>>>>>
>>>>> Imo the error messages should be about why we cannot probe/trust the
>>>>> HUC_STATUS register. If everything is setup correctly then the 
>>>>> returned
>>>>> value should be from reading the register. I dislike only 
>>>>> returning 1 if
>>>>> supported, and converting a valid read of 0 into another error.
>>>>>
>>>>> So Option B :)
>>>>
>>>> But I'm not sure that option B is consistent in error reporting, as
>>>> "fw unhappy" is definitely an serious error but is represented as 
>>>> plain
>>>> non-error "0" status, while "fw disabled" (user action) is treated 
>>>> as error
>>>>
>>>> ------------------+----------
>>>>    HuC state       | option B
>>>> ------------------+----------
>>>> no HuC hardware   | -ENODEV
>>>> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>>> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>>> HuC fw missing    | -ENOEXEC
>>>> HuC fw error      | -ENOEXEC
>>>> HuC fw fail       |    0        -> unlikely, but still fw/hw error
>>>> HuC authenticated |    1
>>>> ------------------+----------
>>>>
>>>> On other hand, option A treats all error conditions as errors, leaving
>>>> status codes only for normal operations: disabled(0)/authenticated(1):
>>>>
>>>> ------------------+----------
>>>>    HuC state       | option A
>>>> ------------------+----------
>>>> no HuC hardware   | -ENODEV  -> you shouldn't ask
>>>> GuC fw disabled   |     0    -> user decision, not an error
>>>> HuC fw disabled   |     0    -> user decision, not an error
>>>> HuC fw missing    | -ENOPKG  -> fw not installed correctly
>>>> HuC fw error      | -ENOEXEC -> bad/wrong fw
>>>> HuC fw fail       | -EACCES  -> fw/hw error
>>>> HuC authenticated |     1
>>>> ------------------+----------
>>>
>>> Vote for Option A.
>>>
>>> Regards,
>>> Tony
>>>
>>
>> Are we ok to move forward on this? Michal, are you working on 
>> updating the patch?
>
> I can update the patch, but I don't know which option to implement:
> Tony votes for Option A, Chris for Option B, what's your choice?
>
> Michal
>

I don't have a preference and would trust both Tony and Chris over 
myself. However, if Tony is using HuC more I would vote for his prefered 
option.

-Rob

>>
>> -Rob
>>
>>>>
>>>> But since I'm not an active HuC user, will leave final decision to 
>>>> others.
>>>>
>>>> /Michal
>>>>
>>>>
>>>>>
>>>>>> Note that all above should be compatible with media driver,
>>>>>> which explicitly looks for no error and value 1
>>>>>
>>>>> Cool.
>>>>> -Chris
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ye, Tony Feb. 21, 2020, 3:56 a.m. UTC | #15
On 2/12/2020 5:54 AM, Fosha, Robert M wrote:
> 
> 
> On 2/11/20 11:57 AM, Michal Wajdeczko wrote:
>> On Tue, 11 Feb 2020 18:53:05 +0100, Fosha, Robert M 
>> <robert.m.fosha@intel.com> wrote:
>>
>>>
>>>
>>> On 2/4/20 4:43 PM, Ye, Tony wrote:
>>>>
>>>>
>>>> On 1/27/2020 1:41 AM, Michal Wajdeczko wrote:
>>>>> On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson 
>>>>> <chris@chris-wilson.co.uk> wrote:
>>>>>
>>>>>> Quoting Michal Wajdeczko (2020-01-23 15:38:52)
>>>>>>> On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
>>>>>>> <chris@chris-wilson.co.uk> wrote:
>>>>>>>
>>>>>>> > Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>>>>>>> >> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt 
>>>>>>> over i915
>>>>>>> >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV 
>>>>>>> only
>>>>>>> >> > to indicate lack of HuC hardware and we started to use this 
>>>>>>> error
>>>>>>> >> > also for all other cases when HuC was not in use or supported.
>>>>>>> >> >
>>>>>>> >> > Fix that by relying again on HAS_GT_UC macro, since currently
>>>>>>> >> > used function intel_huc_is_supported() is based on HuC firmware
>>>>>>> >> > support which could be unsupported also due to force disabled
>>>>>>> >> > GuC firmware.
>>>>>>> >> >
>>>>>>> >> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>>>> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>>> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>>>> >> > Cc: Tony Ye <tony.ye@intel.com>
>>>>>>> >>
>>>>>>> >> Reviewed-by: Daniele Ceraolo Spurio 
>>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>>> >
>>>>>>> > Once upon a time did you (Michal) not argue we should indicate 
>>>>>>> the lack
>>>>>>> > of firmware in the error code? Something like
>>>>>>> >
>>>>>>> > if (!HAS_GT_UC(gt->i915))
>>>>>>> >       return -ENODEV;
>>>>>>> >
>>>>>>> > if (!intel_huc_is_supported(huc))
>>>>>>> >       return -ENOEXEC;
>>>>>>>
>>>>>>> Yes, we discussed this here [1] together with [2] but we didn't
>>>>>>> conclude our discussion due to different opinions on how represent
>>>>>>> some states, in particular "manually disabled" state.
>>>>>>>
>>>>>>> In this patch I just wanted to restore old notation.
>>>>>>>
>>>>>>> But we can start new discussion, here is summary:
>>>>>>>
>>>>>>> ------------------+----------+----------+----------
>>>>>>>   HuC state        | today*   | option A | option B
>>>>>>> ------------------+----------+----------+----------
>>>>>>> no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
>>>>>>> GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>>>>> HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
>>>>>>> HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
>>>>>>> HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
>>>>>>> HuC fw fail       |   0      | -EACCES  |    0
>>>>>>> HuC authenticated |   1      |     1    |    1
>>>>>>> ------------------+----------+----------+----------
>>>>>>
>>>>>> By fw fail, you mean we loaded the firmware (to our knowledge)
>>>>>> correctly, but HUC_STATUS is not reported as valid?
>>>>>>
>>>>>> If so, I support option B. I like the idea of saying
>>>>>> "no HuC" (machine too old)
>>>>>> "no firmware" (user action, or lack thereof)
>>>>>> 0 (fw unhappy)
>>>>>> 1 (fw reports success)
>>>>>>
>>>>>> In between states for failures in fw loading? Not so sure. But I 
>>>>>> can see
>>>>>> the nicety in distinguishing between lack of firmware and some random
>>>>>> failure in loading the firmware (the former being user action 
>>>>>> required
>>>>>> to rectify, command line parameter whatever and the latter being the
>>>>>> firmware file is either invalid or a stray neutrino prevented 
>>>>>> loading).
>>>>>>
>>>>>> Imo the error messages should be about why we cannot probe/trust the
>>>>>> HUC_STATUS register. If everything is setup correctly then the 
>>>>>> returned
>>>>>> value should be from reading the register. I dislike only 
>>>>>> returning 1 if
>>>>>> supported, and converting a valid read of 0 into another error.
>>>>>>
>>>>>> So Option B :)
>>>>>
>>>>> But I'm not sure that option B is consistent in error reporting, as
>>>>> "fw unhappy" is definitely an serious error but is represented as 
>>>>> plain
>>>>> non-error "0" status, while "fw disabled" (user action) is treated 
>>>>> as error
>>>>>
>>>>> ------------------+----------
>>>>>    HuC state       | option B
>>>>> ------------------+----------
>>>>> no HuC hardware   | -ENODEV
>>>>> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>>>> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>>>> HuC fw missing    | -ENOEXEC
>>>>> HuC fw error      | -ENOEXEC
>>>>> HuC fw fail       |    0        -> unlikely, but still fw/hw error
>>>>> HuC authenticated |    1
>>>>> ------------------+----------
>>>>>
>>>>> On other hand, option A treats all error conditions as errors, leaving
>>>>> status codes only for normal operations: disabled(0)/authenticated(1):
>>>>>
>>>>> ------------------+----------
>>>>>    HuC state       | option A
>>>>> ------------------+----------
>>>>> no HuC hardware   | -ENODEV  -> you shouldn't ask
>>>>> GuC fw disabled   |     0    -> user decision, not an error
>>>>> HuC fw disabled   |     0    -> user decision, not an error
>>>>> HuC fw missing    | -ENOPKG  -> fw not installed correctly
>>>>> HuC fw error      | -ENOEXEC -> bad/wrong fw
>>>>> HuC fw fail       | -EACCES  -> fw/hw error
>>>>> HuC authenticated |     1
>>>>> ------------------+----------
>>>>
>>>> Vote for Option A.
>>>>
>>>> Regards,
>>>> Tony
>>>>
>>>
>>> Are we ok to move forward on this? Michal, are you working on 
>>> updating the patch?
>>
>> I can update the patch, but I don't know which option to implement:
>> Tony votes for Option A, Chris for Option B, what's your choice?
>>
>> Michal
>>
> 
> I don't have a preference and would trust both Tony and Chris over 
> myself. However, if Tony is using HuC more I would vote for his prefered 
> option.
> 
> -Rob

Option B takes enable_guc=0|1 (user doesn't want to load HuC FW) as 
error, and takes FW fail (failed to xfer or init/auth the fw) as 0. It 
would confuse the user.

Regards,
Tony

> 
>>>
>>> -Rob
>>>
>>>>>
>>>>> But since I'm not an active HuC user, will leave final decision to 
>>>>> others.
>>>>>
>>>>> /Michal
>>>>>
>>>>>
>>>>>>
>>>>>>> Note that all above should be compatible with media driver,
>>>>>>> which explicitly looks for no error and value 1
>>>>>>
>>>>>> Cool.
>>>>>> -Chris
>>>> _______________________________________________
>>>> Intel-gfx mailing list
>>>> Intel-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
Joonas Lahtinen Feb. 28, 2020, 11:33 a.m. UTC | #16
> >>>>> ------------------+----------
> >>>>>    HuC state       | option B
> >>>>> ------------------+----------
> >>>>> no HuC hardware   | -ENODEV
> >>>>> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
> >>>>> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
> >>>>> HuC fw missing    | -ENOEXEC
> >>>>> HuC fw error      | -ENOEXEC
> >>>>> HuC fw fail       |    0        -> unlikely, but still fw/hw error
> >>>>> HuC authenticated |    1
> >>>>> ------------------+----------
> >>>>>
> >>>>> On other hand, option A treats all error conditions as errors, leaving
> >>>>> status codes only for normal operations: disabled(0)/authenticated(1):
> >>>>>
> >>>>> ------------------+----------
> >>>>>    HuC state       | option A
> >>>>> ------------------+----------
> >>>>> no HuC hardware   | -ENODEV  -> you shouldn't ask
> >>>>> GuC fw disabled   |     0    -> user decision, not an error
> >>>>> HuC fw disabled   |     0    -> user decision, not an error
> >>>>> HuC fw missing    | -ENOPKG  -> fw not installed correctly
> >>>>> HuC fw error      | -ENOEXEC -> bad/wrong fw
> >>>>> HuC fw fail       | -EACCES  -> fw/hw error
> >>>>> HuC authenticated |     1
> >>>>> ------------------+----------

Let's go with Option B here.

It correctly reports anything as error if it precedes
the actual action of authentication and prevents it from
happening.

So the result one gets is 0/1 is for the authentication
status which is the original intent here. Or alternatively
an error if the authentication was not attempted.

Regards, Joonas
Michal Wajdeczko Feb. 28, 2020, 1:44 p.m. UTC | #17
On 28.02.2020 12:33, Joonas Lahtinen wrote:
>>>>>>> ------------------+----------
>>>>>>>    HuC state       | option B
>>>>>>> ------------------+----------
>>>>>>> no HuC hardware   | -ENODEV
>>>>>>> GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>>>>>> HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
>>>>>>> HuC fw missing    | -ENOEXEC
>>>>>>> HuC fw error      | -ENOEXEC
>>>>>>> HuC fw fail       |    0        -> unlikely, but still fw/hw error
>>>>>>> HuC authenticated |    1
>>>>>>> ------------------+----------
>>>>>>>
>>>>>>> On other hand, option A treats all error conditions as errors, leaving
>>>>>>> status codes only for normal operations: disabled(0)/authenticated(1):
>>>>>>>
>>>>>>> ------------------+----------
>>>>>>>    HuC state       | option A
>>>>>>> ------------------+----------
>>>>>>> no HuC hardware   | -ENODEV  -> you shouldn't ask
>>>>>>> GuC fw disabled   |     0    -> user decision, not an error
>>>>>>> HuC fw disabled   |     0    -> user decision, not an error
>>>>>>> HuC fw missing    | -ENOPKG  -> fw not installed correctly
>>>>>>> HuC fw error      | -ENOEXEC -> bad/wrong fw
>>>>>>> HuC fw fail       | -EACCES  -> fw/hw error
>>>>>>> HuC authenticated |     1
>>>>>>> ------------------+----------
> 
> Let's go with Option B here.

Tony was ok with option A...

> 
> It correctly reports anything as error if it precedes
> the actual action of authentication and prevents it from
> happening.

Option A do the same, including correctly reporting corrupted/wrong
firmware as error (as this is the most likely reason that prevents
successful HuC authentication on given platform).

The main difference here is user decision to disable HuC, that is now
treated as the only one non-error reason that HuC is not available for use.

> 
> So the result one gets is 0/1 is for the authentication
> status which is the original intent here. 

I'm not sure that from user POV original intent was to explicitly check
HuC authentication status but rather if HuC is available for use.

Our redundant [1] check of fw authentication status during this ioctl is
just internal detail, not necessary something user was interested from
this action.

Michal

[1] as we are already checking this register during HuC fw load
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index 32a069841c14..ad4d9e16a24e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -209,7 +209,7 @@  int intel_huc_check_status(struct intel_huc *huc)
 	intel_wakeref_t wakeref;
 	u32 status = 0;
 
-	if (!intel_huc_is_supported(huc))
+	if (!HAS_GT_UC(gt->i915))
 		return -ENODEV;
 
 	with_intel_runtime_pm(gt->uncore->rpm, wakeref)