diff mbox

[v2,4/7] drm/i915/uc: Don't use -EIO to report missing firmware

Message ID 20171201103317.48416-4-michal.wajdeczko@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Wajdeczko Dec. 1, 2017, 10:33 a.m. UTC
-EIO has special meaning and is used when we want to allow
engine initialization to fail and mark GPU as wedged.
Missing firmware does not fit into this scenario as this is
permanent error not related to GPU condition.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/intel_uc_fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson Dec. 1, 2017, 12:31 p.m. UTC | #1
Quoting Michal Wajdeczko (2017-12-01 10:33:14)
> -EIO has special meaning and is used when we want to allow
> engine initialization to fail and mark GPU as wedged.
> Missing firmware does not fit into this scenario as this is
> permanent error not related to GPU condition.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>

Ok, keeping -EIO to mean something special is a good idea. So if upload
now fails, we abort loading of the driver with ENOEXEC.

Is that sensible? Let's say due to fs corruption, or other error, we
aren't able to upload a fw, what should we do? If we abort the driver
load at this point, the user is likely left with a blank screen. If we
use -EIO, at least KMS is still functional and the user can still
interact with the system. (If we just fell back to execlists, then the
system remains very usable.)

What is the plan for HW initialisation failure?
-Chris
sagar.a.kamble@intel.com Dec. 1, 2017, 3:55 p.m. UTC | #2
On 12/1/2017 6:01 PM, Chris Wilson wrote:
> Quoting Michal Wajdeczko (2017-12-01 10:33:14)
>> -EIO has special meaning and is used when we want to allow
>> engine initialization to fail and mark GPU as wedged.
>> Missing firmware does not fit into this scenario as this is
>> permanent error not related to GPU condition.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Ok, keeping -EIO to mean something special is a good idea. So if upload
> now fails, we abort loading of the driver with ENOEXEC.
>
> Is that sensible? Let's say due to fs corruption, or other error, we
> aren't able to upload a fw, what should we do? If we abort the driver
> load at this point, the user is likely left with a blank screen. If we
> use -EIO, at least KMS is still functional and the user can still
> interact with the system. (If we just fell back to execlists, then the
> system remains very usable.)
>
> What is the plan for HW initialisation failure?
Earlier we were returning -EIO from intel_uc_init_hw when GuC 
load/submission was "required" (enable_guc_loading/submission=2).
Keeping the same behavior I feel we should return -EIO if (enable_guc & 
1) (need to know that user requested "required")
I feel on auto mode (need to know user requested "auto") falling back to 
execlists makes sense as user dint enforce, so we should be returning 0 
then.
> -Chris
Michal Wajdeczko Dec. 1, 2017, 4:10 p.m. UTC | #3
On Fri, 01 Dec 2017 16:55:41 +0100, Sagar Arun Kamble  
<sagar.a.kamble@intel.com> wrote:

>
>
> On 12/1/2017 6:01 PM, Chris Wilson wrote:
>> Quoting Michal Wajdeczko (2017-12-01 10:33:14)
>>> -EIO has special meaning and is used when we want to allow
>>> engine initialization to fail and mark GPU as wedged.
>>> Missing firmware does not fit into this scenario as this is
>>> permanent error not related to GPU condition.
>>>
>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> Ok, keeping -EIO to mean something special is a good idea. So if upload
>> now fails, we abort loading of the driver with ENOEXEC.
>>
>> Is that sensible? Let's say due to fs corruption, or other error, we
>> aren't able to upload a fw, what should we do? If we abort the driver
>> load at this point, the user is likely left with a blank screen. If we
>> use -EIO, at least KMS is still functional and the user can still
>> interact with the system. (If we just fell back to execlists, then the
>> system remains very usable.)
>>
>> What is the plan for HW initialisation failure?
> Earlier we were returning -EIO from intel_uc_init_hw when GuC  
> load/submission was "required" (enable_guc_loading/submission=2).
> Keeping the same behavior I feel we should return -EIO if (enable_guc &  
> 1) (need to know that user requested "required")
> I feel on auto mode (need to know user requested "auto") falling back to  
> execlists makes sense as user dint enforce, so we should be returning 0  
> then.

if we return 0 and use execlist then we will have mismatched state
in flags as USES_GUC will be still returning true (as the same time
we don't want to do late changes to enable_guc modparam on failure
as part of the driver was already initiated with old value)
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_uc_fw.c b/drivers/gpu/drm/i915/intel_uc_fw.c
index b376dd3..784eff9 100644
--- a/drivers/gpu/drm/i915/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/intel_uc_fw.c
@@ -214,7 +214,7 @@  int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
 			 intel_uc_fw_type_repr(uc_fw->type), uc_fw->path);
 
 	if (uc_fw->fetch_status != INTEL_UC_FIRMWARE_SUCCESS)
-		return -EIO;
+		return -ENOEXEC;
 
 	uc_fw->load_status = INTEL_UC_FIRMWARE_PENDING;
 	DRM_DEBUG_DRIVER("%s fw load %s\n",