diff mbox

[CI,1/2] drm/i915/guc: Fix null pointer dereference when GuC FW is not available

Message ID 20180323112319.16293-1-piotr.piorkowski@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Piotr Piórkowski March 23, 2018, 11:23 a.m. UTC
If GuC firmware is not available on the system and we load i915 with enable
GuC, then we hit this null pointer dereference issue:

[   71.098873] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   71.098938] IP: intel_uc_fw_upload+0x1f/0x360 [i915]
[   71.098947] PGD 0 P4D 0
[   71.098956] Oops: 0000 [#1] PREEMPT SMP PTI
[   71.098965] Modules linked in: i915(O+) netconsole x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me i2c_i801 prime_numbers mei [last unloaded: i915]
[   71.099005] CPU: 2 PID: 1167 Comm: insmod Tainted: G     U  W  O     4.16.0-rc1+ #337
[   71.099018] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0065.2018.0103.1000 01/03/2018
[   71.099077] RIP: 0010:intel_uc_fw_upload+0x1f/0x360 [i915]
[   71.099087] RSP: 0018:ffffc90000417aa0 EFLAGS: 00010282
[   71.099097] RAX: 0000000000000000 RBX: ffff88084cad12f8 RCX: ffffffffa03e9357
[   71.099108] RDX: 0000000000000002 RSI: ffffffffa034dba0 RDI: ffff88084cad12f8
[   71.099118] RBP: 0000000000000002 R08: ffff88085344ca90 R09: 0000000000000001
[   71.099128] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88084cad0000
[   71.099139] R13: ffffffffa034dba0 R14: 00000000fffffff5 R15: ffff88084cad12b0
[   71.099151] FS:  00007f7f24ae2740(0000) GS:ffff88085e200000(0000) knlGS:0000000000000000
[   71.099162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.099171] CR2: 0000000000000008 CR3: 0000000855f48001 CR4: 00000000003606e0
[   71.099182] Call Trace:
[   71.099246]  intel_uc_init_hw+0xc8/0x520 [i915]
[   71.099303]  i915_gem_init_hw+0x11f/0x2d0 [i915]
[   71.099364]  i915_gem_init+0x2b9/0x640 [i915]
[   71.099413]  i915_driver_load+0xb74/0x1110 [i915]
[   71.099462]  i915_pci_probe+0x2e/0x90 [i915]
[   71.099476]  pci_device_probe+0xa1/0x130
[   71.099488]  driver_probe_device+0x302/0x470
[   71.099502]  __driver_attach+0xb9/0xe0
[   71.099513]  ? driver_probe_device+0x470/0x470
[   71.099525]  ? driver_probe_device+0x470/0x470
[   71.099538]  bus_for_each_dev+0x64/0x90
[   71.099550]  bus_add_driver+0x164/0x260
[   71.099561]  ? 0xffffffffa04d6000
[   71.099572]  driver_register+0x57/0xc0
[   71.099582]  ? 0xffffffffa04d6000
[   71.099593]  do_one_initcall+0x3b/0x160
[   71.099606]  ? kmem_cache_alloc_trace+0x1c3/0x2a0
[   71.099621]  do_init_module+0x5b/0x1f9
[   71.099635]  load_module+0x2467/0x2a70
[   71.099654]  ? SyS_finit_module+0xbd/0xe0
[   71.099668]  SyS_finit_module+0xbd/0xe0
[   71.099682]  do_syscall_64+0x73/0x1c0
[   71.099694]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
[   71.099706] RIP: 0033:0x7f7f23fb40d9
[   71.099717] RSP: 002b:00007ffda7d67ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   71.099734] RAX: ffffffffffffffda RBX: 000055f96e2a8870 RCX: 00007f7f23fb40d9
[   71.099748] RDX: 0000000000000000 RSI: 000055f96e2a8260 RDI: 0000000000000003
[   71.099763] RBP: 000055f96e2a8260 R08: 0000000000000000 R09: 00007ffda7d68088
[   71.099777] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
[   71.099791] R13: 000055f96e2a8830 R14: 0000000000000000 R15: 000055f96e2a8260
[   71.099810] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 41 54 49 89 f5 55 53 48 c7 c1 57 93 3e a0 48 8b 47 10 48 89 fb 4c 8b 07 <48> 8b 68 08 8b 47 28 85 c0 74 15 83 f8 01 48 c7 c1 5b 93 3e a0
[   71.100004] RIP: intel_uc_fw_upload+0x1f/0x360 [i915] RSP: ffffc90000417aa0
[   71.100020] CR2: 0000000000000008
[   71.100031] ---[ end trace d8ac93c30ceff5b2 ]--

Fixes: 6b0478fb722a ("drm/i915: Implement dynamic GuC WOPCM offset and size calculation")

v2: don't assume it is always GuC FW (Michal)
v3: added a new variable to avoid exceeding the number of characters in the
line (Michal)

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reported-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jackie Li <yaodong.li@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jackie Li <yaodong.li@intel.com>

---
 drivers/gpu/drm/i915/intel_uc_fw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

sagar.a.kamble@intel.com March 23, 2018, 12:07 p.m. UTC | #1
On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
> If GuC firmware is not available on the system and we load i915 with enable
> GuC, then we hit this null pointer dereference issue:
Patch looks good but I have query on usage of enable_guc modparam.
enable_guc modparam will enable GuC/HuC only if firmware is available.
If user sets to forcefully enable GuC/HuC on systems that don't have 
firmware then it is user's fault.
Michal, could you please clarify.

Thanks,
Sagar
>
> [   71.098873] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [   71.098938] IP: intel_uc_fw_upload+0x1f/0x360 [i915]
> [   71.098947] PGD 0 P4D 0
> [   71.098956] Oops: 0000 [#1] PREEMPT SMP PTI
> [   71.098965] Modules linked in: i915(O+) netconsole x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me i2c_i801 prime_numbers mei [last unloaded: i915]
> [   71.099005] CPU: 2 PID: 1167 Comm: insmod Tainted: G     U  W  O     4.16.0-rc1+ #337
> [   71.099018] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0065.2018.0103.1000 01/03/2018
> [   71.099077] RIP: 0010:intel_uc_fw_upload+0x1f/0x360 [i915]
> [   71.099087] RSP: 0018:ffffc90000417aa0 EFLAGS: 00010282
> [   71.099097] RAX: 0000000000000000 RBX: ffff88084cad12f8 RCX: ffffffffa03e9357
> [   71.099108] RDX: 0000000000000002 RSI: ffffffffa034dba0 RDI: ffff88084cad12f8
> [   71.099118] RBP: 0000000000000002 R08: ffff88085344ca90 R09: 0000000000000001
> [   71.099128] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88084cad0000
> [   71.099139] R13: ffffffffa034dba0 R14: 00000000fffffff5 R15: ffff88084cad12b0
> [   71.099151] FS:  00007f7f24ae2740(0000) GS:ffff88085e200000(0000) knlGS:0000000000000000
> [   71.099162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   71.099171] CR2: 0000000000000008 CR3: 0000000855f48001 CR4: 00000000003606e0
> [   71.099182] Call Trace:
> [   71.099246]  intel_uc_init_hw+0xc8/0x520 [i915]
> [   71.099303]  i915_gem_init_hw+0x11f/0x2d0 [i915]
> [   71.099364]  i915_gem_init+0x2b9/0x640 [i915]
> [   71.099413]  i915_driver_load+0xb74/0x1110 [i915]
> [   71.099462]  i915_pci_probe+0x2e/0x90 [i915]
> [   71.099476]  pci_device_probe+0xa1/0x130
> [   71.099488]  driver_probe_device+0x302/0x470
> [   71.099502]  __driver_attach+0xb9/0xe0
> [   71.099513]  ? driver_probe_device+0x470/0x470
> [   71.099525]  ? driver_probe_device+0x470/0x470
> [   71.099538]  bus_for_each_dev+0x64/0x90
> [   71.099550]  bus_add_driver+0x164/0x260
> [   71.099561]  ? 0xffffffffa04d6000
> [   71.099572]  driver_register+0x57/0xc0
> [   71.099582]  ? 0xffffffffa04d6000
> [   71.099593]  do_one_initcall+0x3b/0x160
> [   71.099606]  ? kmem_cache_alloc_trace+0x1c3/0x2a0
> [   71.099621]  do_init_module+0x5b/0x1f9
> [   71.099635]  load_module+0x2467/0x2a70
> [   71.099654]  ? SyS_finit_module+0xbd/0xe0
> [   71.099668]  SyS_finit_module+0xbd/0xe0
> [   71.099682]  do_syscall_64+0x73/0x1c0
> [   71.099694]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
> [   71.099706] RIP: 0033:0x7f7f23fb40d9
> [   71.099717] RSP: 002b:00007ffda7d67ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [   71.099734] RAX: ffffffffffffffda RBX: 000055f96e2a8870 RCX: 00007f7f23fb40d9
> [   71.099748] RDX: 0000000000000000 RSI: 000055f96e2a8260 RDI: 0000000000000003
> [   71.099763] RBP: 000055f96e2a8260 R08: 0000000000000000 R09: 00007ffda7d68088
> [   71.099777] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
> [   71.099791] R13: 000055f96e2a8830 R14: 0000000000000000 R15: 000055f96e2a8260
> [   71.099810] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 41 54 49 89 f5 55 53 48 c7 c1 57 93 3e a0 48 8b 47 10 48 89 fb 4c 8b 07 <48> 8b 68 08 8b 47 28 85 c0 74 15 83 f8 01 48 c7 c1 5b 93 3e a0
> [   71.100004] RIP: intel_uc_fw_upload+0x1f/0x360 [i915] RSP: ffffc90000417aa0
> [   71.100020] CR2: 0000000000000008
> [   71.100031] ---[ end trace d8ac93c30ceff5b2 ]--
>
> Fixes: 6b0478fb722a ("drm/i915: Implement dynamic GuC WOPCM offset and size calculation")
>
> v2: don't assume it is always GuC FW (Michal)
> v3: added a new variable to avoid exceeding the number of characters in the
> line (Michal)
>
> Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
> Reported-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Jackie Li <yaodong.li@intel.com>
> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Reviewed-by: Jackie Li <yaodong.li@intel.com>
>
> ---
>   drivers/gpu/drm/i915/intel_uc_fw.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uc_fw.c b/drivers/gpu/drm/i915/intel_uc_fw.c
> index 30c73243f54d..93b0fc661b3a 100644
> --- a/drivers/gpu/drm/i915/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/intel_uc_fw.c
> @@ -199,9 +199,9 @@ int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
>   		       int (*xfer)(struct intel_uc_fw *uc_fw,
>   				   struct i915_vma *vma))
>   {
> -	struct drm_i915_private *i915 = to_i915(uc_fw->obj->base.dev);
>   	struct i915_vma *vma;
>   	int err;
> +	u32 ggtt_pin_bias;
>   
>   	DRM_DEBUG_DRIVER("%s fw load %s\n",
>   			 intel_uc_fw_type_repr(uc_fw->type), uc_fw->path);
> @@ -222,9 +222,9 @@ int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
>   		goto fail;
>   	}
>   
> +	ggtt_pin_bias = to_i915(uc_fw->obj->base.dev)->guc.ggtt_pin_bias;
>   	vma = i915_gem_object_ggtt_pin(uc_fw->obj, NULL, 0, 0,
> -				       PIN_OFFSET_BIAS |
> -				       i915->guc.ggtt_pin_bias);
> +				       PIN_OFFSET_BIAS | ggtt_pin_bias);
>   	if (IS_ERR(vma)) {
>   		err = PTR_ERR(vma);
>   		DRM_DEBUG_DRIVER("%s fw ggtt-pin err=%d\n",
Michal Wajdeczko March 23, 2018, 12:27 p.m. UTC | #2
On Fri, 23 Mar 2018 13:07:15 +0100, Sagar Arun Kamble  
<sagar.a.kamble@intel.com> wrote:

>
>
> On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
>> If GuC firmware is not available on the system and we load i915 with  
>> enable
>> GuC, then we hit this null pointer dereference issue:
> Patch looks good but I have query on usage of enable_guc modparam.
> enable_guc modparam will enable GuC/HuC only if firmware is available.

During modparam sanitization phase, code will only check for firmware
name, code will attempt to check if firmware file exists.

> If user sets to forcefully enable GuC/HuC on systems that don't have  
> firmware then it is user's fault.

Sure its user's fault, but event in such case we should just gracefully
abort driver load, without any NULL pointer BUG ;)

And note, that we will hit this bug not only when user force GuC with:
	enable_guc=1 guc_firmware_path=/does/not/exist

but also when user just specify auto mode:
	enable_guc=-1

when predefined firmwares are not present (not installed or removed)

/m

> Michal, could you please clarify.
>
Jackie Li March 23, 2018, 6:03 p.m. UTC | #3
On 03/23/2018 05:27 AM, Michal Wajdeczko wrote:
> On Fri, 23 Mar 2018 13:07:15 +0100, Sagar Arun Kamble 
> <sagar.a.kamble@intel.com> wrote:
>
>>
>>
>> On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
>>> If GuC firmware is not available on the system and we load i915 with 
>>> enable
>>> GuC, then we hit this null pointer dereference issue:
>> Patch looks good but I have query on usage of enable_guc modparam.
>> enable_guc modparam will enable GuC/HuC only if firmware is available.
>
> During modparam sanitization phase, code will only check for firmware
> name, code will attempt to check if firmware file exists.
Is it better if we won't even call into upload FW once we found the FW 
fetching was failed?
>
>> If user sets to forcefully enable GuC/HuC on systems that don't have 
>> firmware then it is user's fault.
>
> Sure its user's fault, but event in such case we should just gracefully
> abort driver load, without any NULL pointer BUG ;)
>
> And note, that we will hit this bug not only when user force GuC with:
>     enable_guc=1 guc_firmware_path=/does/not/exist
>
> but also when user just specify auto mode:
>     enable_guc=-1
>
> when predefined firmwares are not present (not installed or removed)
>
it feels like it's still user's fault even with the auto mode. why the 
user even set
the enable_guc to -1 after known the fact that the FWs are not present?
I mean should users be expected to know the fact that the auto mode
enable_guc = -1 will enable the GuC and HuC, if the guc_firmware_path 
was left
to be none?

I agree we should make sure correct pointer access. it's a non-excusable 
fault:)

The thing actually frustrated me was that we failed to screen such an error
with the CI, but suddenly it broke things and tests with incorrect 
configurations
(enable_guc set to 1 or -1 + No FW available). so maybe we should add a 
case to
cover such a case in CI as well if we did have such test configurations.

Thanks,
-Jackie
Michal Wajdeczko March 23, 2018, 6:26 p.m. UTC | #4
On Fri, 23 Mar 2018 19:03:47 +0100, Yaodong Li <yaodong.li@intel.com>  
wrote:

> On 03/23/2018 05:27 AM, Michal Wajdeczko wrote:
>> On Fri, 23 Mar 2018 13:07:15 +0100, Sagar Arun Kamble  
>> <sagar.a.kamble@intel.com> wrote:
>>
>>>
>>>
>>> On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
>>>> If GuC firmware is not available on the system and we load i915 with  
>>>> enable
>>>> GuC, then we hit this null pointer dereference issue:
>>> Patch looks good but I have query on usage of enable_guc modparam.
>>> enable_guc modparam will enable GuC/HuC only if firmware is available.
>>
>> During modparam sanitization phase, code will only check for firmware
>> name, code will attempt to check if firmware file exists.
> Is it better if we won't even call into upload FW once we found the FW  
> fetching was failed?
>>
>>> If user sets to forcefully enable GuC/HuC on systems that don't have  
>>> firmware then it is user's fault.
>>
>> Sure its user's fault, but event in such case we should just gracefully
>> abort driver load, without any NULL pointer BUG ;)
>>
>> And note, that we will hit this bug not only when user force GuC with:
>>     enable_guc=1 guc_firmware_path=/does/not/exist
>>
>> but also when user just specify auto mode:
>>     enable_guc=-1
>>
>> when predefined firmwares are not present (not installed or removed)
>>
> it feels like it's still user's fault even with the auto mode. why the  
> user even set
> the enable_guc to -1 after known the fact that the FWs are not present?
> I mean should users be expected to know the fact that the auto mode
> enable_guc = -1 will enable the GuC and HuC, if the guc_firmware_path  
> was left
> to be none?

GuC fw path will not be none, as we have some hardcoded paths in the code.
And we don't tell the user that after turning on 'auto' mode, he has to
do something special. Only later in case of fw fetch errors, we are just
printing error message with extra help:

"%s: Failed to fetch firmware %s (error %d)\n"
"%s: Firmware can be downloaded from %s\n"

>
> I agree we should make sure correct pointer access. it's a non-excusable  
> fault:)
>
> The thing actually frustrated me was that we failed to screen such an  
> error
> with the CI, but suddenly it broke things and tests with incorrect  
> configurations
> (enable_guc set to 1 or -1 + No FW available). so maybe we should add a  
> case to
> cover such a case in CI as well if we did have such test configurations.
>

I guess you can write such test on your own and submit patch to igt-dev ;)

/m
Jackie Li March 23, 2018, 6:40 p.m. UTC | #5
On 03/23/2018 11:26 AM, Michal Wajdeczko wrote:
> On Fri, 23 Mar 2018 19:03:47 +0100, Yaodong Li <yaodong.li@intel.com> 
> wrote:
>
>> On 03/23/2018 05:27 AM, Michal Wajdeczko wrote:
>>> On Fri, 23 Mar 2018 13:07:15 +0100, Sagar Arun Kamble 
>>> <sagar.a.kamble@intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
>>>>> If GuC firmware is not available on the system and we load i915 
>>>>> with enable
>>>>> GuC, then we hit this null pointer dereference issue:
>>>> Patch looks good but I have query on usage of enable_guc modparam.
>>>> enable_guc modparam will enable GuC/HuC only if firmware is available.
>>>
>>> During modparam sanitization phase, code will only check for firmware
>>> name, code will attempt to check if firmware file exists.
>> Is it better if we won't even call into upload FW once we found the 
>> FW fetching was failed?
>>>
>>>> If user sets to forcefully enable GuC/HuC on systems that don't 
>>>> have firmware then it is user's fault.
>>>
>>> Sure its user's fault, but event in such case we should just gracefully
>>> abort driver load, without any NULL pointer BUG ;)
>>>
>>> And note, that we will hit this bug not only when user force GuC with:
>>>     enable_guc=1 guc_firmware_path=/does/not/exist
>>>
>>> but also when user just specify auto mode:
>>>     enable_guc=-1
>>>
>>> when predefined firmwares are not present (not installed or removed)
>>>
>> it feels like it's still user's fault even with the auto mode. why 
>> the user even set
>> the enable_guc to -1 after known the fact that the FWs are not present?
>> I mean should users be expected to know the fact that the auto mode
>> enable_guc = -1 will enable the GuC and HuC, if the guc_firmware_path 
>> was left
>> to be none?
>
> GuC fw path will not be none, as we have some hardcoded paths in the 
> code.
> And we don't tell the user that after turning on 'auto' mode, he has to
> do something special. Only later in case of fw fetch errors, we are just
> printing error message with extra help:
>
> "%s: Failed to fetch firmware %s (error %d)\n"
> "%s: Firmware can be downloaded from %s\n"
>
In this case, back to my question: maybe it's a better idea to make 
things stops here
instead of continuing to call following functions such as uc_init, 
uc_init_hw -> fw_upload?
>>
>> I agree we should make sure correct pointer access. it's a 
>> non-excusable fault:)
>>
>> The thing actually frustrated me was that we failed to screen such an 
>> error
>> with the CI, but suddenly it broke things and tests with incorrect 
>> configurations
>> (enable_guc set to 1 or -1 + No FW available). so maybe we should add 
>> a case to
>> cover such a case in CI as well if we did have such test configurations.
>>
>
> I guess you can write such test on your own and submit patch to 
> igt-dev ;)
:-)

Regards,
-Jackie
Michal Wajdeczko March 23, 2018, 7:04 p.m. UTC | #6
On Fri, 23 Mar 2018 19:40:10 +0100, Yaodong Li <yaodong.li@intel.com>  
wrote:

> On 03/23/2018 11:26 AM, Michal Wajdeczko wrote:
>> On Fri, 23 Mar 2018 19:03:47 +0100, Yaodong Li <yaodong.li@intel.com>  
>> wrote:
>>
>>> On 03/23/2018 05:27 AM, Michal Wajdeczko wrote:
>>>> On Fri, 23 Mar 2018 13:07:15 +0100, Sagar Arun Kamble  
>>>> <sagar.a.kamble@intel.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
>>>>>> If GuC firmware is not available on the system and we load i915  
>>>>>> with enable
>>>>>> GuC, then we hit this null pointer dereference issue:
>>>>> Patch looks good but I have query on usage of enable_guc modparam.
>>>>> enable_guc modparam will enable GuC/HuC only if firmware is  
>>>>> available.
>>>>
>>>> During modparam sanitization phase, code will only check for firmware
>>>> name, code will attempt to check if firmware file exists.
>>> Is it better if we won't even call into upload FW once we found the FW  
>>> fetching was failed?
>>>>
>>>>> If user sets to forcefully enable GuC/HuC on systems that don't have  
>>>>> firmware then it is user's fault.
>>>>
>>>> Sure its user's fault, but event in such case we should just  
>>>> gracefully
>>>> abort driver load, without any NULL pointer BUG ;)
>>>>
>>>> And note, that we will hit this bug not only when user force GuC with:
>>>>     enable_guc=1 guc_firmware_path=/does/not/exist
>>>>
>>>> but also when user just specify auto mode:
>>>>     enable_guc=-1
>>>>
>>>> when predefined firmwares are not present (not installed or removed)
>>>>
>>> it feels like it's still user's fault even with the auto mode. why the  
>>> user even set
>>> the enable_guc to -1 after known the fact that the FWs are not present?
>>> I mean should users be expected to know the fact that the auto mode
>>> enable_guc = -1 will enable the GuC and HuC, if the guc_firmware_path  
>>> was left
>>> to be none?
>>
>> GuC fw path will not be none, as we have some hardcoded paths in the  
>> code.
>> And we don't tell the user that after turning on 'auto' mode, he has to
>> do something special. Only later in case of fw fetch errors, we are just
>> printing error message with extra help:
>>
>> "%s: Failed to fetch firmware %s (error %d)\n"
>> "%s: Firmware can be downloaded from %s\n"
>>
> In this case, back to my question: maybe it's a better idea to make  
> things stops here
> instead of continuing to call following functions such as uc_init,  
> uc_init_hw -> fw_upload?

We can stop at many places, but we should be still prepared for broken/
unsigned firmware, and signature will be verified during fw_upload

>>>
>>> I agree we should make sure correct pointer access. it's a  
>>> non-excusable fault:)
>>>
>>> The thing actually frustrated me was that we failed to screen such an  
>>> error
>>> with the CI, but suddenly it broke things and tests with incorrect  
>>> configurations
>>> (enable_guc set to 1 or -1 + No FW available). so maybe we should add  
>>> a case to
>>> cover such a case in CI as well if we did have such test  
>>> configurations.
>>>
>>
>> I guess you can write such test on your own and submit patch to igt-dev  
>> ;)
> :-)
>
> Regards,
> -Jackie
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_uc_fw.c b/drivers/gpu/drm/i915/intel_uc_fw.c
index 30c73243f54d..93b0fc661b3a 100644
--- a/drivers/gpu/drm/i915/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/intel_uc_fw.c
@@ -199,9 +199,9 @@  int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
 		       int (*xfer)(struct intel_uc_fw *uc_fw,
 				   struct i915_vma *vma))
 {
-	struct drm_i915_private *i915 = to_i915(uc_fw->obj->base.dev);
 	struct i915_vma *vma;
 	int err;
+	u32 ggtt_pin_bias;
 
 	DRM_DEBUG_DRIVER("%s fw load %s\n",
 			 intel_uc_fw_type_repr(uc_fw->type), uc_fw->path);
@@ -222,9 +222,9 @@  int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
 		goto fail;
 	}
 
+	ggtt_pin_bias = to_i915(uc_fw->obj->base.dev)->guc.ggtt_pin_bias;
 	vma = i915_gem_object_ggtt_pin(uc_fw->obj, NULL, 0, 0,
-				       PIN_OFFSET_BIAS |
-				       i915->guc.ggtt_pin_bias);
+				       PIN_OFFSET_BIAS | ggtt_pin_bias);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		DRM_DEBUG_DRIVER("%s fw ggtt-pin err=%d\n",