drm/i915/guc: Protect against NULL client dereference in error path

Message ID: 20180712202027.19801-1-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived

Commit Message

Chris Wilson July 12, 2018, 8:20 p.m. UTC
After aborting a module load, we may try to disable guc before we have
finished setting it up. The long-term plan is to ensure a perfect onion
unwind, but in the short term we want to fix the oops to re-enable
drv_module_reload.

[  317.401239] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[  317.401279] Oops: 0000 [#1] PREEMPT SMP PTI
[  317.401294] CPU: 5 PID: 4275 Comm: drv_module_relo Tainted: G     U            4.18.0-rc4-CI-CI_DRM_4476+ #1
[  317.401317] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
[  317.401440] RIP: 0010:unreserve_doorbell+0x0/0x80 [i915]
[  317.401454] Code: bb e0 48 8b 35 21 4d 18 00 49 c7 c0 a8 e5 62 a0 b9 cc 00 00 00 48 c7 c2 d8 41 5f a0 48 c7 c7 c9 f6 53 a0 e8 a2 3d c2 e0 0f 0b <0f> b7 47 30 66 3d 00 01 74 20 48 8b 57 18 48 0f a3 82 40 05 00 00
[  317.401602] RSP: 0018:ffffc900003d3da0 EFLAGS: 00010246
[  317.401619] RAX: ffffffff8223b300 RBX: 0000000000000000 RCX: 0000000000000000
[  317.401636] RDX: 0000001fffffffc0 RSI: ffff880219f115f0 RDI: 0000000000000000
[  317.401654] RBP: ffff880219f11838 R08: 0000000000000000 R09: 0000000000000000
[  317.401671] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880219f11300
[  317.401689] R13: ffff880219f17770 R14: ffff88022c1daef8 R15: ffffffffa06ae950
[  317.401707] FS:  00007febf77a9980(0000) GS:ffff880236d40000(0000) knlGS:0000000000000000
[  317.401727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  317.401743] CR2: 0000000000000030 CR3: 0000000222072003 CR4: 00000000003606e0
[  317.401761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  317.401779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  317.401796] Call Trace:
[  317.401894]  guc_client_free+0x9/0x130 [i915]
[  317.401993]  intel_guc_submission_fini+0x50/0x90 [i915]
[  317.402092]  intel_uc_fini+0x34/0xd0 [i915]
[  317.402179]  i915_gem_fini+0x5c/0x100 [i915]
[  317.402249]  i915_driver_unload+0xd2/0x110 [i915]
[  317.402321]  i915_pci_remove+0x10/0x20 [i915]
[  317.402341]  pci_device_remove+0x36/0xb0
[  317.402357]  device_release_driver_internal+0x185/0x250
[  317.402374]  driver_detach+0x35/0x70
[  317.402390]  bus_remove_driver+0x53/0xd0
[  317.402404]  pci_unregister_driver+0x25/0xa0
[  317.402423]  __se_sys_delete_module+0x162/0x210
[  317.402439]  ? do_syscall_64+0xd/0x190
[  317.402454]  do_syscall_64+0x55/0x190
[  317.402470]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  317.402485] RIP: 0033:0x7febf6e5d1b7
[  317.402496] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
[  317.402646] RSP: 002b:00007fffb5e72798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  317.402667] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007febf6e5d1b7
[  317.402686] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000562da1addd98
[  317.402703] RBP: 0000562da1addd30 R08: 0000562da1addd9c R09: 00007fffb5e727d8
[  317.402721] R10: 00007fffb5e71794 R11: 0000000000000206 R12: 0000562da0ff6470

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
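
For readers decoding the oops above: the faulting instruction bytes
<0f> b7 47 30 are a zero-extending 16-bit load from 0x30(%rdi), and RDI is 0,
so the access lands at CR2 = 0x30. In other words, unreserve_doorbell() read a
16-bit member at offset 0x30 of a NULL client. A minimal userspace sketch of
that arithmetic follows. struct fake_client, its padding array and the member
name doorbell_id are hypothetical stand-ins, not the real driver layout; only
the 0x30 offset is taken from the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: only the 0x30 offset comes from the oops. */
struct fake_client {
	unsigned char other_fields[0x30]; /* placeholder for preceding members */
	uint16_t doorbell_id;             /* assumed name for the 16-bit member */
};

/* Address touched when loading a member at `offset` through `base`. */
static uintptr_t member_address(const void *base, size_t offset)
{
	return (uintptr_t)base + offset;
}

/* Fault address expected when the stand-in member is read through NULL. */
static uintptr_t null_client_fault_address(void)
{
	/* The stand-in layout must place the member where the oops says it is. */
	assert(offsetof(struct fake_client, doorbell_id) == 0x30);
	return member_address(NULL, offsetof(struct fake_client, doorbell_id));
}
```

On the usual platforms where a null pointer converts to 0,
null_client_fault_address() yields 0x30, the same value as CR2 in the trace.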

Comments

Rodrigo Vivi July 12, 2018, 11:02 p.m. UTC | #1
On Thu, Jul 12, 2018 at 09:20:27PM +0100, Chris Wilson wrote:
> After aborting a module load, we may try to disable guc before we have
> finished setting it up. The long-term plan is to ensure a perfect onion
> unwind, but in the short term we want to fix the oops to re-enable
> drv_module_reload.
> 
> Testcase: igt/drv_module_reload/basic-reload-inject
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index cd51be8ff025..22367131d6a1 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1128,7 +1128,8 @@ static void guc_clients_destroy(struct intel_guc *guc)
>  		guc_client_free(client);
>  
>  	client = fetch_and_zero(&guc->execbuf_client);
> -	guc_client_free(client);
> +	if (client)
> +		guc_client_free(client);
>  }
>  
>  /*
> -- 
> 2.18.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson July 13, 2018, 8:17 a.m. UTC | #2
Quoting Rodrigo Vivi (2018-07-13 00:02:36)
> On Thu, Jul 12, 2018 at 09:20:27PM +0100, Chris Wilson wrote:
> > After aborting a module load, we may try to disable guc before we have
> > finished setting it up. The long-term plan is to ensure a perfect onion
> > unwind, but in the short term we want to fix the oops to re-enable
> > drv_module_reload.
> > 
> > Testcase: igt/drv_module_reload/basic-reload-inject
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

And pushed, so now I just need to find an IGT patch so we can test
drv_module_reload once more.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index cd51be8ff025..22367131d6a1 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1128,7 +1128,8 @@  static void guc_clients_destroy(struct intel_guc *guc)
 		guc_client_free(client);
 
 	client = fetch_and_zero(&guc->execbuf_client);
-	guc_client_free(client);
+	if (client)
+		guc_client_free(client);
 }
 
 /*
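
The guard added by the patch can be tried outside the kernel with a sketch
like the one below. It assumes GNU C (statement expressions and __typeof__).
The fetch_and_zero() here is a local stand-in written for this example, and
fake_guc, fake_client, fake_client_free() and clients_freed are hypothetical
names, not driver code. It only illustrates that the pointer returned by
fetch_and_zero() may be NULL when teardown runs before setup has completed,
so the free path has to be skipped in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in: read *ptr, clear it, and return the old value (GNU C). */
#define fetch_and_zero(ptr) ({			\
	__typeof__(*(ptr)) __T = *(ptr);	\
	*(ptr) = (__typeof__(*(ptr)))0;		\
	__T;					\
})

struct fake_client {
	int doorbell_id;
};

struct fake_guc {
	struct fake_client *execbuf_client; /* NULL until setup completes */
};

static int clients_freed; /* counts frees so the behaviour is observable */

/* Like the function in the oops, this dereferences its argument at once. */
static void fake_client_free(struct fake_client *client)
{
	assert(client != NULL);
	client->doorbell_id = -1;
	free(client);
	clients_freed++;
}

static void fake_clients_destroy(struct fake_guc *guc)
{
	struct fake_client *client;

	client = fetch_and_zero(&guc->execbuf_client);
	if (client) /* the check added by the patch */
		fake_client_free(client);
}
```

Calling fake_clients_destroy() on a zero-initialised struct fake_guc is then a
no-op, and calling it twice on a populated one frees the client only once,
because fetch_and_zero() has already cleared the pointer.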