Message ID | 20180712202027.19801-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |

On Thu, Jul 12, 2018 at 09:20:27PM +0100, Chris Wilson wrote:
> After aborting a module load, we may try and disable guc before we have
> finished setting it. Long term plan is to ensure perfect onion unwind,
> but in the short term we want to fix the oops to re-enable
> drv_module_reload.
>
> [ 317.401239] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> [ 317.401279] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 317.401294] CPU: 5 PID: 4275 Comm: drv_module_relo Tainted: G U 4.18.0-rc4-CI-CI_DRM_4476+ #1
> [ 317.401317] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
> [ 317.401440] RIP: 0010:unreserve_doorbell+0x0/0x80 [i915]
> [ 317.401454] Code: bb e0 48 8b 35 21 4d 18 00 49 c7 c0 a8 e5 62 a0 b9 cc 00 00 00 48 c7 c2 d8 41 5f a0 48 c7 c7 c9 f6 53 a0 e8 a2 3d c2 e0 0f 0b <0f> b7 47 30 66 3d 00 01 74 20 48 8b 57 18 48 0f a3 82 40 05 00 00
> [ 317.401602] RSP: 0018:ffffc900003d3da0 EFLAGS: 00010246
> [ 317.401619] RAX: ffffffff8223b300 RBX: 0000000000000000 RCX: 0000000000000000
> [ 317.401636] RDX: 0000001fffffffc0 RSI: ffff880219f115f0 RDI: 0000000000000000
> [ 317.401654] RBP: ffff880219f11838 R08: 0000000000000000 R09: 0000000000000000
> [ 317.401671] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880219f11300
> [ 317.401689] R13: ffff880219f17770 R14: ffff88022c1daef8 R15: ffffffffa06ae950
> [ 317.401707] FS: 00007febf77a9980(0000) GS:ffff880236d40000(0000) knlGS:0000000000000000
> [ 317.401727] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 317.401743] CR2: 0000000000000030 CR3: 0000000222072003 CR4: 00000000003606e0
> [ 317.401761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 317.401779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 317.401796] Call Trace:
> [ 317.401894] guc_client_free+0x9/0x130 [i915]
> [ 317.401993] intel_guc_submission_fini+0x50/0x90 [i915]
> [ 317.402092] intel_uc_fini+0x34/0xd0 [i915]
> [ 317.402179] i915_gem_fini+0x5c/0x100 [i915]
> [ 317.402249] i915_driver_unload+0xd2/0x110 [i915]
> [ 317.402321] i915_pci_remove+0x10/0x20 [i915]
> [ 317.402341] pci_device_remove+0x36/0xb0
> [ 317.402357] device_release_driver_internal+0x185/0x250
> [ 317.402374] driver_detach+0x35/0x70
> [ 317.402390] bus_remove_driver+0x53/0xd0
> [ 317.402404] pci_unregister_driver+0x25/0xa0
> [ 317.402423] __se_sys_delete_module+0x162/0x210
> [ 317.402439] ? do_syscall_64+0xd/0x190
> [ 317.402454] do_syscall_64+0x55/0x190
> [ 317.402470] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 317.402485] RIP: 0033:0x7febf6e5d1b7
> [ 317.402496] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> [ 317.402646] RSP: 002b:00007fffb5e72798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [ 317.402667] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007febf6e5d1b7
> [ 317.402686] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000562da1addd98
> [ 317.402703] RBP: 0000562da1addd30 R08: 0000562da1addd9c R09: 00007fffb5e727d8
> [ 317.402721] R10: 00007fffb5e71794 R11: 0000000000000206 R12: 0000562da0ff6470
>
> Testcase: igt/drv_module_reload/basic-reload-inject
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index cd51be8ff025..22367131d6a1 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1128,7 +1128,8 @@ static void guc_clients_destroy(struct intel_guc *guc)
>  		guc_client_free(client);
>
>  	client = fetch_and_zero(&guc->execbuf_client);
> -	guc_client_free(client);
> +	if (client)
> +		guc_client_free(client);
>  }
>
>  /*
> --
> 2.18.0

Quoting Rodrigo Vivi (2018-07-13 00:02:36)
> On Thu, Jul 12, 2018 at 09:20:27PM +0100, Chris Wilson wrote:
> > After aborting a module load, we may try and disable guc before we have
> > finished setting it. Long term plan is to ensure perfect onion unwind,
> > but in the short term we want to fix the oops to re-enable
> > drv_module_reload.
> >
> > [...]
> >
> > Testcase: igt/drv_module_reload/basic-reload-inject
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

And pushed, so now I just need to find an IGT patch so we can test
drv_module_reload once more.
-Chris

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index cd51be8ff025..22367131d6a1 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1128,7 +1128,8 @@ static void guc_clients_destroy(struct intel_guc *guc)
 		guc_client_free(client);
 
 	client = fetch_and_zero(&guc->execbuf_client);
-	guc_client_free(client);
+	if (client)
+		guc_client_free(client);
 }
 
 /*
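
The guard added in the diff above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the i915 code: `struct client`, `struct guc`, `fetch_and_zero_client()`, `client_free()` and the `clients_freed` counter are hypothetical stand-ins, and the kernel's `fetch_and_zero()` macro is replaced by a plain function. It only shows why a teardown path has to tolerate a slot that was never populated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the i915 structures. */
struct client {
	int doorbell_id;
};

struct guc {
	struct client *preempt_client;
	struct client *execbuf_client;
};

/* Number of clients released so far, so callers can observe the teardown. */
static int clients_freed;

/* Read the pointer stored in *slot and clear the slot, returning the old value. */
static struct client *fetch_and_zero_client(struct client **slot)
{
	struct client *old;

	assert(slot != NULL);
	old = *slot;
	*slot = NULL;
	return old;
}

/* Dereferences its argument, so it must never be called with NULL. */
static void client_free(struct client *client)
{
	client->doorbell_id = -1;
	free(client);
	clients_freed++;
}

/* Teardown that is safe even if setup was aborted before a slot was filled. */
static void clients_destroy(struct guc *guc)
{
	struct client *client;

	client = fetch_and_zero_client(&guc->preempt_client);
	if (client)
		client_free(client);

	client = fetch_and_zero_client(&guc->execbuf_client);
	if (client)
		client_free(client);
}
```

In this sketch, calling `clients_destroy()` on a `struct guc` whose slots are all NULL does nothing, and calling it twice is harmless because each slot is cleared as it is read.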

After aborting a module load, we may try and disable guc before we have
finished setting it. Long term plan is to ensure perfect onion unwind,
but in the short term we want to fix the oops to re-enable
drv_module_reload.

[ 317.401239] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 317.401279] Oops: 0000 [#1] PREEMPT SMP PTI
[ 317.401294] CPU: 5 PID: 4275 Comm: drv_module_relo Tainted: G U 4.18.0-rc4-CI-CI_DRM_4476+ #1
[ 317.401317] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
[ 317.401440] RIP: 0010:unreserve_doorbell+0x0/0x80 [i915]
[ 317.401454] Code: bb e0 48 8b 35 21 4d 18 00 49 c7 c0 a8 e5 62 a0 b9 cc 00 00 00 48 c7 c2 d8 41 5f a0 48 c7 c7 c9 f6 53 a0 e8 a2 3d c2 e0 0f 0b <0f> b7 47 30 66 3d 00 01 74 20 48 8b 57 18 48 0f a3 82 40 05 00 00
[ 317.401602] RSP: 0018:ffffc900003d3da0 EFLAGS: 00010246
[ 317.401619] RAX: ffffffff8223b300 RBX: 0000000000000000 RCX: 0000000000000000
[ 317.401636] RDX: 0000001fffffffc0 RSI: ffff880219f115f0 RDI: 0000000000000000
[ 317.401654] RBP: ffff880219f11838 R08: 0000000000000000 R09: 0000000000000000
[ 317.401671] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880219f11300
[ 317.401689] R13: ffff880219f17770 R14: ffff88022c1daef8 R15: ffffffffa06ae950
[ 317.401707] FS: 00007febf77a9980(0000) GS:ffff880236d40000(0000) knlGS:0000000000000000
[ 317.401727] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 317.401743] CR2: 0000000000000030 CR3: 0000000222072003 CR4: 00000000003606e0
[ 317.401761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 317.401779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 317.401796] Call Trace:
[ 317.401894] guc_client_free+0x9/0x130 [i915]
[ 317.401993] intel_guc_submission_fini+0x50/0x90 [i915]
[ 317.402092] intel_uc_fini+0x34/0xd0 [i915]
[ 317.402179] i915_gem_fini+0x5c/0x100 [i915]
[ 317.402249] i915_driver_unload+0xd2/0x110 [i915]
[ 317.402321] i915_pci_remove+0x10/0x20 [i915]
[ 317.402341] pci_device_remove+0x36/0xb0
[ 317.402357] device_release_driver_internal+0x185/0x250
[ 317.402374] driver_detach+0x35/0x70
[ 317.402390] bus_remove_driver+0x53/0xd0
[ 317.402404] pci_unregister_driver+0x25/0xa0
[ 317.402423] __se_sys_delete_module+0x162/0x210
[ 317.402439] ? do_syscall_64+0xd/0x190
[ 317.402454] do_syscall_64+0x55/0x190
[ 317.402470] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 317.402485] RIP: 0033:0x7febf6e5d1b7
[ 317.402496] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
[ 317.402646] RSP: 002b:00007fffb5e72798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 317.402667] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007febf6e5d1b7
[ 317.402686] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000562da1addd98
[ 317.402703] RBP: 0000562da1addd30 R08: 0000562da1addd9c R09: 00007fffb5e727d8
[ 317.402721] R10: 00007fffb5e71794 R11: 0000000000000206 R12: 0000562da0ff6470

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
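
The fault address in the trace (CR2 = 0x30 with RDI = 0) is the usual signature of a field read through a NULL structure pointer: the access lands at the member's offset, not at address 0. The faulting bytes `<0f> b7 47 30` decode to a 16-bit load from 0x30(%rdi), which fits that reading. The sketch below assumes a hypothetical layout, chosen only so that a 16-bit member sits at offset 0x30; the real layout of the i915 client structure is not shown in this thread, and `struct example_client` and `id_access_address()` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x30 bytes of other members, then a 16-bit field. */
struct example_client {
	uint8_t other_members[0x30];
	uint16_t id;
};

/* Compile-time check that the assumed layout really puts id at 0x30. */
static_assert(offsetof(struct example_client, id) == 0x30,
	      "id must sit at offset 0x30 in this sketch");

/* Address that a load of ->id would touch for a given base pointer value. */
static uintptr_t id_access_address(uintptr_t base)
{
	return base + offsetof(struct example_client, id);
}
```

With a base of 0, `id_access_address()` yields 0x30, matching the address reported in the oops under this assumed layout.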