
[01/36] drm/i915: Handle very early engine initialisation failure

Message ID 20200601072446.19548-1-chris@chris-wilson.co.uk
State New, archived
Series [01/36] drm/i915: Handle very early engine initialisation failure

Commit Message

Chris Wilson June 1, 2020, 7:24 a.m. UTC
If we fail during engine setup, we may leave some engines not yet set up.
During the error cleanup, we have to be careful not to try to use the
uninitialised engines before discarding them.

[   16.136152] RIP: 0010:__flush_work+0x198/0x1b0
[   16.136168] Code: ff ff 8b 0b 48 8b 53 08 83 e1 08 48 0f ba 2b 03 80 c9 f0 e9 63 ff ff ff 0f 0b 48 83 c4 48 44 89 f0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 f6 e9 62 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f
[   16.136186] RSP: 0018:ffffc900003bb928 EFLAGS: 00010246
[   16.136201] RAX: 0000000000000000 RBX: ffff88844f392168 RCX: 0000000000000000
[   16.136216] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88844f392168
[   16.136231] RBP: ffff88844f392130 R08: 0000000000000000 R09: 0000000000000001
[   16.136246] R10: ffff888441e31e40 R11: ffff88845e329c70 R12: ffff88844f796988
[   16.136261] R13: ffff888441e4fb80 R14: 0000000000000001 R15: ffff88844f790000
[   16.136388] FS:  00007fecbd208880(0000) GS:ffff88845e380000(0000) knlGS:0000000000000000
[   16.136405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.136420] CR2: 00007ff3ce748f90 CR3: 0000000457a6a001 CR4: 00000000000606e0
[   16.136437] Call Trace:
[   16.136456]  ? try_to_del_timer_sync+0x3a/0x50
[   16.136529]  intel_wakeref_wait_for_idle+0x87/0xb0 [i915]
[   16.136606]  ? intel_engines_release+0x68/0xc0 [i915]
[   16.136680]  intel_engines_release+0x49/0xc0 [i915]
[   16.136757]  intel_gt_init+0x2f4/0x5e0 [i915]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
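One reading of the trace above: `intel_wakeref_wait_for_idle()` ends up in `__flush_work()`, and the RIP points at `<0f> 0b` (`ud2`, which the kernel uses on x86 to raise a warning). As far as we can tell, `__flush_work()` warns and bails out when the work item's callback was never set, which is the state of a wakeref on a zero-allocated engine whose setup failed before the wakeref was initialised. The userspace model below is a sketch under that assumption; `struct work_model`, `flush_work_model()`, `noop_work()` and `warnings` are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel work item; only the callback matters. */
struct work_model {
	void (*func)(struct work_model *work);
};

/* Counts WARN_ON()-style hits in this model. */
static int warnings;

/* Models the sanity check we assume sits near the top of __flush_work():
 * a work item whose callback was never set has nothing to flush. */
static bool flush_work_model(struct work_model *work)
{
	assert(work != NULL); /* the pointer itself must still be valid */

	if (!work->func) {
		warnings++;
		fprintf(stderr, "WARNING: flush of uninitialised work item\n");
		return false;
	}

	/* A real flush would now wait for the queued callback to finish. */
	return true;
}

/* Callback that "initialisation" installs in this model. */
static void noop_work(struct work_model *work)
{
	(void)work;
}
```

A zero-initialised `struct work_model` (so `func == NULL`), like the wakeref of an engine that never finished setup, makes `flush_work_model()` return false and bump `warnings`; one with `func = noop_work` flushes normally.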

Comments

Mika Kuoppala June 1, 2020, 11:31 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail during engine setup, we may leave some engines not yet set up.
> During the error cleanup, we have to be careful not to try to use the
> uninitialised engines before discarding them.
>
> [   16.136152] RIP: 0010:__flush_work+0x198/0x1b0
> [   16.136168] Code: ff ff 8b 0b 48 8b 53 08 83 e1 08 48 0f ba 2b 03 80 c9 f0 e9 63 ff ff ff 0f 0b 48 83 c4 48 44 89 f0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 f6 e9 62 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f
> [   16.136186] RSP: 0018:ffffc900003bb928 EFLAGS: 00010246
> [   16.136201] RAX: 0000000000000000 RBX: ffff88844f392168 RCX: 0000000000000000
> [   16.136216] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88844f392168
> [   16.136231] RBP: ffff88844f392130 R08: 0000000000000000 R09: 0000000000000001
> [   16.136246] R10: ffff888441e31e40 R11: ffff88845e329c70 R12: ffff88844f796988
> [   16.136261] R13: ffff888441e4fb80 R14: 0000000000000001 R15: ffff88844f790000
> [   16.136388] FS:  00007fecbd208880(0000) GS:ffff88845e380000(0000) knlGS:0000000000000000
> [   16.136405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   16.136420] CR2: 00007ff3ce748f90 CR3: 0000000457a6a001 CR4: 00000000000606e0
> [   16.136437] Call Trace:
> [   16.136456]  ? try_to_del_timer_sync+0x3a/0x50
> [   16.136529]  intel_wakeref_wait_for_idle+0x87/0xb0 [i915]
> [   16.136606]  ? intel_engines_release+0x68/0xc0 [i915]
> [   16.136680]  intel_engines_release+0x49/0xc0 [i915]
> [   16.136757]  intel_gt_init+0x2f4/0x5e0 [i915]
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index da5b61085257..c8c14981eb5d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -414,12 +414,12 @@ void intel_engines_release(struct intel_gt *gt)
>  
>  	/* Decouple the backend; but keep the layout for late GPU resets */
>  	for_each_engine(engine, gt, id) {
> -		intel_wakeref_wait_for_idle(&engine->wakeref);
> -		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
> -
>  		if (!engine->release)
>  			continue;
>  
> +		intel_wakeref_wait_for_idle(&engine->wakeref);
> +		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
> +
>  		engine->release(engine);
>  		engine->release = NULL;
>  
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson June 1, 2020, 11:39 a.m. UTC | #2
Quoting Patchwork (2020-06-01 12:00:40)
> == Series Details ==
> 
> Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
> URL   : https://patchwork.freedesktop.org/series/77857/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_8560_full -> Patchwork_17828_full
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_17828_full absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_17828_full, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_17828_full:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt@runner@aborted:
>     - shard-hsw:          NOTRUN -> [FAIL][1]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw6/igt@runner@aborted.html
> 
>   
> #### Suppressed ####
> 
>   The following results come from untrusted machines, tests, or statuses.
>   They do not affect the overall result.
> 
>   * {igt@gem_exec_fence@parallel@vecs0}:
>     - shard-hsw:          [PASS][2] -> [FAIL][3] +3 similar issues
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw1/igt@gem_exec_fence@parallel@vecs0.html
>    [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw2/igt@gem_exec_fence@parallel@vecs0.html

Sigh. They dropped the memory compare from MI_SEMAPHORE_MBOX in Haswell.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index da5b61085257..c8c14981eb5d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -414,12 +414,12 @@ void intel_engines_release(struct intel_gt *gt)
 
 	/* Decouple the backend; but keep the layout for late GPU resets */
 	for_each_engine(engine, gt, id) {
-		intel_wakeref_wait_for_idle(&engine->wakeref);
-		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
-
 		if (!engine->release)
 			continue;
 
+		intel_wakeref_wait_for_idle(&engine->wakeref);
+		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
+
 		engine->release(engine);
 		engine->release = NULL;
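The reordered loop in the patch above can be modelled in plain C: an engine whose setup never completed has no `release` hook, so the loop now skips it before waiting on its wakeref. This is a sketch, assuming that a NULL `release` reliably marks an engine that never finished setup; `struct engine_model`, `wait_for_idle_model()`, `release_model()` and `release_engines()` are hypothetical stand-ins for the real engine, wakeref and `for_each_engine()` machinery, and `assert()` plays the role of `GEM_BUG_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-engine state touched by the release loop. */
struct engine_model {
	bool wakeref_initialised; /* set once wakeref setup has run */
	bool awake;               /* stands in for intel_engine_pm_is_awake() */
	int waits;                /* how often the loop waited on this engine */
	void (*release)(struct engine_model *engine);
};

/* Stand-in for intel_wakeref_wait_for_idle(): only valid on engines whose
 * wakeref was initialised, which is the invariant the reordering restores. */
static void wait_for_idle_model(struct engine_model *engine)
{
	assert(engine->wakeref_initialised);
	engine->waits++;
	engine->awake = false; /* model: once idle, the engine is parked */
}

/* Backend release hook installed by a successful setup in this model. */
static void release_model(struct engine_model *engine)
{
	(void)engine; /* a real backend would free its resources here */
}

/* Mirrors the patched loop: skip engines without a release hook first. */
static int release_engines(struct engine_model *engines, size_t count)
{
	int released = 0;

	for (size_t i = 0; i < count; i++) {
		struct engine_model *engine = &engines[i];

		if (!engine->release)
			continue;

		wait_for_idle_model(engine);
		assert(!engine->awake); /* plays the role of GEM_BUG_ON() */

		engine->release(engine);
		engine->release = NULL;
		released++;
	}
	return released;
}
```

With one fully set-up engine and one zeroed engine, `release_engines()` releases only the first and never waits on the second; a second call releases nothing, because `release` was cleared, matching the "keep the layout for late GPU resets" intent.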