[v2,0/3] add more probe failures

Message ID 20190731174833.22080-1-michal.wajdeczko@intel.com (mailing list archive)

Message

Michal Wajdeczko July 31, 2019, 5:48 p.m. UTC
v2: rebased

Michal Wajdeczko (3):
  drm/i915: Add i915 to i915_inject_probe_failure
  drm/i915/uc: Inject probe errors into intel_uc_init_hw
  drm/i915/wopcm: Don't fail on WOPCM partitioning failure

 .../gpu/drm/i915/display/intel_connector.c    |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_huc.c        |  4 +++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 23 +++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  5 ++++
 drivers/gpu/drm/i915/i915_drv.c               | 27 +++++++++--------
 drivers/gpu/drm/i915/i915_drv.h               | 12 ++++----
 drivers/gpu/drm/i915/i915_gem.c               | 18 ++++-------
 drivers/gpu/drm/i915/i915_pci.c               |  2 +-
 drivers/gpu/drm/i915/intel_gvt.c              |  2 +-
 drivers/gpu/drm/i915/intel_uncore.c           |  2 +-
 drivers/gpu/drm/i915/intel_wopcm.c            | 30 +++++++++----------
 drivers/gpu/drm/i915/intel_wopcm.h            |  2 +-
 13 files changed, 79 insertions(+), 52 deletions(-)

Comments

Chris Wilson Aug. 1, 2019, 3:27 p.m. UTC | #1
Quoting Patchwork (2019-08-01 16:22:20)
> == Series Details ==
> 
> Series: add more probe failures (rev3)
> URL   : https://patchwork.freedesktop.org/series/64390/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_6602 -> Patchwork_13830
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**

> #### Possible regressions ####
> 
>   * igt@i915_module_load@reload-with-fault-injection:
>     - fi-cfl-guc:         [PASS][1] -> [INCOMPLETE][2]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6602/fi-cfl-guc/igt@i915_module_load@reload-with-fault-injection.html
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13830/fi-cfl-guc/igt@i915_module_load@reload-with-fault-injection.html
>     - fi-skl-guc:         [PASS][3] -> [INCOMPLETE][4]
>    [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6602/fi-skl-guc/igt@i915_module_load@reload-with-fault-injection.html
>    [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13830/fi-skl-guc/igt@i915_module_load@reload-with-fault-injection.html
>     - fi-kbl-guc:         [PASS][5] -> [INCOMPLETE][6]
>    [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6602/fi-kbl-guc/igt@i915_module_load@reload-with-fault-injection.html
>    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13830/fi-kbl-guc/igt@i915_module_load@reload-with-fault-injection.html

<7> [229.652594] [drm:intel_uc_fw_fetch [i915]] GuC fw size 182912 ptr 0000000070788d09
<7> [229.652626] [drm:intel_uc_fw_fetch [i915]] GuC fw version 33.0 (wanted 33.0)
<7> [229.652895] [drm:intel_uc_fw_fetch [i915]] HuC fw size 218688 ptr 0000000070788d09
<7> [229.652925] [drm:intel_uc_fw_fetch [i915]] HuC fw version 2.0 (wanted 2.0)
<7> [229.653095] [drm:intel_wopcm_init [i915]] Calculated GuC WOPCM Region: [240KiB, 784KiB)
<7> [229.653142] [drm:i915_init_ggtt [i915]] clearing unused GTT space: [1000, fee00000]
<7> [229.653522] [drm:intel_engines_setup [i915]] Initialized 5 engine workarounds on rcs0
<7> [229.653556] [drm:intel_engines_setup [i915]] Initialized 4 whitelist workarounds on rcs0
<7> [229.653609] [drm:__intel_engine_init_ctx_wa [i915]] Initialized 14 context workarounds on rcs0
<7> [229.653920] [drm:i915_gem_contexts_init [i915]] logical context support initialized
<7> [229.654996] [drm:intel_guc_log_create [i915]] guc_log_level=5 (enabled, verbose:yes, verbosity:3)
<7> [229.655251] [drm:intel_guc_init [i915]] param[ 0] = 0x0
<7> [229.655284] [drm:intel_guc_init [i915]] param[ 1] = 0xc9fd3
<7> [229.655314] [drm:intel_guc_init [i915]] param[ 2] = 0x0
<7> [229.655344] [drm:intel_guc_init [i915]] param[ 3] = 0x4000
<7> [229.655374] [drm:intel_guc_init [i915]] param[ 4] = 0x3
<7> [229.655418] [drm:intel_guc_init [i915]] param[ 5] = 0x1b8
<7> [229.655446] [drm:intel_guc_init [i915]] param[ 6] = 0x0
<7> [229.655473] [drm:intel_guc_init [i915]] param[ 7] = 0x0
<7> [229.655501] [drm:intel_guc_init [i915]] param[ 8] = 0x0
<7> [229.655528] [drm:intel_guc_init [i915]] param[ 9] = 0x0
<7> [229.655556] [drm:intel_guc_init [i915]] param[10] = 0x0
<7> [229.655583] [drm:intel_guc_init [i915]] param[11] = 0x0
<7> [229.655610] [drm:intel_guc_init [i915]] param[12] = 0x0
<7> [229.655637] [drm:intel_guc_init [i915]] param[13] = 0x0
<7> [229.655762] [drm:intel_uc_fw_upload [i915]] HuC fw load i915/kbl_huc_ver02_00_1810.bin
<7> [229.656489] [drm:intel_uc_fw_upload [i915]] HuC fw xfer completed
<6> [229.656490] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)
<7> [229.656579] [drm:intel_uc_fw_upload [i915]] GuC fw load i915/kbl_guc_33.0.0.bin
<6> [229.656639] i915 0000:00:02.0: [drm:__i915_inject_load_error [i915]] Injecting failure -8 at checkpoint 15 [intel_uc_fw_upload:427]
<7> [229.656688] [drm:intel_uc_init_hw [i915]] GuC fw load failed: -8; will reset and retry 2 more time(s)
<7> [229.656739] [drm:intel_uc_fw_upload [i915]] HuC fw load i915/kbl_huc_ver02_00_1810.bin
<3> [229.656740] intel_uc_fw_upload:425 GEM_BUG_ON(intel_uc_fw_is_loaded(uc_fw))
<4> [229.656798] ------------[ cut here ]------------
<2> [229.656800] kernel BUG at drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c:425!
<4> [229.656813] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [229.656817] CPU: 1 PID: 3279 Comm: i915_module_loa Tainted: G     U            5.3.0-rc2-CI-Patchwork_13830+ #1
<4> [229.656822] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
<4> [229.656857] RIP: 0010:intel_uc_fw_upload+0x314/0x3b0 [i915]
<4> [229.656861] Code: 0b 51 ed e0 48 8b 35 a3 13 1c 00 49 c7 c0 ac 69 38 a0 b9 a9 01 00 00 48 c7 c2 a0 df 32 a0 48 c7 c7 cf 93 25 a0 e8 3c 34 f4 e0 <0f> 0b 48 c7 c1 98 e5 35 a0 ba 79 00 00 00 48 c7 c6 c0 df 32 a0 48
<4> [229.656870] RSP: 0018:ffffc90000aff948 EFLAGS: 00010286
<4> [229.656873] RAX: 000000000000000e RBX: ffff8882214fc4b0 RCX: 0000000000000000
<4> [229.656877] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 0000000000000034
<4> [229.656881] RBP: ffff8882214fc0a8 R08: 0000000000000000 R09: 0000000000000034
<4> [229.656885] R10: 0000000000000000 R11: ffff888264cf0008 R12: 0000000000000001
<4> [229.656889] R13: ffff8882214fc090 R14: 0000000000000200 R15: 0000000000000000
<4> [229.656893] FS:  00007f806f489e40(0000) GS:ffff888266680000(0000) knlGS:0000000000000000
<4> [229.656897] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [229.656900] CR2: 0000556d82156598 CR3: 00000001fbbca004 CR4: 00000000003606e0
<4> [229.656904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [229.656908] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [229.656912] Call Trace:
<4> [229.656944]  intel_uc_init_hw+0x1a3/0x7d0 [i915]
<4> [229.656977]  i915_gem_init_hw+0x15c/0x260 [i915]
<4> [229.657008]  i915_gem_init+0x39a/0xa80 [i915]
<4> [229.657034]  i915_driver_probe+0xe0b/0x18c0 [i915]
<4> [229.657040]  ? __pm_runtime_resume+0x4f/0x80
<4> [229.657066]  i915_pci_probe+0x43/0x1b0 [i915]
<4> [229.657070]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [229.657075]  pci_device_probe+0x9e/0x120
<4> [229.657079]  really_probe+0xea/0x3d0
<4> [229.657082]  driver_probe_device+0x10b/0x120
<4> [229.657085]  device_driver_attach+0x4a/0x50
<4> [229.657089]  __driver_attach+0x97/0x130
<4> [229.657092]  ? device_driver_attach+0x50/0x50
<4> [229.657095]  bus_for_each_dev+0x74/0xc0
<4> [229.657099]  bus_add_driver+0x13f/0x210
<4> [229.657102]  ? 0xffffffffa04bd000
<4> [229.657105]  driver_register+0x56/0xe0
<4> [229.657107]  ? 0xffffffffa04bd000
<4> [229.657111]  do_one_initcall+0x58/0x300
<4> [229.657114]  ? do_init_module+0x1d/0x1f6
<4> [229.657118]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [229.657122]  ? kmem_cache_alloc_trace+0x2d1/0x300
<4> [229.657126]  do_init_module+0x56/0x1f6
<4> [229.657129]  load_module+0x25bd/0x2a40
<4> [229.657135]  ? __se_sys_finit_module+0xd3/0xf0
<4> [229.657138]  __se_sys_finit_module+0xd3/0xf0
<4> [229.657143]  do_syscall_64+0x55/0x1c0
<4> [229.657146]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [229.657150] RIP: 0033:0x7f806e923839
<4> [229.657170] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [229.657179] RSP: 002b:00007ffc40fdb2d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [229.657184] RAX: ffffffffffffffda RBX: 00005601a02956e0 RCX: 00007f806e923839
<4> [229.657188] RDX: 0000000000000000 RSI: 00005601a028e650 RDI: 0000000000000005
<4> [229.657192] RBP: 00005601a028e650 R08: 0000000000000000 R09: 0000000000000000
<4> [229.657196] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000000000
<4> [229.657200] R13: 00005601a028be00 R14: 0000000000000020 R15: 0000000000000016

That looks significant. So, success? \o/
-Chris

Michal Wajdeczko Aug. 1, 2019, 4:14 p.m. UTC | #2
On Thu, 01 Aug 2019 17:27:22 +0200, Chris Wilson <chris@chris-wilson.co.uk> wrote:

> <7> [229.655762] [drm:intel_uc_fw_upload [i915]] HuC fw load i915/kbl_huc_ver02_00_1810.bin
> <7> [229.656489] [drm:intel_uc_fw_upload [i915]] HuC fw xfer completed
> <6> [229.656490] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)

We loaded the HuC fw here.

> <7> [229.656579] [drm:intel_uc_fw_upload [i915]] GuC fw load i915/kbl_guc_33.0.0.bin
> <6> [229.656639] i915 0000:00:02.0: [drm:__i915_inject_load_error [i915]] Injecting failure -8 at checkpoint 15 [intel_uc_fw_upload:427]
> <7> [229.656688] [drm:intel_uc_init_hw [i915]] GuC fw load failed: -8; will reset and retry 2 more time(s)
> <7> [229.656739] [drm:intel_uc_fw_upload [i915]] HuC fw load i915/kbl_huc_ver02_00_1810.bin
> <3> [229.656740] intel_uc_fw_upload:425 GEM_BUG_ON(intel_uc_fw_is_loaded(uc_fw))

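And now we try to load it again (Gen9 feature!): after the injected GuC failure, the reset-and-retry path re-runs the HuC upload while the HuC fw is still marked as loaded, which trips the GEM_BUG_ON.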

> That looks significant. So, success? \o/

Yes! The other good news is that ICL was clean!
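
For readers following the log, the sequence above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the driver code: `struct fw_state`, `fw_upload()`, `init_hw()` and the `clear_state` flag are hypothetical names made up for illustration, and the "already loaded" case returns -EEXIST where the driver hits GEM_BUG_ON. Clearing the state before each attempt is one possible way to make the retry safe in the model; the thread does not say how the driver should handle it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a firmware blob; not the driver's struct intel_uc_fw. */
struct fw_state {
	bool loaded;
};

/*
 * Simulated upload. Where the driver asserts with GEM_BUG_ON() that the
 * firmware is not loaded yet, this sketch returns -EEXIST so the condition
 * can be observed without crashing.
 */
static int fw_upload(struct fw_state *fw, bool inject_failure)
{
	if (fw->loaded)
		return -EEXIST;
	if (inject_failure)
		return -ENOEXEC; /* the "-8" seen in the log (ENOEXEC is 8 on Linux) */
	fw->loaded = true;
	return 0;
}

/*
 * Simulated init with retries: upload HuC, then GuC, and start over if the
 * GuC upload fails. The first failures_to_inject GuC uploads fail.
 * clear_state is an assumption of this sketch: when true, the "loaded"
 * flags are cleared before every attempt.
 */
static int init_hw(struct fw_state *huc, struct fw_state *guc,
		   int attempts, int failures_to_inject, bool clear_state)
{
	int ret = -ENOEXEC;

	while (attempts-- > 0) {
		if (clear_state) {
			huc->loaded = false;
			guc->loaded = false;
		}

		/* On a retry without clear_state, HuC is still marked loaded. */
		ret = fw_upload(huc, false);
		if (ret)
			return ret;

		ret = fw_upload(guc, failures_to_inject-- > 0);
		if (!ret)
			break;
	}

	return ret;
}
```

With three attempts and one injected GuC failure, `init_hw(&huc, &guc, 3, 1, false)` returns -EEXIST on the second pass, which corresponds to the point where the real log shows the BUG. The same call with `clear_state` set to true returns 0, because the second attempt starts from a clean state.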