drm/i915: Don't check #active_requests from i915_gem_wait_for_idle()

Message ID 20171212132148.8124-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Dec. 12, 2017, 1:21 p.m. UTC
i915_gem_wait_for_idle() is called from inside the shrinker, to ensure
that we drain the last resources from the GPU in dire circumstances (OOM).
As we may allocate whilst building a request, it is then possible to hit
the shrinker with a request under construction, and so we must account
for the incomplete request whilst waiting. In particular, we
preincrement (in reserve_engine) the i915->gt.active_requests counter
and mark the GPU as busy, therefore we cannot use that counter for quickly
short-circuiting the wait-for-idle, nor assert that it is zero afterwards.
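
To make the failure mode concrete, here is a minimal userspace sketch of
the ordering described above. It is not driver code: struct gt_model and
every model_*() function are hypothetical names invented for illustration,
and the "hardware" is reduced to a count of submitted requests. It only
shows why a counter that is incremented at reservation time disagrees with
the actual hardware state while a request is still being built.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not the driver's. */
struct gt_model {
	unsigned int active_requests;	/* bumped when a request is reserved */
	unsigned int inflight;		/* requests actually handed to the "GPU" */
};

/* Reserving a request increments the counter before anything is built. */
void model_reserve_request(struct gt_model *gt)
{
	gt->active_requests++;
}

/* Retire everything the "GPU" has completed; reserved-only requests remain. */
void model_retire_requests(struct gt_model *gt)
{
	assert(gt->active_requests >= gt->inflight);
	gt->active_requests -= gt->inflight;
	gt->inflight = 0;
}

/* Idle test that short-circuits on the software counter. */
bool model_idle_by_counter(const struct gt_model *gt)
{
	return gt->active_requests == 0;
}

/* Idle test that only looks at the modelled hardware state. */
bool model_idle_by_hw(const struct gt_model *gt)
{
	return gt->inflight == 0;
}

/*
 * An allocation made while building a request enters the shrinker, which
 * waits for idle. Returns what that wait would conclude.
 */
bool model_shrink_during_build(bool use_counter)
{
	struct gt_model gt = { 0, 0 };

	model_reserve_request(&gt);	/* request under construction */
	model_retire_requests(&gt);	/* shrinker: nothing to retire */

	return use_counter ? model_idle_by_counter(&gt) : model_idle_by_hw(&gt);
}
```

In this model, model_shrink_during_build(true) reports busy even though
nothing was ever submitted, whereas model_shrink_during_build(false)
reports idle for the same state.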

[  950.859024] GEM_BUG_ON(i915->gt.active_requests)
[  950.859041] WARNING: CPU: 2 PID: 2178 at drivers/gpu/drm/i915/i915_gem.c:3615 i915_gem_wait_for_idle.part.56+0x166/0x4e0
[  950.859041] Modules linked in: ccm tun fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec btusb snd_hda_core btrtl btbcm iwlwifi snd_hwdep btintel bluetooth snd_seq snd_seq_device snd_pcm ecdh_generic x86_pkg_temp_thermal tpm_infineon coretemp tpm_tis crc32_pclmul wmi_bmof crc32c_intel iTCO_wdt hp_wmi snd_timer iTCO_vendor_support sparse_keymap tpm_tis_core mei_me cfg80211
[  950.859082]  snd joydev tpm mei rfkill pcspkr wmi soundcore lpc_ich hp_accel lis3lv02d input_polldev binfmt_misc e1000e ptp serio_raw pps_core
[  950.859094] CPU: 2 PID: 2178 Comm: gem_exec_nop Tainted: G     U           4.15.0-rc2+ #900
[  950.859102] Hardware name: Hewlett-Packard HP ProBook 6360b/1620, BIOS 68SCF Ver. B.42 12/29/2010
[  950.859107] task: c5119cb4 task.stack: f3ccb8d8
[  950.859112] EIP: i915_gem_wait_for_idle.part.56+0x166/0x4e0
[  950.859113] EFLAGS: 00010296 CPU: 2
[  950.859114] EAX: 00000024 EBX: f36c1888 ECX: f777a044 EDX: 00000007
[  950.859115] ESI: f36c1888 EDI: edd53958 EBP: edd53970 ESP: edd53938
[  950.859116]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  950.859117] CR0: 80050033 CR2: b7f39000 CR3: 2f2b3000 CR4: 000406d0
[  950.859118] Call Trace:
[  950.859125]  ? drm_printk+0x70/0x70
[  950.859129]  i915_gem_wait_for_idle+0x18/0x30
[  950.859133]  i915_gem_shrink+0x360/0x410
[  950.859138]  ? vmpressure+0xa8/0xf0
[  950.859142]  ? ktime_get+0x4a/0x100
[  950.859147]  i915_gem_shrink_all+0x21/0x40
[  950.859151]  i915_gem_shrinker_oom+0x23/0x130
[  950.859156]  notifier_call_chain+0x4e/0x70
[  950.859160]  __blocking_notifier_call_chain+0x2f/0x60
[  950.859164]  blocking_notifier_call_chain+0x11/0x20
[  950.859169]  out_of_memory+0x207/0x280
[  950.859174]  __alloc_pages_nodemask+0xd47/0xe60
[  950.859179]  new_slab+0x32d/0x450
[  950.859183]  ___slab_alloc.constprop.81+0x358/0x4e0
[  950.859189]  ? i915_sw_fence_await_dma_fence+0x53/0x160
[  950.859193]  ? __slab_free+0x1fe/0x310
[  950.859197]  ? native_sched_clock+0x1e/0xc0
[  950.859201]  ? i915_gem_request_alloc+0xcf/0x510
[  950.859205]  ? sched_clock+0x9/0x10
[  950.859209]  __slab_alloc.constprop.80+0x29/0x40
[  950.859212]  ? __slab_alloc.constprop.80+0x29/0x40
[  950.859216]  kmem_cache_alloc_trace+0x160/0x1a0
[  950.859220]  ? i915_sw_fence_await_dma_fence+0x53/0x160
[  950.859224]  i915_sw_fence_await_dma_fence+0x53/0x160
[  950.859229]  i915_gem_request_await_dma_fence+0x1eb/0x390
[  950.859233]  i915_gem_request_await_object+0xee/0x230
[  950.859239]  i915_gem_do_execbuffer+0xc16/0x1200
[  950.859246]  ? irqtime_account_irq+0x3e/0xc0
[  950.859251]  ? irq_exit+0x4f/0xb0
[  950.859257]  ? smp_apic_timer_interrupt+0x5f/0x110
[  950.859261]  ? apic_timer_interrupt+0x35/0x3c
[  950.859266]  i915_gem_execbuffer2_ioctl+0x212/0x440
[  950.859270]  ? apic_timer_interrupt+0x35/0x3c
[  950.859274]  ? i915_gem_do_execbuffer+0x1200/0x1200
[  950.859279]  ? insn_get_seg_base+0x1b/0x50
[  950.859283]  ? i915_gem_do_execbuffer+0x1200/0x1200
[  950.859287]  drm_ioctl_kernel+0x51/0xa0
[  950.859291]  drm_ioctl+0x2a3/0x350
[  950.859294]  ? i915_gem_do_execbuffer+0x1200/0x1200
[  950.859300]  ? sched_clock+0x9/0x10
[  950.859303]  ? drm_getunique+0x70/0x70
[  950.859308]  do_vfs_ioctl+0x7d/0x640
[  950.859311]  ? native_sched_clock+0x1e/0xc0
[  950.859315]  ? sched_clock+0x9/0x10
[  950.859319]  ? sched_clock_cpu+0x13/0x120
[  950.859323]  SyS_ioctl+0x4e/0x80
[  950.859326]  do_fast_syscall_32+0x75/0x250
[  950.859331]  ? irq_exit+0x4f/0xb0
[  950.859334]  entry_SYSENTER_32+0x47/0x71
[  950.859338] EIP: 0xb7f81d11
[  950.859339] EFLAGS: 00000296 CPU: 2
[  950.859340] EAX: ffffffda EBX: 00000003 ECX: 40406469 EDX: bfde4c20
[  950.859340] ESI: 00000003 EDI: 40406469 EBP: 00000003 ESP: bfde4b38
[  950.859341]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  950.859343] Code: e8 30 60 01 00 83 c4 10 83 c3 04 39 f3 75 e0 8b 45 d8 8b 80 14 37 00 00 85 c0 74 13 68 dd 33 e4 c0 68 49 6f e3 c0 e8 4a 55 be ff <0f> ff 5e 5f b8 fe ff ff 3f bb 0a 00 00 00 e8 b7 14 c4 ff 8b 15

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c        | 2 --
 drivers/gpu/drm/i915/intel_engine_cs.c | 6 ++----
 2 files changed, 2 insertions(+), 6 deletions(-)

Comments

Chris Wilson Dec. 12, 2017, 1:23 p.m. UTC | #1
Quoting Chris Wilson (2017-12-12 13:21:48)
> i915_gem_wait_for_idle() is called from inside the shrinker, to ensure
> that we drain the last resources from the GPU in dire circumstances (OOM).
> As we may allocate whilst building a request, it is then possible to hit
> the shrinker with a request under construction, and so we must account
> for the incomplete request whilst waiting. In particular, we
> preincrement (in reserve_engine) the i915->gt.active_requests counter
> and mark the GPU as busy, therefore we cannot use that counter for quickly
> short-circuiting the wait-for-idle, nor assert that it is zero afterwards.
> 

Fixes: 72022a705e1d ("drm/i915: Move retire-requests into i915_gem_wait_for_idle()")
Fixes: 8490ae207f1d ("drm/i915: Suppress busy status for engines if wedged")

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-Chris
Joonas Lahtinen Dec. 13, 2017, 11:07 a.m. UTC | #2
On Tue, 2017-12-12 at 13:21 +0000, Chris Wilson wrote:
> i915_gem_wait_for_idle() is called from inside the shrinker, to ensure
> that we drain the last resources from the GPU in dire circumstances (OOM).
> As we may allocate whilst building a request, it is then possible to hit
> the shrinker with a request under construction, and so we must account
> for the incomplete request whilst waiting. In particular, we
> preincrement (in reserve_engine) the i915->gt.active_requests counter
> and mark the GPU as busy, therefore we cannot use that counter for quickly
> short-circuiting the wait-for-idle, nor assert that it is zero afterwards.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Chris Wilson Dec. 13, 2017, 12:06 p.m. UTC | #3
Quoting Joonas Lahtinen (2017-12-13 11:07:01)
> On Tue, 2017-12-12 at 13:21 +0000, Chris Wilson wrote:
> > i915_gem_wait_for_idle() is called from inside the shrinker, to ensure
> > that we drain the last resources from the GPU in dire circumstances (OOM).
> > As we may allocate whilst building a request, it is then possible to hit
> > the shrinker with a request under construction, and so we must account
> > for the incomplete request whilst waiting. In particular, we
> > preincrement (in reserve_engine) the i915->gt.active_requests counter
> > and mark the GPU as busy, therefore we cannot use that counter for quickly
> > short-circuiting the wait-for-idle, nor assert that it is zero afterwards.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Applied, thanks for the review. This is hopefully another explanation for
a few mystery incompletes, although we typically don't stress memory
enough in CI to worry.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e89496aec857..5f04c790e30e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3560,9 +3560,7 @@  int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
 			if (ret)
 				return ret;
 		}
-
 		i915_gem_retire_requests(i915);
-		GEM_BUG_ON(i915->gt.active_requests);
 
 		ret = wait_for_engines(i915);
 	} else {
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index aad353195f17..510e0bc3a377 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1513,10 +1513,8 @@  bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (READ_ONCE(dev_priv->gt.active_requests))
-		return false;
-
-	/* If the driver is wedged, HW state may be very inconsistent and
+	/*
+	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
 	if (i915_terminally_wedged(&dev_priv->gpu_error))