drm/i915: Remove assertion of active_rings must be non-empty if active_requests

Message ID 20180504101147.26286-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 4, 2018, 10:11 a.m. UTC
"An outstanding request must still be on an active ring somewhere" is
only true if we haven't just been interrupted by the shrinker in the
middle of allocating the request itself. (At the start of
i915_request_alloc() we pin the context and prepare the GT for activity,
marking it as active, and then try to allocate the request. If this
allocation invokes the shrinker, we try to reclaim some space by calling
i915_retire_requests() which may then be confused by the pre-reservation
of active_requests.)
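
The ordering problem described above can be sketched in user space. This is a minimal model and not i915 code: struct gt_model, nr_active_rings and the model_* helpers are hypothetical names, and it assumes only what the message states, namely that active_requests is reserved before the ring joins the active list.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the bookkeeping described above.
 * The field names echo the driver, but nothing here is i915 code.
 */
struct gt_model {
	unsigned int active_requests;	/* reserved at the start of request allocation */
	unsigned int nr_active_rings;	/* stands in for the gt.active_rings list */
};

/* Step 1: reserve the GT before the request itself has been allocated. */
static void model_reserve(struct gt_model *gt)
{
	gt->active_requests++;
}

/* Step 2: the ring becomes active only once a request is queued on it. */
static void model_queue(struct gt_model *gt)
{
	assert(gt->active_requests > 0);	/* queueing follows a reservation */
	gt->nr_active_rings++;
}

/*
 * What the removed check tested: requests are outstanding, yet no ring
 * is on the active list. Returns true where GEM_BUG_ON() would fire.
 */
static bool model_retire_would_bug(const struct gt_model *gt)
{
	if (!gt->active_requests)
		return false;	/* early return: nothing outstanding */
	return gt->nr_active_rings == 0;
}
```

Calling model_retire_would_bug() between model_reserve() and model_queue() returns true, which corresponds to the shrinker running in the middle of the allocation; before the reservation or after the queueing it returns false.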

<3>[  125.472695] i915_retire_requests:1429 GEM_BUG_ON(list_empty(&i915->gt.active_rings))
<2>[  125.472792] kernel BUG at drivers/gpu/drm/i915/i915_request.c:1429!
<4>[  125.472822] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
<4>[  125.498764] Modules linked in: snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb btrtl btbcm btintel cdc_ether snd_hda_codec_realtek bluetooth i915 snd_hda_codec_generic usbnet r8152 mii ecdh_generic lpc_ich mei_me snd_hda_intel snd_hda_codec mei snd_hwdep snd_hda_core snd_pcm prime_numbers
<4>[  125.498923] CPU: 0 PID: 1115 Comm: gem_exec_create Tainted: G     U            4.17.0-rc3-gc49cbe0d1eb8-kasan_32+ #1
<4>[  125.498955] Hardware name: GOOGLE Peppy/Peppy, BIOS MrChromebox 02/04/2018
<4>[  125.499074] RIP: 0010:i915_retire_requests+0x3f2/0x590 [i915]
<4>[  125.499095] RSP: 0018:ffff88004e5dec40 EFLAGS: 00010282
<4>[  125.499117] RAX: 0000000000000010 RBX: ffff8800458f0000 RCX: 0000000000000000
<4>[  125.499140] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff880060c2f6f0
<4>[  125.499164] RBP: ffff88004e5dee30 R08: ffffed000c185ee6 R09: ffffed000c185ee6
<4>[  125.499187] R10: 0000000000000001 R11: ffffed000c185ee5 R12: ffff8800553da160
<4>[  125.499210] R13: dffffc0000000000 R14: 0000000000000000 R15: ffff8800458faed0
<4>[  125.499235] FS:  00007fe18f052980(0000) GS:ffff880065400000(0000) knlGS:0000000000000000
<4>[  125.499262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  125.499282] CR2: 00007f01df11efb8 CR3: 00000000518d4001 CR4: 00000000000606f0
<4>[  125.499304] Call Trace:
<4>[  125.499417]  i915_gem_shrink+0x576/0xb50 [i915]
<4>[  125.499532]  ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915]
<4>[  125.499561]  ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[  125.499671]  ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915]
<4>[  125.499782]  ? i915_gem_shrinker_scan+0xc4/0x320 [i915]
<4>[  125.499889]  i915_gem_shrinker_scan+0xc4/0x320 [i915]
<4>[  125.499997]  ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915]
<4>[  125.500021]  ? do_raw_spin_unlock+0x4f/0x240
<4>[  125.500042]  ? _raw_spin_unlock+0x29/0x40
<4>[  125.500149]  ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915]
<4>[  125.500177]  shrink_slab.part.18+0x23e/0x8f0
<4>[  125.500202]  ? unregister_shrinker+0x1f0/0x1f0
<4>[  125.500226]  ? mem_cgroup_iter+0x379/0xcc0
<4>[  125.500249]  shrink_node+0xa7e/0x1180
<4>[  125.500276]  ? shrink_node_memcg+0x11f0/0x11f0
<4>[  125.500297]  ? __delayacct_freepages_start+0x38/0x80
<4>[  125.500319]  ? __is_insn_slot_addr+0xe3/0x1a0
<4>[  125.500342]  ? recalibrate_cpu_khz+0x10/0x10
<4>[  125.500361]  ? ktime_get+0xb2/0x140
<4>[  125.500382]  do_try_to_free_pages+0x2d3/0xe40
<4>[  125.500407]  ? allow_direct_reclaim.part.23+0x1e0/0x1e0
<4>[  125.500429]  ? shrink_node+0x1180/0x1180
<4>[  125.500450]  ? __read_once_size_nocheck.constprop.4+0x10/0x10
<4>[  125.500476]  try_to_free_pages+0x1af/0x560
<4>[  125.500497]  ? do_try_to_free_pages+0xe40/0xe40
<4>[  125.500525]  __alloc_pages_nodemask+0xadc/0x2130
<4>[  125.500553]  ? gfp_pfmemalloc_allowed+0x150/0x150
<4>[  125.500654]  ? i915_gem_do_execbuffer+0x219d/0x32e0 [i915]
<4>[  125.500678]  ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[  125.500701]  ? __debug_object_init+0x322/0xd90
<4>[  125.500722]  ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[  125.500827]  ? i915_gem_do_execbuffer+0xdc2/0x32e0 [i915]
<4>[  125.500942]  ? i915_request_alloc+0x5b5/0x13f0 [i915]
<4>[  125.500964]  ? page_frag_free+0x170/0x170
<4>[  125.500984]  ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[  125.501008]  new_slab+0x21d/0x5c0
<4>[  125.501029]  ___slab_alloc.constprop.35+0x322/0x3e0
<4>[  125.501052]  ? reservation_object_reserve_shared+0x10b/0x250
<4>[  125.501074]  ? __ww_mutex_lock.constprop.3+0x1104/0x2cf0
<4>[  125.501097]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[  125.501120]  ? fs_reclaim_acquire+0x10/0x10
<4>[  125.501138]  ? lock_acquire+0x138/0x3c0
<4>[  125.501156]  ? lock_acquire+0x3c0/0x3c0
<4>[  125.501176]  ? reservation_object_reserve_shared+0x10b/0x250
<4>[  125.501198]  ? __slab_alloc.isra.27.constprop.34+0x3d/0x70
<4>[  125.501219]  __slab_alloc.isra.27.constprop.34+0x3d/0x70
<4>[  125.501243]  ? reservation_object_reserve_shared+0x10b/0x250
<4>[  125.501265]  __kmalloc_track_caller+0x313/0x350
<4>[  125.501287]  krealloc+0x62/0xb0
<4>[  125.501305]  reservation_object_reserve_shared+0x10b/0x250
<4>[  125.501411]  i915_gem_do_execbuffer+0x2040/0x32e0 [i915]
<4>[  125.501522]  ? eb_relocate_slow+0xad0/0xad0 [i915]
<4>[  125.501544]  ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[  125.501646]  ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
<4>[  125.501755]  ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
<4>[  125.501779]  ? drm_dev_get+0x20/0x20
<4>[  125.501803]  ? __might_fault+0xea/0x1a0
<4>[  125.501902]  ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
<4>[  125.502012]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[  125.502116]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[  125.502218]  i915_gem_execbuffer2_ioctl+0x3c5/0x770 [i915]
<4>[  125.502243]  ? drm_dev_enter+0xe0/0xe0
<4>[  125.502260]  ? lock_acquire+0x138/0x3c0
<4>[  125.502362]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[  125.502470]  ? i915_gem_object_create.part.28+0x570/0x570 [i915]
<4>[  125.502575]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[  125.502680]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[  125.502702]  drm_ioctl_kernel+0x151/0x200
<4>[  125.502721]  ? drm_ioctl_permit+0x2a0/0x2a0
<4>[  125.502746]  drm_ioctl+0x63a/0x920
<4>[  125.502844]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[  125.502868]  ? drm_getstats+0x20/0x20
<4>[  125.502886]  ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[  125.502919]  do_vfs_ioctl+0x173/0xe90
<4>[  125.502936]  ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[  125.502957]  ? ioctl_preallocate+0x170/0x170
<4>[  125.502978]  ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[  125.503002]  ? retint_kernel+0x2d/0x2d
<4>[  125.503024]  ksys_ioctl+0x35/0x60
<4>[  125.503043]  __x64_sys_ioctl+0x6a/0xb0
<4>[  125.503061]  do_syscall_64+0x97/0x400
<4>[  125.503081]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  125.503101] RIP: 0033:0x7fe18e4f65d7
<4>[  125.503116] RSP: 002b:00007ffe2ffc06a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[  125.503145] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe18e4f65d7
<4>[  125.503168] RDX: 00007ffe2ffc07f0 RSI: 0000000040406469 RDI: 0000000000000003
<4>[  125.503191] RBP: 00007ffe2ffc07f0 R08: 0000000000000004 R09: 00007ffe2ffcf080
<4>[  125.503215] R10: 000000000002c7de R11: 0000000000000246 R12: 0000000040406469
<4>[  125.503238] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
<4>[  125.503268] Code: e8 18 a0 c9 da 48 8b 35 25 3a 47 00 49 c7 c0 a0 3b 88 c0 b9 95 05 00 00 48 c7 c2 e0 49 88 c0 48 c7 c7 8d 3b 5d c0 e8 ee 7e db da <0f> 0b 48 89 ef e8 a4 26 f5 da e9 51 fe ff ff e8 8a 26 f5 da e9
<1>[  125.503548] RIP: i915_retire_requests+0x3f2/0x590 [i915] RSP: ffff88004e5dec40

Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Chris Wilson May 4, 2018, 10:24 a.m. UTC | #1
Quoting Chris Wilson (2018-05-04 11:11:47)
> "An outstanding request must still be on an active ring somewhere" is
> only true if we haven't just been interrupted by the shrinker in the
> middle of allocating the request itself. (At the start of
> i915_request_alloc() we pin the context and prepare the GT for activity,
> marking it as active, and then try to allocate the request. If this
> allocation invokes the shrinker, we try to reclaim some space by calling
> i915_retire_requests() which may then be confused by the pre-reservation
> of active_requests.)

Note that the oops here can actually be triggered by any allocation made
after i915_request_alloc and before i915_request_add. To close that coarse
window, we could move the list_add(rq, ring->request_list) into
i915_request_alloc, but that would still leave the allocations inside
i915_request_alloc itself.
-Chris
Tvrtko Ursulin May 4, 2018, 10:37 a.m. UTC | #2
On 04/05/2018 11:11, Chris Wilson wrote:
> "An outstanding request must still be on an active ring somewhere" is
> only true if we haven't just been interrupted by the shrinker in the
> middle of allocating the request itself. (At the start of
> i915_request_alloc() we pin the context and prepare the GT for activity,
> marking it as active, and then try to allocate the request. If this
> allocation invokes the shrinker, we try to reclaim some space by calling
> i915_retire_requests() which may then be confused by the pre-reservation
> of active_requests.)
> 
> Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d68739b94dac..e4cf76ec14a6 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1426,9 +1426,6 @@ void i915_retire_requests(struct drm_i915_private *i915)
>   	if (!i915->gt.active_requests)
>   		return;
>   
> -	/* An outstanding request must be on a still active ring somewhere */
> -	GEM_BUG_ON(list_empty(&i915->gt.active_rings));
> -
>   	list_for_each_entry_safe(ring, tmp, &i915->gt.active_rings, active_link)
>   		ring_retire_requests(ring);
>   }
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson May 4, 2018, 12:06 p.m. UTC | #3
Quoting Tvrtko Ursulin (2018-05-04 11:37:25)
> 
> On 04/05/2018 11:11, Chris Wilson wrote:
> > "An outstanding request must still be on an active ring somewhere" is
> > only true if we haven't just been interrupted by the shrinker in the
> > middle of allocating the request itself. (At the start of
> > i915_request_alloc() we pin the context and prepare the GT for activity,
> > marking it as active, and then try to allocate the request. If this
> > allocation invokes the shrinker, we try to reclaim some space by calling
> > i915_retire_requests() which may then be confused by the pre-reservation
> > of active_requests.)
> > 
> > Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 3 ---
> >   1 file changed, 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index d68739b94dac..e4cf76ec14a6 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1426,9 +1426,6 @@ void i915_retire_requests(struct drm_i915_private *i915)
> >       if (!i915->gt.active_requests)
> >               return;
> >   
> > -     /* An outstanding request must be on a still active ring somewhere */
> > -     GEM_BUG_ON(list_empty(&i915->gt.active_rings));
> > -
> >       list_for_each_entry_safe(ring, tmp, &i915->gt.active_rings, active_link)
> >               ring_retire_requests(ring);
> >   }
> > 
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

And pushed before anyone else notices the oops...

Thanks for the review,
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d68739b94dac..e4cf76ec14a6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1426,9 +1426,6 @@  void i915_retire_requests(struct drm_i915_private *i915)
 	if (!i915->gt.active_requests)
 		return;
 
-	/* An outstanding request must be on a still active ring somewhere */
-	GEM_BUG_ON(list_empty(&i915->gt.active_rings));
-
 	list_for_each_entry_safe(ring, tmp, &i915->gt.active_rings, active_link)
 		ring_retire_requests(ring);
 }
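
With the assertion gone, the remaining loop has to cope with an empty active_rings list. A safe walk over an empty circular list visits nothing, which the following sketch illustrates. It is a simplified stand-in for the kernel's list helpers, not the kernel implementation: struct node, struct ring_model and retire_all() are hypothetical names, and unlinking every visited ring is an assumption made only to exercise the "safe" iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct node {
	struct node *prev, *next;
};

static void node_init(struct node *head)
{
	head->prev = head->next = head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical ring with an embedded list link, as in the patched loop. */
struct ring_model {
	struct node active_link;
	int retired;
};

/*
 * Visit every ring on the list, caching the next pointer first so the
 * current entry may be unlinked. Returns the number of rings visited.
 */
static int retire_all(struct node *active_rings)
{
	struct node *pos, *tmp;
	int visited = 0;

	assert(active_rings);

	for (pos = active_rings->next; pos != active_rings; pos = tmp) {
		struct ring_model *ring = (struct ring_model *)
			((char *)pos - offsetof(struct ring_model, active_link));

		tmp = pos->next;	/* remember the successor before unlinking */
		ring->retired = 1;
		node_del(pos);
		visited++;
	}
	return visited;
}
```

On a freshly initialised head, retire_all() returns 0 without touching any ring; after two rings are added with node_add_tail(), it returns 2 and leaves the list empty again.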