Message ID | 20180405110220.27008-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
On 05-04-18 at 13:02, Chris Wilson wrote:
> [snip]

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Quoting Maarten Lankhorst (2018-04-05 12:10:23)
> On 05-04-18 at 13:02, Chris Wilson wrote:
> > [snip]
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Thanks for the review, applied this patch and hopefully we only need
this one by itself to clear up the warnings from the shards on these
old machines.
-Chris
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 415fb8cf2cf4..9c6156216e5e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3677,7 +3677,6 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	struct drm_atomic_state *state;
 	int ret;
 
-
 	/* reset doesn't touch the display */
 	if (!i915_modparams.force_reset_modeset_test &&
 	    !gpu_reset_clobbers_display(dev_priv))
@@ -3731,19 +3730,17 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = &dev_priv->drm;
 	struct drm_modeset_acquire_ctx *ctx = &dev_priv->reset_ctx;
-	struct drm_atomic_state *state = dev_priv->modeset_restore_state;
+	struct drm_atomic_state *state;
 	int ret;
 
 	/* reset doesn't touch the display */
-	if (!i915_modparams.force_reset_modeset_test &&
-	    !gpu_reset_clobbers_display(dev_priv))
+	if (!test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
 		return;
 
+	state = fetch_and_zero(&dev_priv->modeset_restore_state);
 	if (!state)
 		goto unlock;
 
-	dev_priv->modeset_restore_state = NULL;
-
 	/* reset doesn't touch the display */
 	if (!gpu_reset_clobbers_display(dev_priv)) {
 		/* for testing only restore the display */
If we skip the intel_prepare_reset(), we should also skip the
intel_finish_reset(). If we use a flag set by intel_prepare_reset()
then we do not have to second guess based on external user controlled
state whether or not the prepare was called before deciding to finish
it after the reset. igt/gem_eio is one such example that may tweak
i915.reset faster than the code is expecting, leading to

[ 190.233528] =====================================
[ 190.233534] WARNING: bad unlock balance detected!
[ 190.233540] 4.16.0-rc7-g335ef9849310-drmtip_10+ #1 Tainted: G U
[ 190.233547] -------------------------------------
[ 190.233553] gem_eio/1348 is trying to release lock (crtc_ww_class_acquire) at:
[ 190.233569] [<ffffffff895c7810>] drm_modeset_acquire_fini+0x0/0x60
[ 190.233575] but there are no more locks to release!
[ 190.233580]
other info that might help us debug this:
[ 190.233588] 3 locks held by gem_eio/1348:
[ 190.233592] #0: (&f->f_pos_lock){+.+.}, at: [<00000000ab90c784>] __fdget_pos+0x3a/0x50
[ 190.233607] #1: (sb_writers#11){.+.+}, at: [<00000000e1529265>] vfs_write+0x188/0x1a0
[ 190.233622] #2: (&attr->mutex){+.+.}, at: [<0000000011f40afe>] simple_attr_write+0x36/0xd0
[ 190.233635]
stack backtrace:
[ 190.233644] CPU: 0 PID: 1348 Comm: gem_eio Tainted: G U 4.16.0-rc7-g335ef9849310-drmtip_10+ #1
[ 190.233655] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005
[ 190.233664] Call Trace:
[ 190.233674] dump_stack+0x67/0x95
[ 190.233682] ? drm_modeset_backoff+0x1b0/0x1b0
[ 190.233690] print_unlock_imbalance_bug+0xd2/0xe0
[ 190.233698] ? drm_modeset_backoff+0x1b0/0x1b0
[ 190.233704] lock_release+0x23e/0x300
[ 190.233712] drm_modeset_acquire_fini+0x16/0x60
[ 190.233835] intel_finish_reset+0x72/0x160 [i915]
[ 190.233894] i915_reset_device+0x1e9/0x240 [i915]
[ 190.233953] ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
[ 190.233962] ? work_on_cpu_safe+0x50/0x50
[ 190.234020] i915_handle_error+0x1f2/0x470 [i915]
[ 190.234031] ? __might_fault+0x39/0x90
[ 190.234037] ? __might_fault+0x39/0x90
[ 190.234099] i915_wedged_set+0x7f/0xc0 [i915]
[ 190.234107] simple_attr_write+0xb0/0xd0
[ 190.234117] full_proxy_write+0x51/0x80
[ 190.234125] __vfs_write+0x21/0x140
[ 190.234133] ? rcu_read_lock_sched_held+0x6f/0x80
[ 190.234140] ? rcu_sync_lockdep_assert+0x29/0x50
[ 190.234147] ? __sb_start_write+0x152/0x1f0
[ 190.234152] ? __sb_start_write+0x168/0x1f0
[ 190.234159] vfs_write+0xbd/0x1a0
[ 190.234166] SyS_write+0x40/0xa0
[ 190.234173] ? do_syscall_64+0x19/0x1b0
[ 190.234180] do_syscall_64+0x6b/0x1b0
[ 190.234188] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 190.234196] RIP: 0033:0x7f84c1b392b7
[ 190.234201] RSP: 002b:00007f84b6755b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 190.234211] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f84c1b392b7
[ 190.234218] RDX: 0000000000000002 RSI: 000055ec20abc8d6 RDI: 0000000000000046
[ 190.234225] RBP: 000055ec20abc8d6 R08: 0000000000000000 R09: 0000000000000000
[ 190.234231] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
[ 190.234238] R13: 0000000000000000 R14: 00007f84b0000b20 R15: 000055ec20ce4eb8

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
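
For readers outside the driver, the idea in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the i915 code: struct fake_dev, fake_prepare_reset(), fake_finish_reset() and take_and_clear() are hypothetical names, the "knob" field stands in for the user-controlled setting, and a plain counter stands in for the modeset locks. The finish step asks whether prepare actually ran, instead of re-reading a setting that may have changed in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the i915 structures. */
struct fake_dev {
	bool knob;          /* user-controlled setting, may change at any time */
	bool prepared;      /* flag owned by prepare/finish, like a reset flag bit */
	int locks_held;     /* models the modeset locks taken in prepare */
	void *saved_state;  /* models the state saved for restore after reset */
};

/* Read a pointer and clear the slot in one step, so it is consumed once. */
static void *take_and_clear(void **slot)
{
	void *value = *slot;

	*slot = NULL;
	return value;
}

/* Takes the "lock" and records that it did so, but only if the knob is set. */
static void fake_prepare_reset(struct fake_dev *dev, void *state)
{
	if (!dev->knob)
		return;

	dev->locks_held++;
	dev->saved_state = state;
	dev->prepared = true;
}

/* Returns the saved state to restore, or NULL if there is nothing to do. */
static void *fake_finish_reset(struct fake_dev *dev)
{
	void *state;

	/* Ask "did prepare run?", not "what does the knob say now?". */
	if (!dev->prepared)
		return NULL;

	state = take_and_clear(&dev->saved_state);

	/* Toy analogue of an unlock-balance check. */
	assert(dev->locks_held > 0);
	dev->locks_held--;
	dev->prepared = false;

	return state;
}
```

In this model, if fake_finish_reset() tested dev->knob instead of dev->prepared, flipping the knob between the two calls would reach the unlock with no lock held and trip the assertion, which is the toy counterpart of the "bad unlock balance" warning above.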