[1/4] drm/i915: Only call finish_reset after a prepare_reset

Message ID 20180405110220.27008-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson April 5, 2018, 11:02 a.m. UTC
If we skip the intel_prepare_reset(), we should also skip the
intel_display_reset(). If we use a flag set by intel_prepare_reset()
then we do not have to second guess based on external user controlled
state whether or not the prepare was called before deciding to finish
it after the reset. igt/gem_eio is one such example that may tweak
i915.reset faster than the code is expecting, leading to

[  190.233528] =====================================
[  190.233534] WARNING: bad unlock balance detected!
[  190.233540] 4.16.0-rc7-g335ef9849310-drmtip_10+ #1 Tainted: G     U
[  190.233547] -------------------------------------
[  190.233553] gem_eio/1348 is trying to release lock (crtc_ww_class_acquire) at:
[  190.233569] [<ffffffff895c7810>] drm_modeset_acquire_fini+0x0/0x60
[  190.233575] but there are no more locks to release!
[  190.233580]
               other info that might help us debug this:
[  190.233588] 3 locks held by gem_eio/1348:
[  190.233592]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000ab90c784>] __fdget_pos+0x3a/0x50
[  190.233607]  #1:  (sb_writers#11){.+.+}, at: [<00000000e1529265>] vfs_write+0x188/0x1a0
[  190.233622]  #2:  (&attr->mutex){+.+.}, at: [<0000000011f40afe>] simple_attr_write+0x36/0xd0
[  190.233635]
               stack backtrace:
[  190.233644] CPU: 0 PID: 1348 Comm: gem_eio Tainted: G     U           4.16.0-rc7-g335ef9849310-drmtip_10+ #1
[  190.233655] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
[  190.233664] Call Trace:
[  190.233674]  dump_stack+0x67/0x95
[  190.233682]  ? drm_modeset_backoff+0x1b0/0x1b0
[  190.233690]  print_unlock_imbalance_bug+0xd2/0xe0
[  190.233698]  ? drm_modeset_backoff+0x1b0/0x1b0
[  190.233704]  lock_release+0x23e/0x300
[  190.233712]  drm_modeset_acquire_fini+0x16/0x60
[  190.233835]  intel_finish_reset+0x72/0x160 [i915]
[  190.233894]  i915_reset_device+0x1e9/0x240 [i915]
[  190.233953]  ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
[  190.233962]  ? work_on_cpu_safe+0x50/0x50
[  190.234020]  i915_handle_error+0x1f2/0x470 [i915]
[  190.234031]  ? __might_fault+0x39/0x90
[  190.234037]  ? __might_fault+0x39/0x90
[  190.234099]  i915_wedged_set+0x7f/0xc0 [i915]
[  190.234107]  simple_attr_write+0xb0/0xd0
[  190.234117]  full_proxy_write+0x51/0x80
[  190.234125]  __vfs_write+0x21/0x140
[  190.234133]  ? rcu_read_lock_sched_held+0x6f/0x80
[  190.234140]  ? rcu_sync_lockdep_assert+0x29/0x50
[  190.234147]  ? __sb_start_write+0x152/0x1f0
[  190.234152]  ? __sb_start_write+0x168/0x1f0
[  190.234159]  vfs_write+0xbd/0x1a0
[  190.234166]  SyS_write+0x40/0xa0
[  190.234173]  ? do_syscall_64+0x19/0x1b0
[  190.234180]  do_syscall_64+0x6b/0x1b0
[  190.234188]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  190.234196] RIP: 0033:0x7f84c1b392b7
[  190.234201] RSP: 002b:00007f84b6755b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[  190.234211] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f84c1b392b7
[  190.234218] RDX: 0000000000000002 RSI: 000055ec20abc8d6 RDI: 0000000000000046
[  190.234225] RBP: 000055ec20abc8d6 R08: 0000000000000000 R09: 0000000000000000
[  190.234231] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
[  190.234238] R13: 0000000000000000 R14: 00007f84b0000b20 R15: 000055ec20ce4eb8

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
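
To illustrate the reasoning in the commit message, here is a minimal
userspace sketch of the pairing problem. It is not the i915 code: struct
fake_dev, user_knob, reset_modeset, locks_held and the fake_* functions are
all hypothetical stand-ins, and the sketch assumes the flag is set in the
prepare step and cleared in the finish step. It shows why re-reading
externally controlled state in the finish step can unbalance the lock, while
testing a flag recorded by the prepare step cannot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the i915 structures. */
struct fake_dev {
	bool user_knob;     /* externally controlled, may change at any time */
	bool reset_modeset; /* recorded by prepare, consumed by finish */
	int locks_held;     /* models the modeset acquire context */
};

/* Takes the lock only when the knob says the display is affected. */
static void fake_prepare_reset(struct fake_dev *dev)
{
	if (!dev->user_knob)
		return; /* reset doesn't touch the display */

	dev->reset_modeset = true;
	dev->locks_held++;
}

/*
 * Old scheme: second-guess prepare by re-reading the knob.
 * Returns false where the kernel would report a bad unlock balance.
 */
static bool fake_finish_reset_by_knob(struct fake_dev *dev)
{
	if (!dev->user_knob)
		return true;

	if (dev->locks_held == 0)
		return false; /* nothing to release */

	dev->locks_held--;
	dev->reset_modeset = false;
	return true;
}

/* New scheme: only finish what prepare recorded as started. */
static bool fake_finish_reset_by_flag(struct fake_dev *dev)
{
	if (!dev->reset_modeset)
		return true;

	dev->reset_modeset = false;
	if (dev->locks_held == 0)
		return false;

	dev->locks_held--;
	return true;
}
```

If the knob flips between the two calls, the knob-based finish either
releases a lock that was never taken or leaves a taken lock held; the
flag-based finish stays balanced in both orders.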

Comments

Maarten Lankhorst April 5, 2018, 11:10 a.m. UTC | #1
On 05-04-18 at 13:02, Chris Wilson wrote:
> If we skip the intel_prepare_reset(), we should also skip the
> intel_display_reset(). If we use a flag set by intel_prepare_reset()
> then we do not have to second guess based on external user controlled
> state whether or not the prepare was called before deciding to finish
> it after the reset. igt/gem_eio is one such example that may tweak
> i915.reset faster than the code is expecting, leading to
>
> [  190.233528] =====================================
> [  190.233534] WARNING: bad unlock balance detected!
> [  190.233540] 4.16.0-rc7-g335ef9849310-drmtip_10+ #1 Tainted: G     U
> [  190.233547] -------------------------------------
> [  190.233553] gem_eio/1348 is trying to release lock (crtc_ww_class_acquire) at:
> [  190.233569] [<ffffffff895c7810>] drm_modeset_acquire_fini+0x0/0x60
> [  190.233575] but there are no more locks to release!
> [  190.233580]
>                other info that might help us debug this:
> [  190.233588] 3 locks held by gem_eio/1348:
> [  190.233592]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000ab90c784>] __fdget_pos+0x3a/0x50
> [  190.233607]  #1:  (sb_writers#11){.+.+}, at: [<00000000e1529265>] vfs_write+0x188/0x1a0
> [  190.233622]  #2:  (&attr->mutex){+.+.}, at: [<0000000011f40afe>] simple_attr_write+0x36/0xd0
> [  190.233635]
>                stack backtrace:
> [  190.233644] CPU: 0 PID: 1348 Comm: gem_eio Tainted: G     U           4.16.0-rc7-g335ef9849310-drmtip_10+ #1
> [  190.233655] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
> [  190.233664] Call Trace:
> [  190.233674]  dump_stack+0x67/0x95
> [  190.233682]  ? drm_modeset_backoff+0x1b0/0x1b0
> [  190.233690]  print_unlock_imbalance_bug+0xd2/0xe0
> [  190.233698]  ? drm_modeset_backoff+0x1b0/0x1b0
> [  190.233704]  lock_release+0x23e/0x300
> [  190.233712]  drm_modeset_acquire_fini+0x16/0x60
> [  190.233835]  intel_finish_reset+0x72/0x160 [i915]
> [  190.233894]  i915_reset_device+0x1e9/0x240 [i915]
> [  190.233953]  ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
> [  190.233962]  ? work_on_cpu_safe+0x50/0x50
> [  190.234020]  i915_handle_error+0x1f2/0x470 [i915]
> [  190.234031]  ? __might_fault+0x39/0x90
> [  190.234037]  ? __might_fault+0x39/0x90
> [  190.234099]  i915_wedged_set+0x7f/0xc0 [i915]
> [  190.234107]  simple_attr_write+0xb0/0xd0
> [  190.234117]  full_proxy_write+0x51/0x80
> [  190.234125]  __vfs_write+0x21/0x140
> [  190.234133]  ? rcu_read_lock_sched_held+0x6f/0x80
> [  190.234140]  ? rcu_sync_lockdep_assert+0x29/0x50
> [  190.234147]  ? __sb_start_write+0x152/0x1f0
> [  190.234152]  ? __sb_start_write+0x168/0x1f0
> [  190.234159]  vfs_write+0xbd/0x1a0
> [  190.234166]  SyS_write+0x40/0xa0
> [  190.234173]  ? do_syscall_64+0x19/0x1b0
> [  190.234180]  do_syscall_64+0x6b/0x1b0
> [  190.234188]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [  190.234196] RIP: 0033:0x7f84c1b392b7
> [  190.234201] RSP: 002b:00007f84b6755b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> [  190.234211] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f84c1b392b7
> [  190.234218] RDX: 0000000000000002 RSI: 000055ec20abc8d6 RDI: 0000000000000046
> [  190.234225] RBP: 000055ec20abc8d6 R08: 0000000000000000 R09: 0000000000000000
> [  190.234231] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
> [  190.234238] R13: 0000000000000000 R14: 00007f84b0000b20 R15: 000055ec20ce4eb8
>
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 415fb8cf2cf4..9c6156216e5e 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3677,7 +3677,6 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>  	struct drm_atomic_state *state;
>  	int ret;
>  
> -
>  	/* reset doesn't touch the display */
>  	if (!i915_modparams.force_reset_modeset_test &&
>  	    !gpu_reset_clobbers_display(dev_priv))
> @@ -3731,19 +3730,17 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
>  {
>  	struct drm_device *dev = &dev_priv->drm;
>  	struct drm_modeset_acquire_ctx *ctx = &dev_priv->reset_ctx;
> -	struct drm_atomic_state *state = dev_priv->modeset_restore_state;
> +	struct drm_atomic_state *state;
>  	int ret;
>  
>  	/* reset doesn't touch the display */
> -	if (!i915_modparams.force_reset_modeset_test &&
> -	    !gpu_reset_clobbers_display(dev_priv))
> +	if (!test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
>  		return;
>  
> +	state = fetch_and_zero(&dev_priv->modeset_restore_state);
>  	if (!state)
>  		goto unlock;
>  
> -	dev_priv->modeset_restore_state = NULL;
> -
>  	/* reset doesn't touch the display */
>  	if (!gpu_reset_clobbers_display(dev_priv)) {
>  		/* for testing only restore the display */

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Chris Wilson April 5, 2018, 4:19 p.m. UTC | #2
Quoting Maarten Lankhorst (2018-04-05 12:10:23)
> On 05-04-18 at 13:02, Chris Wilson wrote:
> > If we skip the intel_prepare_reset(), we should also skip the
> > intel_display_reset(). If we use a flag set by intel_prepare_reset()
> > then we do not have to second guess based on external user controlled
> > state whether or not the prepare was called before deciding to finish
> > it after the reset. igt/gem_eio is one such example that may tweak
> > i915.reset faster than the code is expecting, leading to
> >
> > [  190.233528] =====================================
> > [  190.233534] WARNING: bad unlock balance detected!
> > [  190.233540] 4.16.0-rc7-g335ef9849310-drmtip_10+ #1 Tainted: G     U
> > [  190.233547] -------------------------------------
> > [  190.233553] gem_eio/1348 is trying to release lock (crtc_ww_class_acquire) at:
> > [  190.233569] [<ffffffff895c7810>] drm_modeset_acquire_fini+0x0/0x60
> > [  190.233575] but there are no more locks to release!
> > [  190.233580]
> >                other info that might help us debug this:
> > [  190.233588] 3 locks held by gem_eio/1348:
> > [  190.233592]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000ab90c784>] __fdget_pos+0x3a/0x50
> > [  190.233607]  #1:  (sb_writers#11){.+.+}, at: [<00000000e1529265>] vfs_write+0x188/0x1a0
> > [  190.233622]  #2:  (&attr->mutex){+.+.}, at: [<0000000011f40afe>] simple_attr_write+0x36/0xd0
> > [  190.233635]
> >                stack backtrace:
> > [  190.233644] CPU: 0 PID: 1348 Comm: gem_eio Tainted: G     U           4.16.0-rc7-g335ef9849310-drmtip_10+ #1
> > [  190.233655] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
> > [  190.233664] Call Trace:
> > [  190.233674]  dump_stack+0x67/0x95
> > [  190.233682]  ? drm_modeset_backoff+0x1b0/0x1b0
> > [  190.233690]  print_unlock_imbalance_bug+0xd2/0xe0
> > [  190.233698]  ? drm_modeset_backoff+0x1b0/0x1b0
> > [  190.233704]  lock_release+0x23e/0x300
> > [  190.233712]  drm_modeset_acquire_fini+0x16/0x60
> > [  190.233835]  intel_finish_reset+0x72/0x160 [i915]
> > [  190.233894]  i915_reset_device+0x1e9/0x240 [i915]
> > [  190.233953]  ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
> > [  190.233962]  ? work_on_cpu_safe+0x50/0x50
> > [  190.234020]  i915_handle_error+0x1f2/0x470 [i915]
> > [  190.234031]  ? __might_fault+0x39/0x90
> > [  190.234037]  ? __might_fault+0x39/0x90
> > [  190.234099]  i915_wedged_set+0x7f/0xc0 [i915]
> > [  190.234107]  simple_attr_write+0xb0/0xd0
> > [  190.234117]  full_proxy_write+0x51/0x80
> > [  190.234125]  __vfs_write+0x21/0x140
> > [  190.234133]  ? rcu_read_lock_sched_held+0x6f/0x80
> > [  190.234140]  ? rcu_sync_lockdep_assert+0x29/0x50
> > [  190.234147]  ? __sb_start_write+0x152/0x1f0
> > [  190.234152]  ? __sb_start_write+0x168/0x1f0
> > [  190.234159]  vfs_write+0xbd/0x1a0
> > [  190.234166]  SyS_write+0x40/0xa0
> > [  190.234173]  ? do_syscall_64+0x19/0x1b0
> > [  190.234180]  do_syscall_64+0x6b/0x1b0
> > [  190.234188]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > [  190.234196] RIP: 0033:0x7f84c1b392b7
> > [  190.234201] RSP: 002b:00007f84b6755b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > [  190.234211] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f84c1b392b7
> > [  190.234218] RDX: 0000000000000002 RSI: 000055ec20abc8d6 RDI: 0000000000000046
> > [  190.234225] RBP: 000055ec20abc8d6 R08: 0000000000000000 R09: 0000000000000000
> > [  190.234231] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
> > [  190.234238] R13: 0000000000000000 R14: 00007f84b0000b20 R15: 000055ec20ce4eb8
> >
> > Testcase: igt/gem_eio
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[snip]
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Thanks for the review. I have applied this patch; hopefully this one by
itself is enough to clear up the warnings from the shards on these old
machines.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 415fb8cf2cf4..9c6156216e5e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3677,7 +3677,6 @@  void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	struct drm_atomic_state *state;
 	int ret;
 
-
 	/* reset doesn't touch the display */
 	if (!i915_modparams.force_reset_modeset_test &&
 	    !gpu_reset_clobbers_display(dev_priv))
@@ -3731,19 +3730,17 @@  void intel_finish_reset(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = &dev_priv->drm;
 	struct drm_modeset_acquire_ctx *ctx = &dev_priv->reset_ctx;
-	struct drm_atomic_state *state = dev_priv->modeset_restore_state;
+	struct drm_atomic_state *state;
 	int ret;
 
 	/* reset doesn't touch the display */
-	if (!i915_modparams.force_reset_modeset_test &&
-	    !gpu_reset_clobbers_display(dev_priv))
+	if (!test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
 		return;
 
+	state = fetch_and_zero(&dev_priv->modeset_restore_state);
 	if (!state)
 		goto unlock;
 
-	dev_priv->modeset_restore_state = NULL;
-
 	/* reset doesn't touch the display */
 	if (!gpu_reset_clobbers_display(dev_priv)) {
 		/* for testing only restore the display */
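
The patch replaces a separate read and later NULL assignment of
modeset_restore_state with a single fetch_and_zero() call. As a rough
illustration of that idiom only, the sketch below uses a hypothetical
helper, take_pointer(), written as a plain function; it is not the i915
macro and it is not atomic. It reads the stored pointer and clears the slot
in one step, so every later exit path sees the slot already emptied.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the i915 fetch_and_zero() macro:
 * return the pointer currently stored in *slot and leave NULL behind.
 * Not atomic; callers are assumed to serialise access themselves.
 */
static void *take_pointer(void **slot)
{
	void *old = *slot;

	*slot = NULL;
	return old;
}
```

Taking ownership this way means an early "goto unlock" cannot leave a stale
pointer in the shared slot, which is the effect of moving the clear ahead of
the NULL check in the hunk above.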