diff mbox

[v2,1/3] drm/i915: Wake up pending_flip_queue as part of reset handling

Message ID 1360940866-22435-2-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ville Syrjälä Feb. 15, 2013, 3:07 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Someone may be waiting for a flip that will never complete due to a GPU
reset. Wake up all such waiters after the GPU reset processing has
finished.

v2: Dropped the wake_up_all() from i915_handle_error() since
    we no longer wait for pending flips with struct_mutex held.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Chris Wilson Feb. 15, 2013, 11:53 p.m. UTC | #1
On Fri, Feb 15, 2013 at 05:07:44PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Someone may be waiting for a flip that will never complete due to a GPU
> reset. Wake up all such waiters after the GPU reset processing has
> finished.
> 
> v2: Dropped the wake_up_all() from i915_handle_error() since
>     we no longer wait for pending flips with struct_mutex held.

Isn't the wake_up(pending_flip_queue) superseded by performing the
explicit do_intel_finish_page_flip() in patch 3?
-Chris
Ville Syrjälä Feb. 18, 2013, 9:58 a.m. UTC | #2
On Fri, Feb 15, 2013 at 11:53:14PM +0000, Chris Wilson wrote:
> On Fri, Feb 15, 2013 at 05:07:44PM +0200, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Someone may be waiting for a flip that will never complete due to a GPU
> > reset. Wake up all such waiters after the GPU reset processing has
> > finished.
> > 
> > v2: Dropped the wake_up_all() from i915_handle_error() since
> >     we no longer wait for pending flips with struct_mutex held.
> 
> Isn't the wake_up(pending_flip_queue) superseded by performing the
> explicit do_intel_finish_page_flip() in patch 3?

Yes that's correct. But I actually forgot to remove the wake_up patch
from my tree when I tested this. I'll run a few more tests just to make
sure it still works.
Ville Syrjälä Feb. 18, 2013, 12:41 p.m. UTC | #3
On Mon, Feb 18, 2013 at 11:58:23AM +0200, Ville Syrjälä wrote:
> On Fri, Feb 15, 2013 at 11:53:14PM +0000, Chris Wilson wrote:
> > On Fri, Feb 15, 2013 at 05:07:44PM +0200, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > Someone may be waiting for a flip that will never complete due to a GPU
> > > reset. Wake up all such waiters after the GPU reset processing has
> > > finished.
> > > 
> > > v2: Dropped the wake_up_all() from i915_handle_error() since
> > >     we no longer wait for pending flips with struct_mutex held.
> > 
> > Isn't the wake_up(pending_flip_queue) superseded by performing the
> > explicit do_intel_finish_page_flip() in patch 3?
> 
> Yes that's correct. But I actually forgot to remove the wake_up patch
> from my tree when I tested this. I'll run a few more tests just to make
> sure it still works.

I just tried it w/o the wake_up_all() and unfortunately it hung :(

Need to think about it a bit more I suppose.
Ville Syrjälä Feb. 18, 2013, 3:31 p.m. UTC | #4
On Mon, Feb 18, 2013 at 02:41:00PM +0200, Ville Syrjälä wrote:
> On Mon, Feb 18, 2013 at 11:58:23AM +0200, Ville Syrjälä wrote:
> > On Fri, Feb 15, 2013 at 11:53:14PM +0000, Chris Wilson wrote:
> > > On Fri, Feb 15, 2013 at 05:07:44PM +0200, ville.syrjala@linux.intel.com wrote:
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > 
> > > > Someone may be waiting for a flip that will never complete due to a GPU
> > > > reset. Wake up all such waiters after the GPU reset processing has
> > > > finished.
> > > > 
> > > > v2: Dropped the wake_up_all() from i915_handle_error() since
> > > >     we no longer wait for pending flips with struct_mutex held.
> > > 
> > > Isn't the wake_up(pending_flip_queue) superseded by performing the
> > > explicit do_intel_finish_page_flip() in patch 3?
> > 
> > Yes that's correct. But I actually forgot to remove the wake_up patch
> > from my tree when I tested this. I'll run a few more tests just to make
> > sure it still works.
> 
> I just tried it w/o the wake_up_all() and unfortunately it hung :(
> 
> Need to think about it a bit more I suppose.

Well after a bit more though I think I know what happened:

1. page flip submitted on crtc which is not the first crtc in the list
2. mode set operation on any crtc (will grab all crtc locks)
3. reset handling loops over crtcs
 3.1 first crtc doesn't have a pending flip, so pending_flip_queue is not woken up
 3.2 code tries to grab first crtc mutex to update the base address
  -> deadlock

So either I need to keep the wake_up_all() call in the reset handling,
or I need to do two loops over the crtcs, first loop would complete all
page flips, and second loop would update the base address registers.
I don't really care which way we go. Anyone else have a preference?
Daniel Vetter Feb. 18, 2013, 4:39 p.m. UTC | #5
On Mon, Feb 18, 2013 at 05:31:14PM +0200, Ville Syrjälä wrote:
> On Mon, Feb 18, 2013 at 02:41:00PM +0200, Ville Syrjälä wrote:
> > On Mon, Feb 18, 2013 at 11:58:23AM +0200, Ville Syrjälä wrote:
> > > On Fri, Feb 15, 2013 at 11:53:14PM +0000, Chris Wilson wrote:
> > > > On Fri, Feb 15, 2013 at 05:07:44PM +0200, ville.syrjala@linux.intel.com wrote:
> > > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > 
> > > > > Someone may be waiting for a flip that will never complete due to a GPU
> > > > > reset. Wake up all such waiters after the GPU reset processing has
> > > > > finished.
> > > > > 
> > > > > v2: Dropped the wake_up_all() from i915_handle_error() since
> > > > >     we no longer wait for pending flips with struct_mutex held.
> > > > 
> > > > Isn't the wake_up(pending_flip_queue) superseded by performing the
> > > > explicit do_intel_finish_page_flip() in patch 3?
> > > 
> > > Yes that's correct. But I actually forgot to remove the wake_up patch
> > > from my tree when I tested this. I'll run a few more tests just to make
> > > sure it still works.
> > 
> > I just tried it w/o the wake_up_all() and unfortunately it hung :(
> > 
> > Need to think about it a bit more I suppose.
> 
> Well after a bit more though I think I know what happened:
> 
> 1. page flip submitted on crtc which is not the first crtc in the list
> 2. mode set operation on any crtc (will grab all crtc locks)
> 3. reset handling loops over crtcs
>  3.1 first crtc doesn't have a pending flip, so pending_flip_queue is not woken up
>  3.2 code tries to grab first crtc mutex to update the base address
>   -> deadlock
> 
> So either I need to keep the wake_up_all() call in the reset handling,
> or I need to do two loops over the crtcs, first loop would complete all
> page flips, and second loop would update the base address registers.
> I don't really care which way we go. Anyone else have a preference?

I'd vote for the second approach of first waking everyone up, and then
doing the modeset updates. I guess you could even use
drm_modeset_lock_all, not really worth optimizing things here imo. Also
please add a comment to explain this ordering constraint in the reset
code.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2cd97d1..672816a 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -915,6 +915,8 @@  static void i915_error_work_func(struct work_struct *work)
 		for_each_ring(ring, dev_priv, i)
 			wake_up_all(&ring->irq_queue);
 
+		wake_up_all(&dev_priv->pending_flip_queue);
+
 		wake_up_all(&dev_priv->gpu_error.reset_queue);
 	}
 }