diff mbox

[1/2] drm/i915: Initialize seqno for VECS too

Message ID 1376351584-733-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ben Widawsky Aug. 12, 2013, 11:53 p.m. UTC
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65387
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67198
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Daniel Vetter Aug. 13, 2013, 7:02 a.m. UTC | #1
On Mon, Aug 12, 2013 at 04:53:03PM -0700, Ben Widawsky wrote:
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65387
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67198
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: lu hua <huax.lu@intel.com> (both bug reports)

Please pimp the commit message as discussion on irc (reply with just the
text is ok). Strictly speaking we initialize the 3rd mbox for the 4th ring
here (which ofc is the vebox one for all rings != vebox). And also please
mention the effects of this bug, i.e. how it blows up.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 10c2aaa..665602f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1592,6 +1592,8 @@ void intel_ring_init_seqno(struct intel_ring_buffer *ring, u32 seqno)
>  	if (INTEL_INFO(ring->dev)->gen >= 6) {
>  		I915_WRITE(RING_SYNC_0(ring->mmio_base), 0);
>  		I915_WRITE(RING_SYNC_1(ring->mmio_base), 0);
> +		if (HAS_VEBOX(ring->dev))
> +			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
>  	}
>  
>  	ring->set_seqno(ring, seqno);
> -- 
> 1.8.3.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ben Widawsky Aug. 13, 2013, 8:55 p.m. UTC | #2
On Mon, Aug 12, 2013 at 04:53:03PM -0700, Ben Widawsky wrote:
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65387
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67198
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 10c2aaa..665602f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1592,6 +1592,8 @@ void intel_ring_init_seqno(struct intel_ring_buffer *ring, u32 seqno)
>  	if (INTEL_INFO(ring->dev)->gen >= 6) {
>  		I915_WRITE(RING_SYNC_0(ring->mmio_base), 0);
>  		I915_WRITE(RING_SYNC_1(ring->mmio_base), 0);
> +		if (HAS_VEBOX(ring->dev))
> +			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
>  	}
>  
>  	ring->set_seqno(ring, seqno);

We require n-1 mailboxes for proper semaphore synchronization. All
semaphore synchronization code relies on proper values in these
mailboxes. The fact that we failed to touch the vebox ring by itself was
unlikely to be an issue since the HW should be initializing the values
to 0. However the error framework for testing seqno wrap introduced by
Mika, in addition to the hangcheck via seqno, and
i915_error_first_batchbuffer() combined caused a nice explosion.

The problem is caused by seqno wrap because the wrap condition is not
properly setup. The wrap code attempts to set the sync mailboxes all to
0, and then set the current seqno to one less than 0. In all cases, the
vebox mailbox wasn't properly being initialized. This caused a wrap to
not occur. When hangcheck kicks in with the bogus seqno values, the rest
just doesn't work. It makes me wonder if we shouldn't consider a dumber
version of hangcheck...

How we messed this up:
VECS support was written before the aforementioned other features. Upon
VECS being rebased, these facts were missed.
Daniel Vetter Aug. 13, 2013, 9:12 p.m. UTC | #3
On Tue, Aug 13, 2013 at 01:55:36PM -0700, Ben Widawsky wrote:
> On Mon, Aug 12, 2013 at 04:53:03PM -0700, Ben Widawsky wrote:
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65387
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67198
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 10c2aaa..665602f 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1592,6 +1592,8 @@ void intel_ring_init_seqno(struct intel_ring_buffer *ring, u32 seqno)
> >  	if (INTEL_INFO(ring->dev)->gen >= 6) {
> >  		I915_WRITE(RING_SYNC_0(ring->mmio_base), 0);
> >  		I915_WRITE(RING_SYNC_1(ring->mmio_base), 0);
> > +		if (HAS_VEBOX(ring->dev))
> > +			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
> >  	}
> >  
> >  	ring->set_seqno(ring, seqno);
> 
> We require n-1 mailboxes for proper semaphore synchronization. All
> semaphore synchronization code relies on proper values in these
> mailboxes. The fact that we failed to touch the vebox ring by itself was
> unlikely to be an issue since the HW should be initializing the values
> to 0. However the error framework for testing seqno wrap introduced by
> Mika, in addition to the hangcheck via seqno, and
> i915_error_first_batchbuffer() combined caused a nice explosion.
> 
> The problem is caused by seqno wrap because the wrap condition is not
> properly setup. The wrap code attempts to set the sync mailboxes all to
> 0, and then set the current seqno to one less than 0. In all cases, the
> vebox mailbox wasn't properly being initialized. This caused a wrap to
> not occur. When hangcheck kicks in with the bogus seqno values, the rest
> just doesn't work. It makes me wonder if we shouldn't consider a dumber
> version of hangcheck...
> 
> How we messed this up:
> VECS support was written before the aforementioned other features. Upon
> VECS being rebased, these facts were missed.

Both patches applied, thanks.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 10c2aaa..665602f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1592,6 +1592,8 @@  void intel_ring_init_seqno(struct intel_ring_buffer *ring, u32 seqno)
 	if (INTEL_INFO(ring->dev)->gen >= 6) {
 		I915_WRITE(RING_SYNC_0(ring->mmio_base), 0);
 		I915_WRITE(RING_SYNC_1(ring->mmio_base), 0);
+		if (HAS_VEBOX(ring->dev))
+			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
 	}
 
 	ring->set_seqno(ring, seqno);