drm/i915: Try enabling RC6 by default (again)

Message ID	1312428234-8526-1-git-send-email-keithp@keithp.com (mailing list archive)
State	New, archived
Headers
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p743OYrJ012332 for <patchwork-intel-gfx@patchwork.kernel.org>; Thu, 4 Aug 2011 03:24:55 GMT
From: Keith Packard <keithp@keithp.com>
To: intel-gfx@lists.freedesktop.org
Date: Wed, 3 Aug 2011 20:23:54 -0700
Message-Id: <1312428234-8526-1-git-send-email-keithp@keithp.com>
Cc: Pekka Enberg <penberg@kernel.org>, Gu Rui <chaos.proton@gmail.com>, Francesco Allertsen <fallertsen@gmail.com>
Subject: [Intel-gfx] [PATCH] drm/i915: Try enabling RC6 by default (again)
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

Keith Packard Aug. 4, 2011, 3:23 a.m. UTC

Jesse Barnes and I found a couple of issues where incorrect mode
setting would cause problems with RC6 enabled. We're hopeful that
fixing those will resolve the outstanding issues with a few machines
that had trouble before 3.0 with rc6.

Cc: Pekka Enberg <penberg@kernel.org>
Cc: Francesco Allertsen <fallertsen@gmail.com>
Cc: Ted Phelps <phelps@gnusto.com>
Cc: Gu Rui <chaos.proton@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38567
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=38332
Signed-off-by: Keith Packard <keithp@keithp.com>
---
 drivers/gpu/drm/i915/i915_drv.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Ted Phelps Aug. 4, 2011, 2:05 p.m. UTC | #1

Keith Packard writes:
> Jesse Barnes and I found a couple of issues where incorrect mode
> setting would cause problems with RC6 enabled. We're hopeful that
> fixing those will resolve the outstanding issues with a few machines
> that had trouble before 3.0 with rc6.

Thanks for continuing to bash on this, but I'm afraid I'm still having
issues.

I saw a bunch of stack traces like the one below, alternating between
ring_put and ring_get, before the GPU was declared wedged.

------------[ cut here ]------------
WARNING: at drivers/gpu/drm/i915/i915_drv.c:387 __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]()
Hardware name:         
Modules linked in: tun ohci_hcd i915 fbcon font bitblit softcursor drm_kms_helper drm pl2303 usbserial e1000e cfbcopyarea video cfbimgblt cfbfillrect processor button xfs
Pid: 2045, comm: X Not tainted 3.0.0-00175-g07b7ddd #262
Call Trace:
 [<ffffffff81047dca>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81047e15>] warn_slowpath_null+0x15/0x20
 [<ffffffffa019573a>] __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]
 [<ffffffffa01eb7a3>] gen6_ring_put_irq+0xb3/0x150 [i915]
 [<ffffffffa01eb853>] blt_ring_put_irq+0x13/0x20 [i915]
 [<ffffffffa01b55b4>] i915_wait_request+0x1d4/0x710 [i915]
 [<ffffffff8106a0f0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa01b5b22>] i915_gem_object_wait_rendering+0x32/0x40 [i915]
 [<ffffffffa01b78b8>] i915_gem_object_set_to_gtt_domain+0xc8/0x160 [i915]
 [<ffffffffa01b7a09>] i915_gem_set_domain_ioctl+0xb9/0x100 [i915]
 [<ffffffffa0113514>] drm_ioctl+0x3d4/0x4c0 [drm]
 [<ffffffffa01b7950>] ? i915_gem_object_set_to_gtt_domain+0x160/0x160 [i915]
 [<ffffffff81537169>] ? _raw_spin_unlock_irq+0x9/0x40
 [<ffffffff8100b654>] ? check_for_xstate+0x44/0xb0
 [<ffffffff81117037>] do_vfs_ioctl+0x97/0x5d0
 [<ffffffff8100234b>] ? sys_rt_sigreturn+0x1fb/0x210
 [<ffffffff81117601>] sys_ioctl+0x91/0xa0
 [<ffffffff8153dafb>] system_call_fastpath+0x16/0x1b
---[ end trace 070cd7741c91254d ]---
------------[ cut here ]------------

... lots of the above ...

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 936788 at 936786, next 936789)
[drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[drm:init_ring_common] *ERROR* blt ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 936792 at 936783, next 936793)
[drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[drm:i915_reset] *ERROR* Failed to reset chip.

This is with mesa-7.11rc4, xf86-video-intel-2.15.0 and a Linux kernel
3.0.0-00175-g07b7ddd (head of keithp's drm-intel-next).

I'll attach the i915_error_state to bug #38567.

Cheers,
-Ted
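
The `i915_wait_request` lines in the log above carry three sequence numbers. A minimal sketch for pulling them out is shown here, assuming "awaiting" is the seqno the caller waited for and "at" is the seqno the ring last reported (my reading of the message, not something stated in this thread). The -11 return value is -EAGAIN on Linux. The helper name `seqno_lag` is hypothetical.

```python
import re

# Matches the tail of the i915_wait_request error line quoted in this thread.
WAIT_RE = re.compile(r"awaiting (\d+) at (\d+), next (\d+)")


def seqno_lag(line):
    """Return (awaiting - at) for a wait_request error line, or None if absent."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    awaiting, at, _next_seqno = map(int, m.groups())
    return awaiting - at


first = ("[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 "
         "(awaiting 936788 at 936786, next 936789)")
second = ("[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 "
          "(awaiting 936792 at 936783, next 936793)")

print(seqno_lag(first), seqno_lag(second))
```

Under that assumption, the second report shows the ring further behind the awaited request than the first one.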

Daniel Vetter Aug. 4, 2011, 3:51 p.m. UTC | #2

On Thu, Aug 4, 2011 at 16:05, Ted Phelps <phelps@gnusto.com> wrote:
> I saw a bunch of stack traces like the one below, alternating between
> ring_put and ring_get, before the GPU was declared wedged.

How much time passes between the first wait_for_fifo backtrace and
the final gpu hang report? If it's short (order of a second or less)
can you rehang your gpu with kernel log timestamping switched on?
-Daniel

Keith Packard Aug. 4, 2011, 6:05 p.m. UTC | #3

On Thu, 4 Aug 2011 07:10:27 +0200, Francesco Allertsen <fallertsen@gmail.com> wrote:
> Hi Keith,
> 
> On Wed, Aug 03, 2011 at 08:23:54PM -0700, Keith Packard wrote:
> > Jesse Barnes and I found a couple of issues where incorrect mode
> > setting would cause problems with RC6 enabled. We're hopeful that
> > fixing those will resolve the outstanding issues with a few machines
> > that had trouble before 3.0 with rc6.
> 
> I have applied this patch on top of the last Linus tree, and I got the
> same freeze as before.

Linus' tree does not have the relevant fixes; I was posting that patch
to let you know it was coming. The fixes are in my drm-intel-next
branch, which I've asked Dave Airlie to pull in.

> Unfortunately I'm not at home and I cannot debug it, but it seems to be
> the same problem.

Let's hope the new code works then.

Keith Packard Aug. 4, 2011, 6:08 p.m. UTC | #4

On Thu, 4 Aug 2011 16:29:50 +0800, Grissiom <chaos.proton@gmail.com> wrote:

> 1) I learned that I'm not the only one who has run into this bug.

Even one failure would have been sufficient to revert this for 3.0 (and
for 3.1 as well, if we find more trouble).

> 2) I tried keithp/drm-intel-next with i915.i915_enable_rc6=1 set on
> cmdline. And it *works* ;)

Yay!

> I'm not sure whether it means rc6 is truly enabled.

If you've set i915_enable_rc6=1 (either on the command line, or via the
patch), then it should be on.

> AFAIK, Linus hasn't merged keithp's tree yet. At least keithp's
> drm-intel-next works for me.

Right, I sent out the patch to let people know it was coming and to get
the few people with trouble to re-try once it had gotten merged.
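
On the question of whether the option is actually set: loaded modules usually expose their parameters under /sys/module/<module>/parameters/. A minimal sketch for reading one follows, assuming the i915 driver exports `i915_enable_rc6` with readable permissions (not confirmed in this thread). The helper name `read_module_param` is hypothetical. The value only reflects the requested setting; it does not prove the hardware enters RC6.

```python
from pathlib import Path


def read_module_param(module, name, sysfs_root="/sys/module"):
    """Return a module parameter's value as a string, or None if unreadable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or no read permission.
        return None


if __name__ == "__main__":
    print(read_module_param("i915", "i915_enable_rc6"))
```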

Ted Phelps Aug. 5, 2011, 1:41 a.m. UTC | #5

Daniel Vetter writes:
> On Thu, Aug 4, 2011 at 16:05, Ted Phelps <phelps@gnusto.com> wrote:
> > I saw a bunch of stack traces like the one below, alternating between
> > ring_put and ring_get, before the GPU was declared wedged.
> 
> How much time passes between the first wait_for_fifo backtrace and
> the final gpu hang report?

About 6 seconds.  All 20 ring_put/ring_get failures happen in the same
second, then there's a 4 second pause before the GPU is first declared
hung, and another couple of seconds before the GPU is declared wedged:

Aug  4 23:37:02 orpheus kernel: ------------[ cut here ]------------
...
Aug  4 23:37:06 orpheus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
...
Aug  4 23:37:08 orpheus kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Aug  4 23:37:08 orpheus kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

> If it's short (order of a second or less) can you rehang your gpu with
> kernel log timestamping switched on?

I'm happy to re-hang the GPU with kernel log timestamping enabled, but I
won't be able to do so until Sunday night.

Cheers,
-Ted
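
The intervals described above (4 seconds to the first hang report, 2 more to the wedged report) can be recomputed from the syslog timestamps. This is a small sketch, assuming the year 2011 from the dates of this thread (syslog lines do not carry one) and an English/C locale for the month name. The helper names are hypothetical.

```python
from datetime import datetime

LOG = """\
Aug  4 23:37:02 orpheus kernel: ------------[ cut here ]------------
Aug  4 23:37:06 orpheus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Aug  4 23:37:08 orpheus kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
"""


def syslog_time(line, year=2011):
    """Parse the leading 'Mon D HH:MM:SS' stamp of a syslog line."""
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def seconds_between(lines):
    """Return the gap in seconds between each consecutive pair of lines."""
    times = [syslog_time(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


print(seconds_between(LOG.splitlines()))
```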

Francesco Allertsen Aug. 5, 2011, 4:54 a.m. UTC | #6

On Thu, Aug 04, 2011 at 11:05:39AM -0700, Keith Packard wrote:
> Linus' tree does not have the relevant fixes; I was posting that patch
> to let you know it was coming. The fixes are in my drm-intel-next
> branch, which I've asked Dave Airlie to pull in.

I have tested your branch drm-intel-next and I got the same freeze,
nothing seems to be solved.

As far as I can see, Dave has already sent a pull request to Linus, so
in a few days git master will probably freeze my laptop too :-(.

Unfortunately I don't have another computer to debug the problem, but if
you want to send me other patches to try (or other stuff) I'll try those
as soon as possible.

Bye
Francesco

Ted Phelps Aug. 8, 2011, 3:44 a.m. UTC | #7

Ted Phelps writes:
> Daniel Vetter writes:
> > If it's short (order of a second or less) can you rehang your gpu with
> > kernel log timestamping switched on?
> 
> I'm happy to re-hang the GPU with kernel log timestamping enabled, but I
> won't be able to do so until Sunday night.

I still haven't managed to wedge the GPU with timestamping enabled, but
it has reported a few hangs:

[ 5529.990334] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5529.990342] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5529.992971] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1226969 at 1226966, next 1226970)
[ 6790.804605] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 6790.804626] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1906573 at 1906570, next 1906574)
[13148.089287] ------------[ cut here ]------------
[13148.089296] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387 __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]()
[13148.089297] Hardware name:         
[13148.089298] Modules linked in: tun ohci_hcd i915 fbcon font bitblit softcursor drm_kms_helper drm pl2303 usbserial e1000e cfbcopyarea video cfbimgblt cfbfillrect processor button xfs
[13148.089308] Pid: 2042, comm: X Not tainted 3.0.0-00175-g07b7ddd #263
[13148.089310] Call Trace:
[13148.089315]  [<ffffffff81047eea>] warn_slowpath_common+0x7a/0xb0
[13148.089317]  [<ffffffff81047f35>] warn_slowpath_null+0x15/0x20
[13148.089320]  [<ffffffffa019073a>] __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]
[13148.089327]  [<ffffffffa01e5f6d>] ring_write_tail+0x5d/0xf0 [i915]
[13148.089333]  [<ffffffffa01e95c7>] blt_ring_flush+0xe7/0x110 [i915]
[13148.089337]  [<ffffffffa01b0b72>] i915_gem_flush_ring+0x42/0x230 [i915]
[13148.089342]  [<ffffffffa01b4abe>] i915_gem_busy_ioctl+0xee/0x160 [i915]
[13148.089348]  [<ffffffffa0109514>] drm_ioctl+0x3d4/0x4c0 [drm]
[13148.089352]  [<ffffffffa01b49d0>] ? i915_gem_unpin_ioctl+0xf0/0xf0 [i915]
[13148.089356]  [<ffffffff81269cd5>] ? timerqueue_del+0x35/0x90
[13148.089358]  [<ffffffff8111b7c7>] do_vfs_ioctl+0x97/0x5d0
[13148.089360]  [<ffffffff8153bb79>] ? _raw_spin_unlock_irq+0x9/0x40
[13148.089363]  [<ffffffff8104d044>] ? do_setitimer+0x1b4/0x240
[13148.089364]  [<ffffffff8111bd91>] sys_ioctl+0x91/0xa0
[13148.089366]  [<ffffffff815424fb>] system_call_fastpath+0x16/0x1b
[13148.089368] ---[ end trace cbadf99dfca64e1a ]---
[13148.094455] ------------[ cut here ]------------
[13148.094460] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387 __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]()
[13148.094461] Hardware name:         
[13148.094462] Modules linked in: tun ohci_hcd i915 fbcon font bitblit softcursor drm_kms_helper drm pl2303 usbserial e1000e cfbcopyarea video cfbimgblt cfbfillrect processor button xfs
[13148.094470] Pid: 2042, comm: X Tainted: G        W   3.0.0-00175-g07b7ddd #263
[13148.094471] Call Trace:
[13148.094473]  [<ffffffff81047eea>] warn_slowpath_common+0x7a/0xb0
[13148.094475]  [<ffffffff81047f35>] warn_slowpath_null+0x15/0x20
[13148.094478]  [<ffffffffa019073a>] __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]
[13148.094484]  [<ffffffffa01e5f6d>] ring_write_tail+0x5d/0xf0 [i915]
[13148.094489]  [<ffffffffa01e95c7>] blt_ring_flush+0xe7/0x110 [i915]
[13148.094493]  [<ffffffffa01b0b72>] i915_gem_flush_ring+0x42/0x230 [i915]
[13148.094498]  [<ffffffffa01b7c08>] i915_gem_do_execbuffer.clone.7+0xdb8/0x1790 [i915]
[13148.094503]  [<ffffffffa01b8a70>] ? i915_gem_execbuffer2+0x40/0x250 [i915]
[13148.094507]  [<ffffffffa01b8ac7>] i915_gem_execbuffer2+0x97/0x250 [i915]
[13148.094511]  [<ffffffffa0109514>] drm_ioctl+0x3d4/0x4c0 [drm]
[13148.094515]  [<ffffffffa01b8a30>] ? i915_gem_execbuffer+0x450/0x450 [i915]
[13148.094517]  [<ffffffff8111b7c7>] do_vfs_ioctl+0x97/0x5d0
[13148.094518]  [<ffffffff8111bd91>] sys_ioctl+0x91/0xa0
[13148.094520]  [<ffffffff815424fb>] system_call_fastpath+0x16/0x1b
[13148.094521] ---[ end trace cbadf99dfca64e1b ]---

Hope that's useful!

Thanks,
-Ted
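
With kernel timestamping on, the spacing between events can be read straight from the bracketed stamps. A minimal sketch follows, assuming the usual `[seconds.microseconds]` printk prefix seen in the log above. `Decimal` is used so the differences come out exact, and the helper names are hypothetical.

```python
import re
from decimal import Decimal

# "[ 5529.990334] message" or "[13148.089296] message"
STAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

SAMPLE = [
    "[ 5529.990334] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
    "[ 6790.804605] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
    "[13148.089296] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387 __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]()",
    "[13148.094460] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387 __gen6_gt_wait_for_fifo+0x8a/0x90 [i915]()",
]


def stamps_matching(lines, needle):
    """Collect timestamps of lines whose message contains `needle`."""
    stamps = []
    for line in lines:
        m = STAMP_RE.match(line)
        if m and needle in m.group(2):
            stamps.append(Decimal(m.group(1)))
    return stamps


def gaps(stamps):
    """Differences between consecutive timestamps, in seconds."""
    return [b - a for a, b in zip(stamps, stamps[1:])]


print(gaps(stamps_matching(SAMPLE, "WARNING")))
print(gaps(stamps_matching(SAMPLE, "Hangcheck")))
```

On the sample lines, the two wait_for_fifo warnings land about 5 ms apart, while the two hangcheck reports are roughly 21 minutes apart.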
