Message ID | 1376684973-14743-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Aug 16, 2013 at 01:29:33PM -0700, Ben Widawsky wrote: > The vma will [possibly] be destroyed during unbind in eviction. > Immediately after this, we try to delete the list entry. > > Chris and Ville did the debug on this before I woke up, I just get to > take credit for the fix :p > > v2: Missed the drm_object_unreference use after free (Ville) > > Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Should have spotted this earlier :( Can I have a couple more lines of whitespace? Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
On Fri, Aug 16, 2013 at 11:31:12PM +0100, Chris Wilson wrote: > On Fri, Aug 16, 2013 at 01:29:33PM -0700, Ben Widawsky wrote: > > The vma will [possibly] be destroyed during unbind in eviction. > > Immediately after this, we try to delete the list entry. > > > > Chris and Ville did the debug on this before I woke up, I just get to > > take credit for the fix :p > > > > v2: Missed the drm_object_unreference use after free (Ville) > > > > Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Since the commit message lacks that crucial piece of information: How was this discovered? I use that to tune my gut feeling for gauging the -nightly test effectiveness ... > Should have spotted this earlier :( > > Can I have a couple more lines of whitespace? > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Queued for -next, thanks for the patch. -Daniel
On Sun, Aug 18, 2013 at 07:26:57PM +0200, Daniel Vetter wrote: > On Fri, Aug 16, 2013 at 11:31:12PM +0100, Chris Wilson wrote: > > On Fri, Aug 16, 2013 at 01:29:33PM -0700, Ben Widawsky wrote: > > > The vma will [possibly] be destroyed during unbind in eviction. > > > Immediately after this, we try to delete the list entry. > > > > > > Chris and Ville did the debug on this before I woke up, I just get to > > > take credit for the fix :p > > > > > > v2: Missed the drm_object_unreference use after free (Ville) > > > > > > Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > Since the commit message lacks that crucial piece of information: How was > this discovered? I use that to tune my gut feeling for gauging the > -nightly test effectiveness ... Mika pasted an oops on #intel-gfx. Chris and Ville had is solved before I woke up. It's pretty strange, Chris said the bug existed in the original ppgtt2 branch (I'm too lazy to check). In many runs for myself, and QA, I'd not seen the oops though. I really can't explain it.
On Mon, Aug 19, 2013 at 12:35 AM, Ben Widawsky <ben@bwidawsk.net> wrote: > On Sun, Aug 18, 2013 at 07:26:57PM +0200, Daniel Vetter wrote: >> On Fri, Aug 16, 2013 at 11:31:12PM +0100, Chris Wilson wrote: >> > On Fri, Aug 16, 2013 at 01:29:33PM -0700, Ben Widawsky wrote: >> > > The vma will [possibly] be destroyed during unbind in eviction. >> > > Immediately after this, we try to delete the list entry. >> > > >> > > Chris and Ville did the debug on this before I woke up, I just get to >> > > take credit for the fix :p >> > > >> > > v2: Missed the drm_object_unreference use after free (Ville) >> > > >> > > Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> >> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> > > Cc: Chris Wilson <chris@chris-wilson.co.uk> >> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> >> >> Since the commit message lacks that crucial piece of information: How was >> this discovered? I use that to tune my gut feeling for gauging the >> -nightly test effectiveness ... > > Mika pasted an oops on #intel-gfx. Chris and Ville had is solved before > I woke up. It's pretty strange, Chris said the bug existed in the > original ppgtt2 branch (I'm too lazy to check). In many runs for myself, > and QA, I'd not seen the oops though. I really can't explain it. Thanks for the explanation. Please add such information (including the full Oops) to the commit message next time around. I've asked Mika for the backtrace meanwhile. I guess this is another candidate for a testcase - if you and QA have beat on this for a while and couldn't hit this bug we need to try harder to hit bugs ;-) -Daniel
Daniel Vetter <daniel@ffwll.ch> writes: > On Mon, Aug 19, 2013 at 12:35 AM, Ben Widawsky <ben@bwidawsk.net> wrote: >> On Sun, Aug 18, 2013 at 07:26:57PM +0200, Daniel Vetter wrote: >>> On Fri, Aug 16, 2013 at 11:31:12PM +0100, Chris Wilson wrote: >>> > On Fri, Aug 16, 2013 at 01:29:33PM -0700, Ben Widawsky wrote: >>> > > The vma will [possibly] be destroyed during unbind in eviction. >>> > > Immediately after this, we try to delete the list entry. >>> > > >>> > > Chris and Ville did the debug on this before I woke up, I just get to >>> > > take credit for the fix :p >>> > > >>> > > v2: Missed the drm_object_unreference use after free (Ville) >>> > > >>> > > Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> >>> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >>> > > Cc: Chris Wilson <chris@chris-wilson.co.uk> >>> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> >>> >>> Since the commit message lacks that crucial piece of information: How was >>> this discovered? I use that to tune my gut feeling for gauging the >>> -nightly test effectiveness ... >> >> Mika pasted an oops on #intel-gfx. Chris and Ville had is solved before >> I woke up. It's pretty strange, Chris said the bug existed in the >> original ppgtt2 branch (I'm too lazy to check). In many runs for myself, >> and QA, I'd not seen the oops though. I really can't explain it. > > Thanks for the explanation. Please add such information (including the > full Oops) to the commit message next time around. I've asked Mika for > the backtrace meanwhile. Here is the trace: [ 403.472448] BUG: unable to handle kernel paging request at 6b6b6b6b [ 403.472473] IP: [<c12c1500>] __list_del_entry+0x20/0xe0 [ 403.472514] *pdpt = 000000002e89c001 *pde = 0000000000000000 [ 403.472556] Oops: 0000 [#1] SMP [ 403.472582] Modules linked in: mxm_wmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi psmouse snd_seq_midi_event snd_seq serio_raw snd_timer snd_seq_device snd soundcore snd_page_alloc wmi bnep rfcomm bluetooth mac_hid parport_pc ppdev lp parport usbhid dm_crypt firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper e1000e ptp drm i2c_algo_bit pps_core xhci_hcd video [ 403.472895] CPU: 2 PID: 1940 Comm: Xorg Not tainted 3.11.0-rc2+ #827 [ 403.472938] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0070.2012.0416.2117 04/16/2012 [ 403.473002] task: ec866c00 ti: ee6a2000 task.ti: ee6a2000 [ 403.473039] EIP: 0060:[<c12c1500>] EFLAGS: 00013202 CPU: 2 [ 403.473078] EIP is at __list_del_entry+0x20/0xe0 [ 403.473109] EAX: f016d9bc EBX: f016d9bc ECX: 6b6b6b6b EDX: 6b6b6b6b [ 403.473151] ESI: 00000000 EDI: ee6a3c90 EBP: ee6a3c60 ESP: ee6a3c48 [ 403.473193] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 403.473230] CR0: 80050033 CR2: 6b6b6b6b CR3: 2ec43000 CR4: 001407f0 [ 403.473271] Stack: [ 403.473285] f63b2ff0 f61f98c0 f61f8000 f016d9bc 00000000 f016d9bc ee6a3cac f8519a4a [ 403.473347] 00000000 00000000 10000000 f61f8000 0100a000 10000000 00000001 008ca000 [ 403.473410] f64ee840 f61f98c0 f016d9bc f016dcec ee6a3c98 ee6a3c98 f61f98c0 dcc58f00 [ 403.473472] Call Trace: [ 403.473509] [<f8519a4a>] i915_gem_evict_something+0x17a/0x2d0 [i915] [ 403.473567] [<f8516ed1>] i915_gem_object_pin+0x271/0x660 [i915] [ 403.473622] [<f851c740>] ? i915_ggtt_clear_range+0x20/0x20 [i915] [ 403.473676] [<f8517afa>] i915_gem_object_pin_to_display_plane+0xda/0x190 [i915] [ 403.473742] [<f852d9fa>] intel_pin_and_fence_fb_obj+0xba/0x140 [i915] [ 403.473800] [<f852db40>] intel_gen7_queue_flip+0x30/0x1c0 [i915] [ 403.473856] [<f85337b0>] intel_crtc_page_flip+0x1a0/0x320 [i915] [ 403.473911] [<f847b549>] ? drm_framebuffer_reference+0x39/0x80 [drm] [ 403.473965] [<f847f9fb>] drm_mode_page_flip_ioctl+0x28b/0x320 [drm] [ 403.474018] [<f846fec8>] drm_ioctl+0x4b8/0x560 [drm] [ 403.474064] [<f847f770>] ? drm_mode_gamma_get_ioctl+0xd0/0xd0 [drm] [ 403.474113] [<c1140f8a>] ? do_sync_read+0x6a/0xa0 [ 403.474154] [<f846fa10>] ? drm_copy_field+0x80/0x80 [drm] [ 403.474193] [<c115134c>] do_vfs_ioctl+0x7c/0x5b0 [ 403.474228] [<c1141d2f>] ? vfs_read+0xef/0x160 [ 403.474263] [<c108dcbb>] ? ktime_get_ts+0x4b/0x120 [ 403.474298] [<c1151917>] SyS_ioctl+0x97/0xa0 [ 403.474330] [<c1590bc1>] sysenter_do_call+0x12/0x22 [ 403.474364] Code: 55 f4 8b 45 f8 e9 75 ff ff ff 90 55 89 e5 53 83 ec 14 8b 08 8b 50 04 81 f9 00 01 10 00 74 24 81 fa 00 02 20 00 0f 84 8e 00 00 00 <8b> 1a 39 d8 75 62 8b 59 04 39 d8 75 35 89 51 04 89 0a 83 c4 14 [ 403.474566] EIP: [<c12c1500>] __list_del_entry+0x20/0xe0 SS:ESP 0068:ee6a3c48 [ 403.476513] CR2: 000000006b6b6b6b
On Mon, Aug 19, 2013 at 10:49 AM, Mika Kuoppala <mika.kuoppala@linux.intel.com> wrote: >>> Mika pasted an oops on #intel-gfx. Chris and Ville had is solved before >>> I woke up. It's pretty strange, Chris said the bug existed in the >>> original ppgtt2 branch (I'm too lazy to check). In many runs for myself, >>> and QA, I'd not seen the oops though. I really can't explain it. >> >> Thanks for the explanation. Please add such information (including the >> full Oops) to the commit message next time around. I've asked Mika for >> the backtrace meanwhile. > > Here is the trace: Thanks, added to the commit message. -Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 0cbaad4..3b7b74e 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -136,14 +136,17 @@ found: /* Unbinding will emit any required flushes */ while (!list_empty(&eviction_list)) { + struct drm_gem_object *obj; vma = list_first_entry(&eviction_list, struct i915_vma, exec_list); + + obj = &vma->obj->base; + list_del_init(&vma->exec_list); if (ret == 0) ret = i915_vma_unbind(vma); - list_del_init(&vma->exec_list); - drm_gem_object_unreference(&vma->obj->base); + drm_gem_object_unreference(obj); } return ret;
The vma will [possibly] be destroyed during unbind in eviction. Immediately after this, we try to delete the list entry. Chris and Ville did the debug on this before I woke up, I just get to take credit for the fix :p v2: Missed the drm_object_unreference use after free (Ville) Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_gem_evict.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)