Message ID | 20170302145323.12886-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 02, 2017 at 02:53:23PM +0000, Chris Wilson wrote: > __i915_guc_submit may be, despite my assertion, called from outside of > an irq-safe spinlock so we need to use a full spin_lock_irqsave and not > cheat using a spin_lock. (The initial notify callback from the completed > fence is called before the spinlock is taken to wake up all waiters and > call their callbacks.) > > [ 48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527! > [ 48.166617] invalid opcode: 0000 [#1] PREEMPT SMP > [ 48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915] > [ 48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G U 4.10.0nightly-170302-guc_scrub+ #19 > [ 48.166778] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016 > [ 48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915] > [ 48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000 > [ 48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915] > [ 48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202 > [ 48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201 > [ 48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040 > [ 48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000 > [ 48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148 > [ 48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660 > [ 48.167156] FS: 0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000 > [ 48.167195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0 > [ 48.167257] Call Trace: > [ 48.168112] ? trace_hardirqs_on+0xd/0x10 > [ 48.168966] ? _raw_spin_unlock_irqrestore+0x4a/0x80 > [ 48.169831] i915_guc_submit+0x1a/0x20 [i915] > [ 48.170680] submit_notify+0x89/0xc0 [i915] > [ 48.171512] __i915_sw_fence_complete+0x175/0x220 [i915] > [ 48.172340] i915_sw_fence_complete+0x2a/0x50 [i915] > [ 48.173158] i915_sw_fence_commit+0x21/0x30 [i915] > [ 48.173968] __i915_add_request+0x238/0x530 [i915] > [ 48.174764] __intel_autoenable_gt_powersave+0x8b/0xb0 [i915] > [ 48.175549] process_one_work+0x218/0x690 > [ 48.176318] ? process_one_work+0x197/0x690 > [ 48.177183] worker_thread+0x4e/0x4a0 > [ 48.178039] kthread+0x10c/0x140 > [ 48.178878] ? process_one_work+0x690/0x690 > [ 48.179718] ? kthread_create_on_node+0x40/0x40 > [ 48.180568] ret_from_fork+0x31/0x40 > [ 48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a > [ 48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80 > > Reported-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> > Fixes: 349ab9192cc3 ("drm/i915/guc: Make wq_lock irq-safe") > Fixes: 67b807a89230 ("drm/i915: Delay disabling the user interrupt for breadcrumbs") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Tested-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
On 02/03/2017 16:53, Chris Wilson wrote: > __i915_guc_submit may be, despite my assertion, called from outside of > an irq-safe spinlock so we need to use a full spin_lock_irqsave and not > cheat using a spin_lock. (The initial notify callback from the completed > fence is called before the spinlock is taken to wake up all waiters and > call their callbacks.) > > [ 48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527! > [ 48.166617] invalid opcode: 0000 [#1] PREEMPT SMP > [ 48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915] > [ 48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G U 4.10.0nightly-170302-guc_scrub+ #19 > [ 48.166778] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016 > [ 48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915] > [ 48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000 > [ 48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915] > [ 48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202 > [ 48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201 > [ 48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040 > [ 48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000 > [ 48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148 > [ 48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660 > [ 48.167156] FS: 0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000 > [ 48.167195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0 > [ 48.167257] Call Trace: > [ 48.168112] ? trace_hardirqs_on+0xd/0x10 > [ 48.168966] ? _raw_spin_unlock_irqrestore+0x4a/0x80 > [ 48.169831] i915_guc_submit+0x1a/0x20 [i915] > [ 48.170680] submit_notify+0x89/0xc0 [i915] > [ 48.171512] __i915_sw_fence_complete+0x175/0x220 [i915] > [ 48.172340] i915_sw_fence_complete+0x2a/0x50 [i915] > [ 48.173158] i915_sw_fence_commit+0x21/0x30 [i915] > [ 48.173968] __i915_add_request+0x238/0x530 [i915] > [ 48.174764] __intel_autoenable_gt_powersave+0x8b/0xb0 [i915] > [ 48.175549] process_one_work+0x218/0x690 > [ 48.176318] ? process_one_work+0x197/0x690 > [ 48.177183] worker_thread+0x4e/0x4a0 > [ 48.178039] kthread+0x10c/0x140 > [ 48.178878] ? process_one_work+0x690/0x690 > [ 48.179718] ? kthread_create_on_node+0x40/0x40 > [ 48.180568] ret_from_fork+0x31/0x40 > [ 48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a > [ 48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80 > > Reported-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> > Fixes: 349ab9192cc3 ("drm/i915/guc: Make wq_lock irq-safe") > Fixes: 67b807a89230 ("drm/i915: Delay disabling the user interrupt for breadcrumbs") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> > --- > drivers/gpu/drm/i915/i915_guc_submission.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c > index 7b535a32fc27..beb38e30d0e9 100644 > --- a/drivers/gpu/drm/i915/i915_guc_submission.c > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c > @@ -515,6 +515,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq) > unsigned int engine_id = engine->id; > struct intel_guc *guc = &rq->i915->guc; > struct i915_guc_client *client = guc->execbuf_client; > + unsigned long flags; > int b_ret; > > /* WA to flush out the pending GMADR writes to ring buffer. */ > @@ -523,10 +524,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq) > > trace_i915_gem_request_in(rq, 0); > > - /* We are always called with irqs disabled */ > - GEM_BUG_ON(!irqs_disabled()); > - > - spin_lock(&client->wq_lock); > + spin_lock_irqsave(&client->wq_lock, flags); > > guc_wq_item_append(client, rq); > b_ret = guc_ring_doorbell(client); > @@ -539,7 +537,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq) > guc->submissions[engine_id] += 1; > guc->last_seqno[engine_id] = rq->global_seqno; > > - spin_unlock(&client->wq_lock); > + spin_unlock_irqrestore(&client->wq_lock, flags); > } > > static void i915_guc_submit(struct drm_i915_gem_request *rq) > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko P.S. For some reason all your recent e-mail where you copy me at @intel.com ends up in Outlook's junk folder (even though I turned off junk filtering!). When/if you remember please use @linux.intel.com instead.
On Thu, Mar 02, 2017 at 04:18:05PM -0000, Patchwork wrote: > == Series Details == > > Series: drm/i915/guc: Disable irq for __i915_guc_submit wq_lock > URL : https://patchwork.freedesktop.org/series/20564/ > State : success > > == Summary == > > Series 20564v1 drm/i915/guc: Disable irq for __i915_guc_submit wq_lock > https://patchwork.freedesktop.org/api/1.0/series/20564/revisions/1/mbox/ > > Test gem_exec_flush: > Subgroup basic-batch-kernel-default-uc: > fail -> PASS (fi-snb-2600) fdo#100007 > > fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007 > > fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 > fi-bsw-n3050 total:278 pass:239 dwarn:0 dfail:0 fail:0 skip:39 > fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 > fi-bxt-t5700 total:278 pass:258 dwarn:0 dfail:0 fail:0 skip:20 > fi-byt-j1900 total:278 pass:251 dwarn:0 dfail:0 fail:0 skip:27 > fi-byt-n2820 total:278 pass:247 dwarn:0 dfail:0 fail:0 skip:31 > fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 > fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 > fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 > fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 > fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 > fi-kbl-7500u total:278 pass:259 dwarn:1 dfail:0 fail:0 skip:18 > fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 > fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 > fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 > fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 > fi-snb-2600 total:278 pass:249 dwarn:0 dfail:0 fail:0 skip:29 > > a5783262a0dd79ed8a7d244352378a3db668c5b2 drm-tip: 2017y-03m-02d-14h-56m-29s UTC integration manifest > 488c25f drm/i915/guc: Disable irq for __i915_guc_submit wq_lock Pushed this fix along with a couple of other debugging aides reviewed earlier today. Thanks, -Chris
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c index 7b535a32fc27..beb38e30d0e9 100644 --- a/drivers/gpu/drm/i915/i915_guc_submission.c +++ b/drivers/gpu/drm/i915/i915_guc_submission.c @@ -515,6 +515,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq) unsigned int engine_id = engine->id; struct intel_guc *guc = &rq->i915->guc; struct i915_guc_client *client = guc->execbuf_client; + unsigned long flags; int b_ret; /* WA to flush out the pending GMADR writes to ring buffer. */ @@ -523,10 +524,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq) trace_i915_gem_request_in(rq, 0); - /* We are always called with irqs disabled */ - GEM_BUG_ON(!irqs_disabled()); - - spin_lock(&client->wq_lock); + spin_lock_irqsave(&client->wq_lock, flags); guc_wq_item_append(client, rq); b_ret = guc_ring_doorbell(client); @@ -539,7 +537,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq) guc->submissions[engine_id] += 1; guc->last_seqno[engine_id] = rq->global_seqno; - spin_unlock(&client->wq_lock); + spin_unlock_irqrestore(&client->wq_lock, flags); } static void i915_guc_submit(struct drm_i915_gem_request *rq)
__i915_guc_submit may be, despite my assertion, called from outside of an irq-safe spinlock so we need to use a full spin_lock_irqsave and not cheat using a spin_lock. (The initial notify callback from the completed fence is called before the spinlock is taken to wake up all waiters and call their callbacks.) [ 48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527! [ 48.166617] invalid opcode: 0000 [#1] PREEMPT SMP [ 48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915] [ 48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G U 4.10.0nightly-170302-guc_scrub+ #19 [ 48.166778] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016 [ 48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915] [ 48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000 [ 48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915] [ 48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202 [ 48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201 [ 48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040 [ 48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000 [ 48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148 [ 48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660 [ 48.167156] FS: 0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000 [ 48.167195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0 [ 48.167257] Call Trace: [ 48.168112] ? trace_hardirqs_on+0xd/0x10 [ 48.168966] ? _raw_spin_unlock_irqrestore+0x4a/0x80 [ 48.169831] i915_guc_submit+0x1a/0x20 [i915] [ 48.170680] submit_notify+0x89/0xc0 [i915] [ 48.171512] __i915_sw_fence_complete+0x175/0x220 [i915] [ 48.172340] i915_sw_fence_complete+0x2a/0x50 [i915] [ 48.173158] i915_sw_fence_commit+0x21/0x30 [i915] [ 48.173968] __i915_add_request+0x238/0x530 [i915] [ 48.174764] __intel_autoenable_gt_powersave+0x8b/0xb0 [i915] [ 48.175549] process_one_work+0x218/0x690 [ 48.176318] ? process_one_work+0x197/0x690 [ 48.177183] worker_thread+0x4e/0x4a0 [ 48.178039] kthread+0x10c/0x140 [ 48.178878] ? process_one_work+0x690/0x690 [ 48.179718] ? kthread_create_on_node+0x40/0x40 [ 48.180568] ret_from_fork+0x31/0x40 [ 48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a [ 48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80 Reported-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Fixes: 349ab9192cc3 ("drm/i915/guc: Make wq_lock irq-safe") Fixes: 67b807a89230 ("drm/i915: Delay disabling the user interrupt for breadcrumbs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> --- drivers/gpu/drm/i915/i915_guc_submission.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)