diff mbox

drm/i915/guc: Disable irq for __i915_guc_submit wq_lock

Message ID 20170302145323.12886-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson March 2, 2017, 2:53 p.m. UTC
__i915_guc_submit may be, despite my assertion, called from outside of
an irq-safe spinlock so we need to use a full spin_lock_irqsave and not
cheat using a spin_lock. (The initial notify callback from the completed
fence is called before the spinlock is taken to wake up all waiters and
call their callbacks.)

[   48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527!
[   48.166617] invalid opcode: 0000 [#1] PREEMPT SMP
[   48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915]
[   48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G     U          4.10.0nightly-170302-guc_scrub+ #19
[   48.166778] Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016
[   48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915]
[   48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000
[   48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915]
[   48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202
[   48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201
[   48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040
[   48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000
[   48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148
[   48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660
[   48.167156] FS:  0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000
[   48.167195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0
[   48.167257] Call Trace:
[   48.168112]  ? trace_hardirqs_on+0xd/0x10
[   48.168966]  ? _raw_spin_unlock_irqrestore+0x4a/0x80
[   48.169831]  i915_guc_submit+0x1a/0x20 [i915]
[   48.170680]  submit_notify+0x89/0xc0 [i915]
[   48.171512]  __i915_sw_fence_complete+0x175/0x220 [i915]
[   48.172340]  i915_sw_fence_complete+0x2a/0x50 [i915]
[   48.173158]  i915_sw_fence_commit+0x21/0x30 [i915]
[   48.173968]  __i915_add_request+0x238/0x530 [i915]
[   48.174764]  __intel_autoenable_gt_powersave+0x8b/0xb0 [i915]
[   48.175549]  process_one_work+0x218/0x690
[   48.176318]  ? process_one_work+0x197/0x690
[   48.177183]  worker_thread+0x4e/0x4a0
[   48.178039]  kthread+0x10c/0x140
[   48.178878]  ? process_one_work+0x690/0x690
[   48.179718]  ? kthread_create_on_node+0x40/0x40
[   48.180568]  ret_from_fork+0x31/0x40
[   48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a
[   48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80

Reported-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Fixes: 349ab9192cc3 ("drm/i915/guc: Make wq_lock irq-safe")
Fixes: 67b807a89230 ("drm/i915: Delay disabling the user interrupt for breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Arkadiusz Hiler March 2, 2017, 2:57 p.m. UTC | #1
On Thu, Mar 02, 2017 at 02:53:23PM +0000, Chris Wilson wrote:
> __i915_guc_submit may be, despite my assertion, called from outside of
> an irq-safe spinlock so we need to use a full spin_lock_irqsave and not
> cheat using a spin_lock. (The initial notify callback from the completed
> fence is called before the spinlock is taken to wake up all waiters and
> call their callbacks.)
> 
> [   48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527!
> [   48.166617] invalid opcode: 0000 [#1] PREEMPT SMP
> [   48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915]
> [   48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G     U          4.10.0nightly-170302-guc_scrub+ #19
> [   48.166778] Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016
> [   48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915]
> [   48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000
> [   48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915]
> [   48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202
> [   48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201
> [   48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040
> [   48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000
> [   48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148
> [   48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660
> [   48.167156] FS:  0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000
> [   48.167195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0
> [   48.167257] Call Trace:
> [   48.168112]  ? trace_hardirqs_on+0xd/0x10
> [   48.168966]  ? _raw_spin_unlock_irqrestore+0x4a/0x80
> [   48.169831]  i915_guc_submit+0x1a/0x20 [i915]
> [   48.170680]  submit_notify+0x89/0xc0 [i915]
> [   48.171512]  __i915_sw_fence_complete+0x175/0x220 [i915]
> [   48.172340]  i915_sw_fence_complete+0x2a/0x50 [i915]
> [   48.173158]  i915_sw_fence_commit+0x21/0x30 [i915]
> [   48.173968]  __i915_add_request+0x238/0x530 [i915]
> [   48.174764]  __intel_autoenable_gt_powersave+0x8b/0xb0 [i915]
> [   48.175549]  process_one_work+0x218/0x690
> [   48.176318]  ? process_one_work+0x197/0x690
> [   48.177183]  worker_thread+0x4e/0x4a0
> [   48.178039]  kthread+0x10c/0x140
> [   48.178878]  ? process_one_work+0x690/0x690
> [   48.179718]  ? kthread_create_on_node+0x40/0x40
> [   48.180568]  ret_from_fork+0x31/0x40
> [   48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a
> [   48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80
> 
> Reported-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Fixes: 349ab9192cc3 ("drm/i915/guc: Make wq_lock irq-safe")
> Fixes: 67b807a89230 ("drm/i915: Delay disabling the user interrupt for breadcrumbs")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Tested-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Tvrtko Ursulin March 2, 2017, 3:01 p.m. UTC | #2
On 02/03/2017 16:53, Chris Wilson wrote:
> __i915_guc_submit may be, despite my assertion, called from outside of
> an irq-safe spinlock so we need to use a full spin_lock_irqsave and not
> cheat using a spin_lock. (The initial notify callback from the completed
> fence is called before the spinlock is taken to wake up all waiters and
> call their callbacks.)
>
> [   48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527!
> [   48.166617] invalid opcode: 0000 [#1] PREEMPT SMP
> [   48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915]
> [   48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G     U          4.10.0nightly-170302-guc_scrub+ #19
> [   48.166778] Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016
> [   48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915]
> [   48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000
> [   48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915]
> [   48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202
> [   48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201
> [   48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040
> [   48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000
> [   48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148
> [   48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660
> [   48.167156] FS:  0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000
> [   48.167195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0
> [   48.167257] Call Trace:
> [   48.168112]  ? trace_hardirqs_on+0xd/0x10
> [   48.168966]  ? _raw_spin_unlock_irqrestore+0x4a/0x80
> [   48.169831]  i915_guc_submit+0x1a/0x20 [i915]
> [   48.170680]  submit_notify+0x89/0xc0 [i915]
> [   48.171512]  __i915_sw_fence_complete+0x175/0x220 [i915]
> [   48.172340]  i915_sw_fence_complete+0x2a/0x50 [i915]
> [   48.173158]  i915_sw_fence_commit+0x21/0x30 [i915]
> [   48.173968]  __i915_add_request+0x238/0x530 [i915]
> [   48.174764]  __intel_autoenable_gt_powersave+0x8b/0xb0 [i915]
> [   48.175549]  process_one_work+0x218/0x690
> [   48.176318]  ? process_one_work+0x197/0x690
> [   48.177183]  worker_thread+0x4e/0x4a0
> [   48.178039]  kthread+0x10c/0x140
> [   48.178878]  ? process_one_work+0x690/0x690
> [   48.179718]  ? kthread_create_on_node+0x40/0x40
> [   48.180568]  ret_from_fork+0x31/0x40
> [   48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a
> [   48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80
>
> Reported-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Fixes: 349ab9192cc3 ("drm/i915/guc: Make wq_lock irq-safe")
> Fixes: 67b807a89230 ("drm/i915: Delay disabling the user interrupt for breadcrumbs")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_guc_submission.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 7b535a32fc27..beb38e30d0e9 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -515,6 +515,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
>  	unsigned int engine_id = engine->id;
>  	struct intel_guc *guc = &rq->i915->guc;
>  	struct i915_guc_client *client = guc->execbuf_client;
> +	unsigned long flags;
>  	int b_ret;
>
>  	/* WA to flush out the pending GMADR writes to ring buffer. */
> @@ -523,10 +524,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
>
>  	trace_i915_gem_request_in(rq, 0);
>
> -	/* We are always called with irqs disabled */
> -	GEM_BUG_ON(!irqs_disabled());
> -
> -	spin_lock(&client->wq_lock);
> +	spin_lock_irqsave(&client->wq_lock, flags);
>
>  	guc_wq_item_append(client, rq);
>  	b_ret = guc_ring_doorbell(client);
> @@ -539,7 +537,7 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
>  	guc->submissions[engine_id] += 1;
>  	guc->last_seqno[engine_id] = rq->global_seqno;
>
> -	spin_unlock(&client->wq_lock);
> +	spin_unlock_irqrestore(&client->wq_lock, flags);
>  }
>
>  static void i915_guc_submit(struct drm_i915_gem_request *rq)
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

P.S. For some reason all your recent e-mail where you copy me at 
@intel.com ends up in Outlook's junk folder (even though I turned off 
junk filtering!). When/if you remember please use @linux.intel.com instead.
Chris Wilson March 2, 2017, 6:51 p.m. UTC | #3
On Thu, Mar 02, 2017 at 04:18:05PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915/guc: Disable irq for __i915_guc_submit wq_lock
> URL   : https://patchwork.freedesktop.org/series/20564/
> State : success
> 
> == Summary ==
> 
> Series 20564v1 drm/i915/guc: Disable irq for __i915_guc_submit wq_lock
> https://patchwork.freedesktop.org/api/1.0/series/20564/revisions/1/mbox/
> 
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> 
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> 
> fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11 
> fi-bsw-n3050     total:278  pass:239  dwarn:0   dfail:0   fail:0   skip:39 
> fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19 
> fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20 
> fi-byt-j1900     total:278  pass:251  dwarn:0   dfail:0   fail:0   skip:27 
> fi-byt-n2820     total:278  pass:247  dwarn:0   dfail:0   fail:0   skip:31 
> fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16 
> fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16 
> fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50 
> fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18 
> fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18 
> fi-kbl-7500u     total:278  pass:259  dwarn:1   dfail:0   fail:0   skip:18 
> fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10 
> fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17 
> fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10 
> fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28 
> fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29 
> 
> a5783262a0dd79ed8a7d244352378a3db668c5b2 drm-tip: 2017y-03m-02d-14h-56m-29s UTC integration manifest
> 488c25f drm/i915/guc: Disable irq for __i915_guc_submit wq_lock

Pushed this fix along with a couple of other debugging aides reviewed
earlier today. Thanks,
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 7b535a32fc27..beb38e30d0e9 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -515,6 +515,7 @@  static void __i915_guc_submit(struct drm_i915_gem_request *rq)
 	unsigned int engine_id = engine->id;
 	struct intel_guc *guc = &rq->i915->guc;
 	struct i915_guc_client *client = guc->execbuf_client;
+	unsigned long flags;
 	int b_ret;
 
 	/* WA to flush out the pending GMADR writes to ring buffer. */
@@ -523,10 +524,7 @@  static void __i915_guc_submit(struct drm_i915_gem_request *rq)
 
 	trace_i915_gem_request_in(rq, 0);
 
-	/* We are always called with irqs disabled */
-	GEM_BUG_ON(!irqs_disabled());
-
-	spin_lock(&client->wq_lock);
+	spin_lock_irqsave(&client->wq_lock, flags);
 
 	guc_wq_item_append(client, rq);
 	b_ret = guc_ring_doorbell(client);
@@ -539,7 +537,7 @@  static void __i915_guc_submit(struct drm_i915_gem_request *rq)
 	guc->submissions[engine_id] += 1;
 	guc->last_seqno[engine_id] = rq->global_seqno;
 
-	spin_unlock(&client->wq_lock);
+	spin_unlock_irqrestore(&client->wq_lock, flags);
 }
 
 static void i915_guc_submit(struct drm_i915_gem_request *rq)