diff mbox

[1/2] drm/i915/guc: Fix sleep under spinlock during reset

Message ID 20170412154843.16035-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tvrtko Ursulin April 12, 2017, 3:48 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Looks like intel_guc_reset had the ability to sleep under the
uncore spinlock since forever but it wasn't detected until the
recent changes annotated the wait for register with might_sleep.

I have fixed it by removing holding of the uncore spinlock over
the call to gen6_hw_domain_reset, since I do not see that is
really needed. But there is always a possibility I am missing
some nasty detail so please double check.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
--
A random selection of Cc for people who I though were recently
engaged in GuC code.
---
 drivers/gpu/drm/i915/intel_uncore.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Chris Wilson April 12, 2017, 3:58 p.m. UTC | #1
On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Looks like intel_guc_reset had the ability to sleep under the
> uncore spinlock since forever but it wasn't detected until the
> recent changes annotated the wait for register with might_sleep.
> 
> I have fixed it by removing holding of the uncore spinlock over
> the call to gen6_hw_domain_reset, since I do not see that is
> really needed. But there is always a possibility I am missing
> some nasty detail so please double check.

Afaik, no we are not using the uncore.lock here to serialise resets so
yes we should be safe in dropping it.

Will the guc be coming under the same hw semaphore as gen8 per-engine
resets?
-Chris
Michel Thierry April 12, 2017, 4:22 p.m. UTC | #2
On 12/04/17 08:58, Chris Wilson wrote:
> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Looks like intel_guc_reset had the ability to sleep under the
>> uncore spinlock since forever but it wasn't detected until the
>> recent changes annotated the wait for register with might_sleep.
>>
>> I have fixed it by removing holding of the uncore spinlock over
>> the call to gen6_hw_domain_reset, since I do not see that is
>> really needed. But there is always a possibility I am missing
>> some nasty detail so please double check.
>
> Afaik, no we are not using the uncore.lock here to serialise resets so
> yes we should be safe in dropping it.
>
> Will the guc be coming under the same hw semaphore as gen8 per-engine
> resets?

A bit unrelated, but should intel_guc_reset be intel_reset_guc instead?
Here we're trying to reset the microcontroller, not asking guc to do a 
reset.
Michel Thierry April 27, 2017, 6:14 p.m. UTC | #3
On 12/04/17 09:22, Michel Thierry wrote:
> On 12/04/17 08:58, Chris Wilson wrote:
>> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Looks like intel_guc_reset had the ability to sleep under the
>>> uncore spinlock since forever but it wasn't detected until the
>>> recent changes annotated the wait for register with might_sleep.
>>>
>>> I have fixed it by removing holding of the uncore spinlock over
>>> the call to gen6_hw_domain_reset, since I do not see that is
>>> really needed. But there is always a possibility I am missing
>>> some nasty detail so please double check.
>>
>> Afaik, no we are not using the uncore.lock here to serialise resets so
>> yes we should be safe in dropping it.
>>
>> Will the guc be coming under the same hw semaphore as gen8 per-engine
>> resets?
>
> A bit unrelated, but should intel_guc_reset be intel_reset_guc instead?
> Here we're trying to reset the microcontroller, not asking guc to do a
> reset.

Ping?

Anyone unlucky enough to be using GuC submission should be seeing this 
warning when the firmware has to be reloaded (for example after any 
i-g-t hang test).

I still think the function should be renamed to _reset_guc though, since 
it's the hw reseting the guc, not the other way around.

Acked-by: Michel Thierry <michel.thierry@intel.com>
Tvrtko Ursulin April 27, 2017, 6:20 p.m. UTC | #4
On 27/04/2017 19:14, Michel Thierry wrote:
> On 12/04/17 09:22, Michel Thierry wrote:
>> On 12/04/17 08:58, Chris Wilson wrote:
>>> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Looks like intel_guc_reset had the ability to sleep under the
>>>> uncore spinlock since forever but it wasn't detected until the
>>>> recent changes annotated the wait for register with might_sleep.
>>>>
>>>> I have fixed it by removing holding of the uncore spinlock over
>>>> the call to gen6_hw_domain_reset, since I do not see that is
>>>> really needed. But there is always a possibility I am missing
>>>> some nasty detail so please double check.
>>>
>>> Afaik, no we are not using the uncore.lock here to serialise resets so
>>> yes we should be safe in dropping it.
>>>
>>> Will the guc be coming under the same hw semaphore as gen8 per-engine
>>> resets?
>>
>> A bit unrelated, but should intel_guc_reset be intel_reset_guc instead?
>> Here we're trying to reset the microcontroller, not asking guc to do a
>> reset.
>
> Ping?
>
> Anyone unlucky enough to be using GuC submission should be seeing this
> warning when the firmware has to be reloaded (for example after any
> i-g-t hang test).
>
> I still think the function should be renamed to _reset_guc though, since
> it's the hw reseting the guc, not the other way around.
>
> Acked-by: Michel Thierry <michel.thierry@intel.com>

Thanks! Now just exercise restrain in suggesting bikesheds and if 
someone can provide an r-b we could merge this. ;) (To be read as - lets 
leave the renaming for a follow up work since this fix is not to blame 
for the objectionable name.)

Regards,

Tvrtko
Michel Thierry April 27, 2017, 6:25 p.m. UTC | #5
On 27/04/17 11:20, Tvrtko Ursulin wrote:
>
> On 27/04/2017 19:14, Michel Thierry wrote:
>> On 12/04/17 09:22, Michel Thierry wrote:
>>> On 12/04/17 08:58, Chris Wilson wrote:
>>>> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> Looks like intel_guc_reset had the ability to sleep under the
>>>>> uncore spinlock since forever but it wasn't detected until the
>>>>> recent changes annotated the wait for register with might_sleep.
>>>>>
>>>>> I have fixed it by removing holding of the uncore spinlock over
>>>>> the call to gen6_hw_domain_reset, since I do not see that is
>>>>> really needed. But there is always a possibility I am missing
>>>>> some nasty detail so please double check.
>>>>
>>>> Afaik, no we are not using the uncore.lock here to serialise resets so
>>>> yes we should be safe in dropping it.
>>>>
>>>> Will the guc be coming under the same hw semaphore as gen8 per-engine
>>>> resets?
>>>
>>> A bit unrelated, but should intel_guc_reset be intel_reset_guc instead?
>>> Here we're trying to reset the microcontroller, not asking guc to do a
>>> reset.
>>
>> Ping?
>>
>> Anyone unlucky enough to be using GuC submission should be seeing this
>> warning when the firmware has to be reloaded (for example after any
>> i-g-t hang test).
>>
>> I still think the function should be renamed to _reset_guc though, since
>> it's the hw reseting the guc, not the other way around.
>>
>> Acked-by: Michel Thierry <michel.thierry@intel.com>
>
> Thanks! Now just exercise restrain in suggesting bikesheds and if
> someone can provide an r-b we could merge this. ;) (To be read as - lets
> leave the renaming for a follow up work since this fix is not to blame
> for the objectionable name.)
>
> Regards,
>

_Invoking GuC experts_

Agreed, and since I'm the one that will tell the guc to perform a reset, 
I can include the bikeshed in my patches.
Joonas Lahtinen April 28, 2017, 7:30 a.m. UTC | #6
On ke, 2017-04-12 at 16:48 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Looks like intel_guc_reset had the ability to sleep under the
> uncore spinlock since forever but it wasn't detected until the
> recent changes annotated the wait for register with might_sleep.
> 
> I have fixed it by removing holding of the uncore spinlock over
> the call to gen6_hw_domain_reset, since I do not see that is
> really needed. But there is always a possibility I am missing
> some nasty detail so please double check.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Oscar Mateo <oscar.mateo@intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Name and location of intel_guc_reset is bad, yes.

Regards, Joonas
Tvrtko Ursulin April 28, 2017, 7:59 a.m. UTC | #7
On 12/04/2017 18:16, Patchwork wrote:
> == Series Details ==
>
> Series: series starting with [1/2] drm/i915/guc: Fix sleep under spinlock during reset
> URL   : https://patchwork.freedesktop.org/series/22937/
> State : failure
>
> == Summary ==
>
> Series 22937v1 Series without cover letter
> https://patchwork.freedesktop.org/api/1.0/series/22937/revisions/1/mbox/
>
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> Test kms_pipe_crc_basic:
>         Subgroup read-crc-pipe-a-frame-sequence:
>                 pass       -> FAIL       (fi-skl-6770hq)

Re-opening:

https://bugs.freedesktop.org/show_bug.cgi?id=99788

kms_pipe_crc_basic/read-crc-pipe-b-frame-sequence:

(kms_pipe_crc_basic:8766) CRITICAL: Failed assertion: crcs[j].frame + 1 
== crcs[j + 1].frame
(kms_pipe_crc_basic:8766) CRITICAL: error: 20240 != 20241

>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
>
> fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:430s
> fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:423s
> fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:585s
> fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:487s
> fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20  time:536s
> fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:481s
> fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:481s
> fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:413s
> fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:407s
> fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:423s
> fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:491s
> fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:461s
> fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:442s
> fi-kbl-7560u     total:278  pass:267  dwarn:1   dfail:0   fail:0   skip:10  time:554s
> fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:437s
> fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:570s
> fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:449s
> fi-skl-6770hq    total:278  pass:267  dwarn:0   dfail:0   fail:1   skip:10  time:483s
> fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:412s
> fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:535s
> fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29  time:413s
>
> 57188aaf3b96d08e4095755fda0e577b1e0d8c60 drm-tip: 2017y-04m-12d-16h-12m-10s UTC integration manifest
> 54e2b93 guc run
> e066fdc drm/i915/guc: Fix sleep under spinlock during reset

Pushed, thanks for the review!

Regards,

Tvrtko
Saarinen, Jani April 28, 2017, 8:26 a.m. UTC | #8
Hi, 
> -----Original Message-----

> On 12/04/2017 18:16, Patchwork wrote:

> > == Series Details ==

> >

> > Series: series starting with [1/2] drm/i915/guc: Fix sleep under spinlock during

> reset

> > URL   : https://patchwork.freedesktop.org/series/22937/ 

> > State : failure

> >

> > == Summary ==

> >

> > Series 22937v1 Series without cover letter

> > https://patchwork.freedesktop.org/api/1.0/series/22937/revisions/1/mbo

> > x/

> >

> > Test gem_exec_flush:

> >         Subgroup basic-batch-kernel-default-uc:

> >                 fail       -> PASS       (fi-snb-2600) fdo#100007

> > Test kms_pipe_crc_basic:

> >         Subgroup read-crc-pipe-a-frame-sequence:

> >                 pass       -> FAIL       (fi-skl-6770hq)

> 

> Re-opening:

Thanks 
Re-opened for real now and marked for CI
 
> https://bugs.freedesktop.org/show_bug.cgi?id=99788

> 

> kms_pipe_crc_basic/read-crc-pipe-b-frame-sequence:

> 

> (kms_pipe_crc_basic:8766) CRITICAL: Failed assertion: crcs[j].frame + 1 ==crcs[j + 1].frame

> (kms_pipe_crc_basic:8766) CRITICAL: error: 20240 != 20241



Jani Saarinen
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 0cd56bf00650..a10d8863b0a9 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1532,7 +1532,7 @@  static int gen6_hw_domain_reset(struct drm_i915_private *dev_priv,
 	 */
 	__raw_i915_write32(dev_priv, GEN6_GDRST, hw_domain_mask);
 
-	/* Spin waiting for the device to ack the reset requests */
+	/* Wait for the device to ack the reset requests */
 	return intel_wait_for_register_fw(dev_priv,
 					  GEN6_GDRST, hw_domain_mask, 0,
 					  500);
@@ -1779,17 +1779,12 @@  bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
 int intel_guc_reset(struct drm_i915_private *dev_priv)
 {
 	int ret;
-	unsigned long irqflags;
 
 	if (!HAS_GUC(dev_priv))
 		return -EINVAL;
 
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
-
 	ret = gen6_hw_domain_reset(dev_priv, GEN9_GRDOM_GUC);
-
-	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
 	return ret;