Message ID | 20170412154843.16035-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Looks like intel_guc_reset had the ability to sleep under the > uncore spinlock since forever but it wasn't detected until the > recent changes annotated the wait for register with might_sleep. > > I have fixed it by removing holding of the uncore spinlock over > the call to gen6_hw_domain_reset, since I do not see that is > really needed. But there is always a possibility I am missing > some nasty detail so please double check. Afaik, no we are not using the uncore.lock here to serialise resets so yes we should be safe in dropping it. Will the guc be coming under the same hw semaphore as gen8 per-engine resets? -Chris
On 12/04/17 08:58, Chris Wilson wrote: > On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Looks like intel_guc_reset had the ability to sleep under the >> uncore spinlock since forever but it wasn't detected until the >> recent changes annotated the wait for register with might_sleep. >> >> I have fixed it by removing holding of the uncore spinlock over >> the call to gen6_hw_domain_reset, since I do not see that is >> really needed. But there is always a possibility I am missing >> some nasty detail so please double check. > > Afaik, no we are not using the uncore.lock here to serialise resets so > yes we should be safe in dropping it. > > Will the guc be coming under the same hw semaphore as gen8 per-engine > resets? A bit unrelated, but should intel_guc_reset be intel_reset_guc instead? Here we're trying to reset the microcontroller, not asking guc to do a reset.
On 12/04/17 09:22, Michel Thierry wrote: > On 12/04/17 08:58, Chris Wilson wrote: >> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> Looks like intel_guc_reset had the ability to sleep under the >>> uncore spinlock since forever but it wasn't detected until the >>> recent changes annotated the wait for register with might_sleep. >>> >>> I have fixed it by removing holding of the uncore spinlock over >>> the call to gen6_hw_domain_reset, since I do not see that is >>> really needed. But there is always a possibility I am missing >>> some nasty detail so please double check. >> >> Afaik, no we are not using the uncore.lock here to serialise resets so >> yes we should be safe in dropping it. >> >> Will the guc be coming under the same hw semaphore as gen8 per-engine >> resets? > > A bit unrelated, but should intel_guc_reset be intel_reset_guc instead? > Here we're trying to reset the microcontroller, not asking guc to do a > reset. Ping? Anyone unlucky enough to be using GuC submission should be seeing this warning when the firmware has to be reloaded (for example after any i-g-t hang test). I still think the function should be renamed to _reset_guc though, since it's the hw reseting the guc, not the other way around. Acked-by: Michel Thierry <michel.thierry@intel.com>
On 27/04/2017 19:14, Michel Thierry wrote: > On 12/04/17 09:22, Michel Thierry wrote: >> On 12/04/17 08:58, Chris Wilson wrote: >>> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> Looks like intel_guc_reset had the ability to sleep under the >>>> uncore spinlock since forever but it wasn't detected until the >>>> recent changes annotated the wait for register with might_sleep. >>>> >>>> I have fixed it by removing holding of the uncore spinlock over >>>> the call to gen6_hw_domain_reset, since I do not see that is >>>> really needed. But there is always a possibility I am missing >>>> some nasty detail so please double check. >>> >>> Afaik, no we are not using the uncore.lock here to serialise resets so >>> yes we should be safe in dropping it. >>> >>> Will the guc be coming under the same hw semaphore as gen8 per-engine >>> resets? >> >> A bit unrelated, but should intel_guc_reset be intel_reset_guc instead? >> Here we're trying to reset the microcontroller, not asking guc to do a >> reset. > > Ping? > > Anyone unlucky enough to be using GuC submission should be seeing this > warning when the firmware has to be reloaded (for example after any > i-g-t hang test). > > I still think the function should be renamed to _reset_guc though, since > it's the hw reseting the guc, not the other way around. > > Acked-by: Michel Thierry <michel.thierry@intel.com> Thanks! Now just exercise restrain in suggesting bikesheds and if someone can provide an r-b we could merge this. ;) (To be read as - lets leave the renaming for a follow up work since this fix is not to blame for the objectionable name.) Regards, Tvrtko
On 27/04/17 11:20, Tvrtko Ursulin wrote: > > On 27/04/2017 19:14, Michel Thierry wrote: >> On 12/04/17 09:22, Michel Thierry wrote: >>> On 12/04/17 08:58, Chris Wilson wrote: >>>> On Wed, Apr 12, 2017 at 04:48:42PM +0100, Tvrtko Ursulin wrote: >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> >>>>> Looks like intel_guc_reset had the ability to sleep under the >>>>> uncore spinlock since forever but it wasn't detected until the >>>>> recent changes annotated the wait for register with might_sleep. >>>>> >>>>> I have fixed it by removing holding of the uncore spinlock over >>>>> the call to gen6_hw_domain_reset, since I do not see that is >>>>> really needed. But there is always a possibility I am missing >>>>> some nasty detail so please double check. >>>> >>>> Afaik, no we are not using the uncore.lock here to serialise resets so >>>> yes we should be safe in dropping it. >>>> >>>> Will the guc be coming under the same hw semaphore as gen8 per-engine >>>> resets? >>> >>> A bit unrelated, but should intel_guc_reset be intel_reset_guc instead? >>> Here we're trying to reset the microcontroller, not asking guc to do a >>> reset. >> >> Ping? >> >> Anyone unlucky enough to be using GuC submission should be seeing this >> warning when the firmware has to be reloaded (for example after any >> i-g-t hang test). >> >> I still think the function should be renamed to _reset_guc though, since >> it's the hw reseting the guc, not the other way around. >> >> Acked-by: Michel Thierry <michel.thierry@intel.com> > > Thanks! Now just exercise restrain in suggesting bikesheds and if > someone can provide an r-b we could merge this. ;) (To be read as - lets > leave the renaming for a follow up work since this fix is not to blame > for the objectionable name.) > > Regards, > _Invoking GuC experts_ Agreed, and since I'm the one that will tell the guc to perform a reset, I can include the bikeshed in my patches.
On ke, 2017-04-12 at 16:48 +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Looks like intel_guc_reset had the ability to sleep under the > uncore spinlock since forever but it wasn't detected until the > recent changes annotated the wait for register with might_sleep. > > I have fixed it by removing holding of the uncore spinlock over > the call to gen6_hw_domain_reset, since I do not see that is > really needed. But there is always a possibility I am missing > some nasty detail so please double check. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Name and location of intel_guc_reset is bad, yes. Regards, Joonas
On 12/04/2017 18:16, Patchwork wrote: > == Series Details == > > Series: series starting with [1/2] drm/i915/guc: Fix sleep under spinlock during reset > URL : https://patchwork.freedesktop.org/series/22937/ > State : failure > > == Summary == > > Series 22937v1 Series without cover letter > https://patchwork.freedesktop.org/api/1.0/series/22937/revisions/1/mbox/ > > Test gem_exec_flush: > Subgroup basic-batch-kernel-default-uc: > fail -> PASS (fi-snb-2600) fdo#100007 > Test kms_pipe_crc_basic: > Subgroup read-crc-pipe-a-frame-sequence: > pass -> FAIL (fi-skl-6770hq) Re-opening: https://bugs.freedesktop.org/show_bug.cgi?id=99788 kms_pipe_crc_basic/read-crc-pipe-b-frame-sequence: (kms_pipe_crc_basic:8766) CRITICAL: Failed assertion: crcs[j].frame + 1 == crcs[j + 1].frame (kms_pipe_crc_basic:8766) CRITICAL: error: 20240 != 20241 > > fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007 > > fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time:430s > fi-bdw-gvtdvm total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time:423s > fi-bsw-n3050 total:278 pass:242 dwarn:0 dfail:0 fail:0 skip:36 time:585s > fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time:487s > fi-bxt-t5700 total:278 pass:258 dwarn:0 dfail:0 fail:0 skip:20 time:536s > fi-byt-j1900 total:278 pass:254 dwarn:0 dfail:0 fail:0 skip:24 time:481s > fi-byt-n2820 total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:481s > fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:413s > fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:407s > fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time:423s > fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:491s > fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:461s > fi-kbl-7500u total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:442s > fi-kbl-7560u total:278 pass:267 dwarn:1 dfail:0 fail:0 skip:10 time:554s > fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:437s > fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time:570s > fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time:449s > fi-skl-6770hq total:278 pass:267 dwarn:0 dfail:0 fail:1 skip:10 time:483s > fi-skl-gvtdvm total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time:412s > fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:535s > fi-snb-2600 total:278 pass:249 dwarn:0 dfail:0 fail:0 skip:29 time:413s > > 57188aaf3b96d08e4095755fda0e577b1e0d8c60 drm-tip: 2017y-04m-12d-16h-12m-10s UTC integration manifest > 54e2b93 guc run > e066fdc drm/i915/guc: Fix sleep under spinlock during reset Pushed, thanks for the review! Regards, Tvrtko
Hi, > -----Original Message----- > On 12/04/2017 18:16, Patchwork wrote: > > == Series Details == > > > > Series: series starting with [1/2] drm/i915/guc: Fix sleep under spinlock during > reset > > URL : https://patchwork.freedesktop.org/series/22937/ > > State : failure > > > > == Summary == > > > > Series 22937v1 Series without cover letter > > https://patchwork.freedesktop.org/api/1.0/series/22937/revisions/1/mbo > > x/ > > > > Test gem_exec_flush: > > Subgroup basic-batch-kernel-default-uc: > > fail -> PASS (fi-snb-2600) fdo#100007 > > Test kms_pipe_crc_basic: > > Subgroup read-crc-pipe-a-frame-sequence: > > pass -> FAIL (fi-skl-6770hq) > > Re-opening: Thanks Re-opened for real now and marked for CI > https://bugs.freedesktop.org/show_bug.cgi?id=99788 > > kms_pipe_crc_basic/read-crc-pipe-b-frame-sequence: > > (kms_pipe_crc_basic:8766) CRITICAL: Failed assertion: crcs[j].frame + 1 ==crcs[j + 1].frame > (kms_pipe_crc_basic:8766) CRITICAL: error: 20240 != 20241 Jani Saarinen Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 0cd56bf00650..a10d8863b0a9 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1532,7 +1532,7 @@ static int gen6_hw_domain_reset(struct drm_i915_private *dev_priv, */ __raw_i915_write32(dev_priv, GEN6_GDRST, hw_domain_mask); - /* Spin waiting for the device to ack the reset requests */ + /* Wait for the device to ack the reset requests */ return intel_wait_for_register_fw(dev_priv, GEN6_GDRST, hw_domain_mask, 0, 500); @@ -1779,17 +1779,12 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv) int intel_guc_reset(struct drm_i915_private *dev_priv) { int ret; - unsigned long irqflags; if (!HAS_GUC(dev_priv)) return -EINVAL; intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); - spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); - ret = gen6_hw_domain_reset(dev_priv, GEN9_GRDOM_GUC); - - spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags); intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); return ret;