Message ID | 20170818172342.7282-1-michel.thierry@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Michel Thierry (2017-08-18 18:23:42) > The corruption in CSB mmio reads we were seeing has been tracked down to > incorrectly touching forcewake of all domains, following an engine reset. > It is still a mistery why we only catched this in Broxton, since it > could happen in any platform. > > With that fix already merged, commit 4055dc75d6b5 ("drm/i915: Stop > touching forcewake following a gen6+ engine reset"), lets try to enable > per-engine resets in Broxton one more time. > > This reverts commit f188258bde0f ("drm/i915: Disable per-engine reset for > Broxton"). > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Michel Thierry <michel.thierry@intel.com> My bxt has survived about 72 hours of hang testing, which is far more than it was able to previously. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
Quoting Patchwork (2017-08-18 18:57:52) > == Series Details == > > Series: drm/i915: Re-enable per-engine reset for Broxton > URL : https://patchwork.freedesktop.org/series/29011/ > State : success > > == Summary == > > Series 29011v1 drm/i915: Re-enable per-engine reset for Broxton > https://patchwork.freedesktop.org/api/1.0/series/29011/revisions/1/mbox/ > > Test kms_flip: > Subgroup basic-flip-vs-modeset: > pass -> SKIP (fi-skl-x1585l) fdo#101781 > > fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781 And pushed. That was a good find, that I am sure will have nipped some very troublesome issues in the bud. Thanks, -Chris
Quoting Chris Wilson (2017-08-21 15:55:34) > Quoting Michel Thierry (2017-08-18 18:23:42) > > The corruption in CSB mmio reads we were seeing has been tracked down to > > incorrectly touching forcewake of all domains, following an engine reset. > > It is still a mistery why we only catched this in Broxton, since it > > could happen in any platform. > > > > With that fix already merged, commit 4055dc75d6b5 ("drm/i915: Stop > > touching forcewake following a gen6+ engine reset"), lets try to enable > > per-engine resets in Broxton one more time. > > > > This reverts commit f188258bde0f ("drm/i915: Disable per-engine reset for > > Broxton"). > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Signed-off-by: Michel Thierry <michel.thierry@intel.com> > > My bxt has survived about 72 hours of hang testing, which is far more > than it was able to previously. > > Acked-by: Chris Wilson <chris@chris-wilson.co.uk> > Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Uh oh, seemingly just hit it again... -Chris
On 05/09/17 06:57, Chris Wilson wrote: > Quoting Chris Wilson (2017-08-21 15:55:34) >> Quoting Michel Thierry (2017-08-18 18:23:42) >>> The corruption in CSB mmio reads we were seeing has been tracked down to >>> incorrectly touching forcewake of all domains, following an engine reset. >>> It is still a mistery why we only catched this in Broxton, since it >>> could happen in any platform. >>> >>> With that fix already merged, commit 4055dc75d6b5 ("drm/i915: Stop >>> touching forcewake following a gen6+ engine reset"), lets try to enable >>> per-engine resets in Broxton one more time. >>> >>> This reverts commit f188258bde0f ("drm/i915: Disable per-engine reset for >>> Broxton"). >>> >>> Cc: Chris Wilson <chris@chris-wilson.co.uk> >>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> >> >> My bxt has survived about 72 hours of hang testing, which is far more >> than it was able to previously. >> >> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> >> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> > > Uh oh, seemingly just hit it again... Was it because the CSBs were 0's? A couple of times I saw a spurious CSB event (0x12 - preempted & complete), after an already 'complete' event. That was also hitting the assert because the ctx-id would be 'wrong'. I think we could ignore the 0x12 event and it will continue.
Quoting Michel Thierry (2017-09-06 16:25:06) > On 05/09/17 06:57, Chris Wilson wrote: > > Quoting Chris Wilson (2017-08-21 15:55:34) > >> Quoting Michel Thierry (2017-08-18 18:23:42) > >>> The corruption in CSB mmio reads we were seeing has been tracked down to > >>> incorrectly touching forcewake of all domains, following an engine reset. > >>> It is still a mistery why we only catched this in Broxton, since it > >>> could happen in any platform. > >>> > >>> With that fix already merged, commit 4055dc75d6b5 ("drm/i915: Stop > >>> touching forcewake following a gen6+ engine reset"), lets try to enable > >>> per-engine resets in Broxton one more time. > >>> > >>> This reverts commit f188258bde0f ("drm/i915: Disable per-engine reset for > >>> Broxton"). > >>> > >>> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> > >> > >> My bxt has survived about 72 hours of hang testing, which is far more > >> than it was able to previously. > >> > >> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> > >> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Uh oh, seemingly just hit it again... > > Was it because the CSBs were 0's? > > A couple of times I saw a spurious CSB event (0x12 - preempted & > complete), after an already 'complete' event. That was also hitting the > assert because the ctx-id would be 'wrong'. I think we could ignore the > 0x12 event and it will continue. Hmm, that 0x12 event has never triggered the invalid ctx id yet for me (but that's probably just a matter of workload), it always hits the too-many-switches. Sadly we can't just continue on after that as the hw is completely out-of-sync with our submissions, and the only way to recover appears to be a gpu reset. Anyway, haven't yet dug back into the bang, just reaffirmed that disabling per-engine resets gives me a ickle@broxton:~$ uptime 16:55:31 up 1 day, 2:01, 2 users, load average: 3.66, 3.38, 3.3 so far of drv_selftest --r live_hanghceck -Chris
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 6f87f1fe9cef..aa48a40d72eb 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -400,7 +400,6 @@ static const struct intel_device_info intel_broxton_info = { GEN9_LP_FEATURES, .platform = INTEL_BROXTON, .ddb_size = 512, - .has_reset_engine = false, }; static const struct intel_device_info intel_geminilake_info = {
The corruption in CSB mmio reads we were seeing has been tracked down to incorrectly touching forcewake of all domains, following an engine reset. It is still a mistery why we only catched this in Broxton, since it could happen in any platform. With that fix already merged, commit 4055dc75d6b5 ("drm/i915: Stop touching forcewake following a gen6+ engine reset"), lets try to enable per-engine resets in Broxton one more time. This reverts commit f188258bde0f ("drm/i915: Disable per-engine reset for Broxton"). Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> --- drivers/gpu/drm/i915/i915_pci.c | 1 - 1 file changed, 1 deletion(-)