Message ID | 20200701150721.423630-1-michal@hardline.pl (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Reboot CI if we get wedged during driver init | expand |
Quoting Michał Winiarski (2020-07-01 16:07:21) > From: Michał Winiarski <michal.winiarski@intel.com> > > Getting wedged device on driver init is pretty much unrecoverable. > Since we're running verious scenarios that may potentially hit this in > CI (module reload / selftests / hotunplug), and if it happens, it means > that we can't trust any subsequent CI results, we should just apply the > taint to let the CI know that it should reboot (CI checks taint between > test runs). Ok, we treat WEDGED_ON_INIT as non-recoverable [as opposed to the less wedged WEDGED]. > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
Quoting Michał Winiarski (2020-07-01 16:07:21) > From: Michał Winiarski <michal.winiarski@intel.com> > > Getting wedged device on driver init is pretty much unrecoverable. > Since we're running verious scenarios that may potentially hit this in > CI (module reload / selftests / hotunplug), and if it happens, it means > that we can't trust any subsequent CI results, we should just apply the > taint to let the CI know that it should reboot (CI checks taint between > test runs). > > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Petri Latvala <petri.latvala@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > index 0156f1f5c736..d27e8bb7d550 100644 > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt) > I915_WEDGED_ON_INIT); > intel_gt_set_wedged(gt); > set_bit(I915_WEDGED_ON_INIT, >->reset.flags); > + Ah, we don't say around here that this WEDGED_ON_INIT is non-recoverable, could you please add a comment to that effect? > + add_taint_for_CI(TAINT_WARN); > } > > void intel_gt_init_reset(struct intel_gt *gt) > -- > 2.27.0 >
On 01.07.2020 17:17, Chris Wilson wrote: > Quoting Michał Winiarski (2020-07-01 16:07:21) >> From: Michał Winiarski <michal.winiarski@intel.com> >> >> Getting wedged device on driver init is pretty much unrecoverable. >> Since we're running verious scenarios that may potentially hit this in typo >> CI (module reload / selftests / hotunplug), and if it happens, it means >> that we can't trust any subsequent CI results, we should just apply the >> taint to let the CI know that it should reboot (CI checks taint between >> test runs). >> >> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Petri Latvala <petri.latvala@intel.com> >> --- >> drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c >> index 0156f1f5c736..d27e8bb7d550 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c >> @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt) >> I915_WEDGED_ON_INIT); >> intel_gt_set_wedged(gt); >> set_bit(I915_WEDGED_ON_INIT, >->reset.flags); >> + > > Ah, we don't say around here that this WEDGED_ON_INIT is non-recoverable, > could you please add a comment to that effect? > Such comment is already in WEDGED_ON_INIT description, but repeating it will definitely help >> + add_taint_for_CI(TAINT_WARN); btw, today we are tainting kernel for CI silently and from different places, so maybe it is worth to add there some debug log with __builtin_return_address() for better diagnose why we stopped CI? with typo/comment fixed, Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >> } >> >> void intel_gt_init_reset(struct intel_gt *gt) >> -- >> 2.27.0 >> > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx >
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 0156f1f5c736..d27e8bb7d550 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt) I915_WEDGED_ON_INIT); intel_gt_set_wedged(gt); set_bit(I915_WEDGED_ON_INIT, >->reset.flags); + + add_taint_for_CI(TAINT_WARN); } void intel_gt_init_reset(struct intel_gt *gt)