Message ID | 20230921182033.135448-1-John.C.Harrison@Intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/guc: Suppress 'ignoring reset notification' message | expand |
Hi John, On Thu, Sep 21, 2023 at 11:20:33AM -0700, John.C.Harrison@Intel.com wrote: > From: John Harrison <John.C.Harrison@Intel.com> > > If an active context has been banned (e.g. Ctrl+C killed) then it is > likely to be reset as part of evicting it from the hardware. That > results in a 'ignoring context reset notification: banned = 1' > message at info level. This confuses/concerns people and makes them > thing something has gone wrong when it hasn't. /thing/think/ > There is already a debug level message with essentially the same > information. So drop the 'ignore' info level one and just add the > 'ignore' flag to the debug level one instead (which will therefore not > appear by default but will still show up in CI runs). > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Andi
On 21/09/2023 19:20, John.C.Harrison@Intel.com wrote: > From: John Harrison <John.C.Harrison@Intel.com> > > If an active context has been banned (e.g. Ctrl+C killed) then it is > likely to be reset as part of evicting it from the hardware. That > results in a 'ignoring context reset notification: banned = 1' > message at info level. This confuses/concerns people and makes them > thing something has gone wrong when it hasn't. Noticed the "confuses/concerns people" part while preparing the 6.7 pull request, and the fact there is no Fixes: tag. Is this something that would be worth sending to stable (manually and if yes could you do that please? If there were actual user bugs filed I guess. Regards, Tvrtko > There is already a debug level message with essentially the same > information. So drop the 'ignore' info level one and just add the > 'ignore' flag to the debug level one instead (which will therefore not > appear by default but will still show up in CI runs). > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com> > --- > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index cabdc645fcddb..da7331346df1f 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -4770,19 +4770,19 @@ static void guc_context_replay(struct intel_context *ce) > static void guc_handle_context_reset(struct intel_guc *guc, > struct intel_context *ce) > { > + bool capture = intel_context_is_schedulable(ce); > + > trace_intel_context_reset(ce); > > - guc_dbg(guc, "Got context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n", > + guc_dbg(guc, "%s context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n", > + capture ? "Got" : "Ignoring", > ce->guc_id.id, ce->engine->name, > str_yes_no(intel_context_is_exiting(ce)), > str_yes_no(intel_context_is_banned(ce))); > > - if (likely(intel_context_is_schedulable(ce))) { > + if (capture) { > capture_error_state(guc, ce); > guc_context_replay(ce); > - } else { > - guc_info(guc, "Ignoring context reset notification of exiting context 0x%04X on %s", > - ce->guc_id.id, ce->engine->name); > } > } >
On 10/12/2023 03:21, Tvrtko Ursulin wrote: > On 21/09/2023 19:20, John.C.Harrison@Intel.com wrote: >> From: John Harrison <John.C.Harrison@Intel.com> >> >> If an active context has been banned (e.g. Ctrl+C killed) then it is >> likely to be reset as part of evicting it from the hardware. That >> results in a 'ignoring context reset notification: banned = 1' >> message at info level. This confuses/concerns people and makes them >> thing something has gone wrong when it hasn't. > > Noticed the "confuses/concerns people" part while preparing the 6.7 > pull request, and the fact there is no Fixes: tag. Is this something > that would be worth sending to stable (manually and if yes could you > do that please? If there were actual user bugs filed I guess. > No upstream bugs that I am aware of. There were very occasional concerned emails from internal test teams (E2E and such rather than kernel) and I think one internal bug was logged about it being seen when running some automated user interaction stress test thing (monkey runner or similar). So not sure that it is worth the effort of a backport to older trees. And you can't really call it a bug with an older patch. The message was never an error or even a warning, just an info level. John. > Regards, > > Tvrtko > >> There is already a debug level message with essentially the same >> information. So drop the 'ignore' info level one and just add the >> 'ignore' flag to the debug level one instead (which will therefore not >> appear by default but will still show up in CI runs). >> >> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> >> --- >> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 +++++----- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c >> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c >> index cabdc645fcddb..da7331346df1f 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c >> @@ -4770,19 +4770,19 @@ static void guc_context_replay(struct >> intel_context *ce) >> static void guc_handle_context_reset(struct intel_guc *guc, >> struct intel_context *ce) >> { >> + bool capture = intel_context_is_schedulable(ce); >> + >> trace_intel_context_reset(ce); >> - guc_dbg(guc, "Got context reset notification: 0x%04X on %s, >> exiting = %s, banned = %s\n", >> + guc_dbg(guc, "%s context reset notification: 0x%04X on %s, >> exiting = %s, banned = %s\n", >> + capture ? "Got" : "Ignoring", >> ce->guc_id.id, ce->engine->name, >> str_yes_no(intel_context_is_exiting(ce)), >> str_yes_no(intel_context_is_banned(ce))); >> - if (likely(intel_context_is_schedulable(ce))) { >> + if (capture) { >> capture_error_state(guc, ce); >> guc_context_replay(ce); >> - } else { >> - guc_info(guc, "Ignoring context reset notification of >> exiting context 0x%04X on %s", >> - ce->guc_id.id, ce->engine->name); >> } >> }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index cabdc645fcddb..da7331346df1f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -4770,19 +4770,19 @@ static void guc_context_replay(struct intel_context *ce) static void guc_handle_context_reset(struct intel_guc *guc, struct intel_context *ce) { + bool capture = intel_context_is_schedulable(ce); + trace_intel_context_reset(ce); - guc_dbg(guc, "Got context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n", + guc_dbg(guc, "%s context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n", + capture ? "Got" : "Ignoring", ce->guc_id.id, ce->engine->name, str_yes_no(intel_context_is_exiting(ce)), str_yes_no(intel_context_is_banned(ce))); - if (likely(intel_context_is_schedulable(ce))) { + if (capture) { capture_error_state(guc, ce); guc_context_replay(ce); - } else { - guc_info(guc, "Ignoring context reset notification of exiting context 0x%04X on %s", - ce->guc_id.id, ce->engine->name); } }