diff mbox series

drm/i915/guc: Suppress 'ignoring reset notification' message

Message ID 20230921182033.135448-1-John.C.Harrison@Intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/guc: Suppress 'ignoring reset notification' message | expand

Commit Message

John Harrison Sept. 21, 2023, 6:20 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

If an active context has been banned (e.g. Ctrl+C killed) then it is
likely to be reset as part of evicting it from the hardware. That
results in a 'ignoring context reset notification: banned = 1'
message at info level. This confuses/concerns people and makes them
thing something has gone wrong when it hasn't.

There is already a debug level message with essentially the same
information. So drop the 'ignore' info level one and just add the
'ignore' flag to the debug level one instead (which will therefore not
appear by default but will still show up in CI runs).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Andi Shyti Sept. 25, 2023, 2:23 p.m. UTC | #1
Hi John,

On Thu, Sep 21, 2023 at 11:20:33AM -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> If an active context has been banned (e.g. Ctrl+C killed) then it is
> likely to be reset as part of evicting it from the hardware. That
> results in a 'ignoring context reset notification: banned = 1'
> message at info level. This confuses/concerns people and makes them
> thing something has gone wrong when it hasn't.

/thing/think/

> There is already a debug level message with essentially the same
> information. So drop the 'ignore' info level one and just add the
> 'ignore' flag to the debug level one instead (which will therefore not
> appear by default but will still show up in CI runs).
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> 

Andi
Tvrtko Ursulin Oct. 12, 2023, 10:21 a.m. UTC | #2
On 21/09/2023 19:20, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> If an active context has been banned (e.g. Ctrl+C killed) then it is
> likely to be reset as part of evicting it from the hardware. That
> results in a 'ignoring context reset notification: banned = 1'
> message at info level. This confuses/concerns people and makes them
> thing something has gone wrong when it hasn't.

Noticed the "confuses/concerns people" part while preparing the 6.7 pull 
request, and the fact there is no Fixes: tag. Is this something that 
would be worth sending to stable (manually and if yes could you do that 
please? If there were actual user bugs filed I guess.

Regards,

Tvrtko

> There is already a debug level message with essentially the same
> information. So drop the 'ignore' info level one and just add the
> 'ignore' flag to the debug level one instead (which will therefore not
> appear by default but will still show up in CI runs).
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index cabdc645fcddb..da7331346df1f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -4770,19 +4770,19 @@ static void guc_context_replay(struct intel_context *ce)
>   static void guc_handle_context_reset(struct intel_guc *guc,
>   				     struct intel_context *ce)
>   {
> +	bool capture = intel_context_is_schedulable(ce);
> +
>   	trace_intel_context_reset(ce);
>   
> -	guc_dbg(guc, "Got context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n",
> +	guc_dbg(guc, "%s context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n",
> +		capture ? "Got" : "Ignoring",
>   		ce->guc_id.id, ce->engine->name,
>   		str_yes_no(intel_context_is_exiting(ce)),
>   		str_yes_no(intel_context_is_banned(ce)));
>   
> -	if (likely(intel_context_is_schedulable(ce))) {
> +	if (capture) {
>   		capture_error_state(guc, ce);
>   		guc_context_replay(ce);
> -	} else {
> -		guc_info(guc, "Ignoring context reset notification of exiting context 0x%04X on %s",
> -			 ce->guc_id.id, ce->engine->name);
>   	}
>   }
>
John Harrison Oct. 12, 2023, 5:56 p.m. UTC | #3
On 10/12/2023 03:21, Tvrtko Ursulin wrote:
> On 21/09/2023 19:20, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> If an active context has been banned (e.g. Ctrl+C killed) then it is
>> likely to be reset as part of evicting it from the hardware. That
>> results in a 'ignoring context reset notification: banned = 1'
>> message at info level. This confuses/concerns people and makes them
>> thing something has gone wrong when it hasn't.
>
> Noticed the "confuses/concerns people" part while preparing the 6.7 
> pull request, and the fact there is no Fixes: tag. Is this something 
> that would be worth sending to stable (manually and if yes could you 
> do that please? If there were actual user bugs filed I guess.
>
No upstream bugs that I am aware of. There were very occasional 
concerned emails from internal test teams (E2E and such rather than 
kernel) and I think one internal bug was logged about it being seen when 
running some automated user interaction stress test thing (monkey runner 
or similar). So not sure that it is worth the effort of a backport to 
older trees. And you can't really call it a bug with an older patch. The 
message was never an error or even a warning, just an info level.

John.


> Regards,
>
> Tvrtko
>
>> There is already a debug level message with essentially the same
>> information. So drop the 'ignore' info level one and just add the
>> 'ignore' flag to the debug level one instead (which will therefore not
>> appear by default but will still show up in CI runs).
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index cabdc645fcddb..da7331346df1f 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -4770,19 +4770,19 @@ static void guc_context_replay(struct 
>> intel_context *ce)
>>   static void guc_handle_context_reset(struct intel_guc *guc,
>>                        struct intel_context *ce)
>>   {
>> +    bool capture = intel_context_is_schedulable(ce);
>> +
>>       trace_intel_context_reset(ce);
>>   -    guc_dbg(guc, "Got context reset notification: 0x%04X on %s, 
>> exiting = %s, banned = %s\n",
>> +    guc_dbg(guc, "%s context reset notification: 0x%04X on %s, 
>> exiting = %s, banned = %s\n",
>> +        capture ? "Got" : "Ignoring",
>>           ce->guc_id.id, ce->engine->name,
>>           str_yes_no(intel_context_is_exiting(ce)),
>>           str_yes_no(intel_context_is_banned(ce)));
>>   -    if (likely(intel_context_is_schedulable(ce))) {
>> +    if (capture) {
>>           capture_error_state(guc, ce);
>>           guc_context_replay(ce);
>> -    } else {
>> -        guc_info(guc, "Ignoring context reset notification of 
>> exiting context 0x%04X on %s",
>> -             ce->guc_id.id, ce->engine->name);
>>       }
>>   }
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index cabdc645fcddb..da7331346df1f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4770,19 +4770,19 @@  static void guc_context_replay(struct intel_context *ce)
 static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
+	bool capture = intel_context_is_schedulable(ce);
+
 	trace_intel_context_reset(ce);
 
-	guc_dbg(guc, "Got context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n",
+	guc_dbg(guc, "%s context reset notification: 0x%04X on %s, exiting = %s, banned = %s\n",
+		capture ? "Got" : "Ignoring",
 		ce->guc_id.id, ce->engine->name,
 		str_yes_no(intel_context_is_exiting(ce)),
 		str_yes_no(intel_context_is_banned(ce)));
 
-	if (likely(intel_context_is_schedulable(ce))) {
+	if (capture) {
 		capture_error_state(guc, ce);
 		guc_context_replay(ce);
-	} else {
-		guc_info(guc, "Ignoring context reset notification of exiting context 0x%04X on %s",
-			 ce->guc_id.id, ce->engine->name);
 	}
 }