diff mbox series

drm/i915/guc: Reduce verbosity on log overflows

Message ID 20190617100917.13110-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915/guc: Reduce verbosity on log overflows | expand

Commit Message

Chris Wilson June 17, 2019, 10:09 a.m. UTC
If the user is clearing the log buffer too slowly, we overflow. As this
is an expected condition, and the driver tries to handle it, reduce the
error message down to a notice.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Tvrtko Ursulin June 17, 2019, 2:21 p.m. UTC | #1
On 17/06/2019 11:09, Chris Wilson wrote:
> If the user is clearing the log buffer too slowly, we overflow. As this
> is an expected condition, and the driver tries to handle it, reduce the
> error message down to a notice.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_log.c b/drivers/gpu/drm/i915/intel_guc_log.c
> index bf1446629703..e3b83ecb90b5 100644
> --- a/drivers/gpu/drm/i915/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/intel_guc_log.c
> @@ -208,7 +208,9 @@ static bool guc_check_log_buf_overflow(struct intel_guc_log *log,
>   			/* buffer_full_cnt is a 4 bit counter */
>   			log->stats[type].sampled_overflow += 16;
>   		}
> -		DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
> +
> +		dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
> +				       "GuC log buffer overflow\n");
>   	}
>   
>   	return overflow;
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Maybe we could even go debug, although I not sure if userspace is 
expected to look at dmesg to detect overflows or if the thing has some 
marker it writes into the file in that case. I don't see it so that's 
probably a slight flaw.

Regards,

Tvrtko
Chris Wilson June 17, 2019, 2:26 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-06-17 15:21:54)
> 
> On 17/06/2019 11:09, Chris Wilson wrote:
> > If the user is clearing the log buffer too slowly, we overflow. As this
> > is an expected condition, and the driver tries to handle it, reduce the
> > error message down to a notice.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_guc_log.c b/drivers/gpu/drm/i915/intel_guc_log.c
> > index bf1446629703..e3b83ecb90b5 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_log.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_log.c
> > @@ -208,7 +208,9 @@ static bool guc_check_log_buf_overflow(struct intel_guc_log *log,
> >                       /* buffer_full_cnt is a 4 bit counter */
> >                       log->stats[type].sampled_overflow += 16;
> >               }
> > -             DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
> > +
> > +             dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
> > +                                    "GuC log buffer overflow\n");
> >       }
> >   
> >       return overflow;
> > 
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Maybe we could even go debug, although I not sure if userspace is 
> expected to look at dmesg to detect overflows or if the thing has some 
> marker it writes into the file in that case. I don't see it so that's 
> probably a slight flaw.

It wasn't completely clear to me how well userspace did handle this (I
haven't run the debug tools myself) so settled for leaving a hint
visible in dmesg. Ideally, if we are expecting userspace to be handling
the ringbuf interface, it should be coping with the overflows then this
is just noise and can be removed.
-Chris
Michal Wajdeczko June 17, 2019, 2:38 p.m. UTC | #3
On Mon, 17 Jun 2019 12:09:17 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> If the user is clearing the log buffer too slowly, we overflow. As this
> is an expected condition, and the driver tries to handle it, reduce the
> error message down to a notice.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_log.c  
> b/drivers/gpu/drm/i915/intel_guc_log.c
> index bf1446629703..e3b83ecb90b5 100644
> --- a/drivers/gpu/drm/i915/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/intel_guc_log.c
> @@ -208,7 +208,9 @@ static bool guc_check_log_buf_overflow(struct  
> intel_guc_log *log,
>  			/* buffer_full_cnt is a 4 bit counter */
>  			log->stats[type].sampled_overflow += 16;
>  		}
> -		DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
> +
> +		dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
> +				       "GuC log buffer overflow\n");

While this change alone is not harmful, I'm afraid that there might
be another reason why we see this message : once buffer_full_cnt is
turned on by the fw and then we reset the Guc, this field may have
stale value as it is not cleared by us/uc and we may wrongly assume
that there was an overflow.

>  	}
> 	return overflow;
Chris Wilson June 17, 2019, 2:42 p.m. UTC | #4
Quoting Michal Wajdeczko (2019-06-17 15:38:53)
> On Mon, 17 Jun 2019 12:09:17 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > If the user is clearing the log buffer too slowly, we overflow. As this
> > is an expected condition, and the driver tries to handle it, reduce the
> > error message down to a notice.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_guc_log.c  
> > b/drivers/gpu/drm/i915/intel_guc_log.c
> > index bf1446629703..e3b83ecb90b5 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_log.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_log.c
> > @@ -208,7 +208,9 @@ static bool guc_check_log_buf_overflow(struct  
> > intel_guc_log *log,
> >                       /* buffer_full_cnt is a 4 bit counter */
> >                       log->stats[type].sampled_overflow += 16;
> >               }
> > -             DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
> > +
> > +             dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
> > +                                    "GuC log buffer overflow\n");
> 
> While this change alone is not harmful, I'm afraid that there might
> be another reason why we see this message : once buffer_full_cnt is
> turned on by the fw and then we reset the Guc, this field may have
> stale value as it is not cleared by us/uc and we may wrongly assume
> that there was an overflow.

Fwiw, at the moment, it is being thrown by the cat /debug/dri/0/* test.
-Chris
Chris Wilson June 18, 2019, 10:24 a.m. UTC | #5
Quoting Michal Wajdeczko (2019-06-17 15:38:53)
> On Mon, 17 Jun 2019 12:09:17 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > If the user is clearing the log buffer too slowly, we overflow. As this
> > is an expected condition, and the driver tries to handle it, reduce the
> > error message down to a notice.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_guc_log.c  
> > b/drivers/gpu/drm/i915/intel_guc_log.c
> > index bf1446629703..e3b83ecb90b5 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_log.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_log.c
> > @@ -208,7 +208,9 @@ static bool guc_check_log_buf_overflow(struct  
> > intel_guc_log *log,
> >                       /* buffer_full_cnt is a 4 bit counter */
> >                       log->stats[type].sampled_overflow += 16;
> >               }
> > -             DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
> > +
> > +             dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
> > +                                    "GuC log buffer overflow\n");
> 
> While this change alone is not harmful, I'm afraid that there might
> be another reason why we see this message : once buffer_full_cnt is
> turned on by the fw and then we reset the Guc, this field may have
> stale value as it is not cleared by us/uc and we may wrongly assume
> that there was an overflow.

So yay/nay on silencing the test output so I can blissfully ignore it in
bugzilla?
-Chris
Michal Wajdeczko June 18, 2019, 10:30 a.m. UTC | #6
On Tue, 18 Jun 2019 12:24:30 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Michal Wajdeczko (2019-06-17 15:38:53)
>> On Mon, 17 Jun 2019 12:09:17 +0200, Chris Wilson
>> <chris@chris-wilson.co.uk> wrote:
>>
>> > If the user is clearing the log buffer too slowly, we overflow. As  
>> this
>> > is an expected condition, and the driver tries to handle it, reduce  
>> the
>> > error message down to a notice.
>> >
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110817
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_guc_log.c | 4 +++-
>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_guc_log.c
>> > b/drivers/gpu/drm/i915/intel_guc_log.c
>> > index bf1446629703..e3b83ecb90b5 100644
>> > --- a/drivers/gpu/drm/i915/intel_guc_log.c
>> > +++ b/drivers/gpu/drm/i915/intel_guc_log.c
>> > @@ -208,7 +208,9 @@ static bool guc_check_log_buf_overflow(struct
>> > intel_guc_log *log,
>> >                       /* buffer_full_cnt is a 4 bit counter */
>> >                       log->stats[type].sampled_overflow += 16;
>> >               }
>> > -             DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
>> > +
>> > +              
>> dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
>> > +                                    "GuC log buffer overflow\n");
>>
>> While this change alone is not harmful, I'm afraid that there might
>> be another reason why we see this message : once buffer_full_cnt is
>> turned on by the fw and then we reset the Guc, this field may have
>> stale value as it is not cleared by us/uc and we may wrongly assume
>> that there was an overflow.
>
> So yay/nay on silencing the test output so I can blissfully ignore it in
> bugzilla?

OK to merge, we can debug extra 'notice' logs separately

Michal
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/intel_guc_log.c b/drivers/gpu/drm/i915/intel_guc_log.c
index bf1446629703..e3b83ecb90b5 100644
--- a/drivers/gpu/drm/i915/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/intel_guc_log.c
@@ -208,7 +208,9 @@  static bool guc_check_log_buf_overflow(struct intel_guc_log *log,
 			/* buffer_full_cnt is a 4 bit counter */
 			log->stats[type].sampled_overflow += 16;
 		}
-		DRM_ERROR_RATELIMITED("GuC log buffer overflow\n");
+
+		dev_notice_ratelimited(guc_to_i915(log_to_guc(log))->drm.dev,
+				       "GuC log buffer overflow\n");
 	}
 
 	return overflow;