diff mbox

[08/22] drm/i915/perf: rate limit spurious oa report notice

Message ID 20170511154345.962-9-lionel.g.landwerlin@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Lionel Landwerlin May 11, 2017, 3:43 p.m. UTC
From: Robert Bragg <robert@sixbynine.org>

This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.

v2: (Chris) avoid inconsistent warning on throttle with
    printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  6 ++++++
 drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

Comments

Chris Wilson May 13, 2017, 10:08 a.m. UTC | #1
On Thu, May 11, 2017 at 04:43:31PM +0100, Lionel Landwerlin wrote:
> From: Robert Bragg <robert@sixbynine.org>
> 
> This change is pre-emptively aiming to avoid a potential cause of kernel
> logging noise in case some condition were to result in us seeing invalid
> OA reports.
> 
> The workaround for the OA unit's tail pointer race condition is what
> avoids the primary known cause of invalid reports being seen and with
> that in place we aren't expecting to see this notice but it can't be
> entirely ruled out.
> 
> Just in case some condition does lead to the notice then it's likely
> that it will be triggered repeatedly while attempting to append a
> sequence of reports and depending on the configured OA sampling
> frequency that might be a large number of repeat notices.
> 
> v2: (Chris) avoid inconsistent warning on throttle with
>     printk_ratelimit()
> v3: (Matt) init and summarise with stream init/close not driver init/fini
> 
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

I've pushed up to this point as this looks to be the end of the bug
fixes and the start of the feature work. However Lionel, then I noticed
that you haven't been adding your s-o-b for the patches you've handled,
it's just a step to ensure the provenance of every patch entering the
kernel through us.
-Chris
Lionel Landwerlin May 13, 2017, 10:55 a.m. UTC | #2
On 13/05/17 11:08, Chris Wilson wrote:
> On Thu, May 11, 2017 at 04:43:31PM +0100, Lionel Landwerlin wrote:
>> From: Robert Bragg <robert@sixbynine.org>
>>
>> This change is pre-emptively aiming to avoid a potential cause of kernel
>> logging noise in case some condition were to result in us seeing invalid
>> OA reports.
>>
>> The workaround for the OA unit's tail pointer race condition is what
>> avoids the primary known cause of invalid reports being seen and with
>> that in place we aren't expecting to see this notice but it can't be
>> entirely ruled out.
>>
>> Just in case some condition does lead to the notice then it's likely
>> that it will be triggered repeatedly while attempting to append a
>> sequence of reports and depending on the configured OA sampling
>> frequency that might be a large number of repeat notices.
>>
>> v2: (Chris) avoid inconsistent warning on throttle with
>>      printk_ratelimit()
>> v3: (Matt) init and summarise with stream init/close not driver init/fini
>>
>> Signed-off-by: Robert Bragg <robert@sixbynine.org>
>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> I've pushed up to this point as this looks to be the end of the bug
> fixes and the start of the feature work. However Lionel, then I noticed
> that you haven't been adding your s-o-b for the patches you've handled,
> it's just a step to ensure the provenance of every patch entering the
> kernel through us.
> -Chris
>

I haven't actually touched the gen7 patches (1 to 8)
The rest should have my s-o-b
Chris Wilson May 13, 2017, 11:05 a.m. UTC | #3
On Sat, May 13, 2017 at 11:55:24AM +0100, Lionel Landwerlin wrote:
> On 13/05/17 11:08, Chris Wilson wrote:
> >On Thu, May 11, 2017 at 04:43:31PM +0100, Lionel Landwerlin wrote:
> >>From: Robert Bragg <robert@sixbynine.org>
> >>
> >>This change is pre-emptively aiming to avoid a potential cause of kernel
> >>logging noise in case some condition were to result in us seeing invalid
> >>OA reports.
> >>
> >>The workaround for the OA unit's tail pointer race condition is what
> >>avoids the primary known cause of invalid reports being seen and with
> >>that in place we aren't expecting to see this notice but it can't be
> >>entirely ruled out.
> >>
> >>Just in case some condition does lead to the notice then it's likely
> >>that it will be triggered repeatedly while attempting to append a
> >>sequence of reports and depending on the configured OA sampling
> >>frequency that might be a large number of repeat notices.
> >>
> >>v2: (Chris) avoid inconsistent warning on throttle with
> >>     printk_ratelimit()
> >>v3: (Matt) init and summarise with stream init/close not driver init/fini
> >>
> >>Signed-off-by: Robert Bragg <robert@sixbynine.org>
> >>Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> >I've pushed up to this point as this looks to be the end of the bug
> >fixes and the start of the feature work. However Lionel, then I noticed
> >that you haven't been adding your s-o-b for the patches you've handled,
> >it's just a step to ensure the provenance of every patch entering the
> >kernel through us.
> >-Chris
> >
> 
> I haven't actually touched the gen7 patches (1 to 8)
> The rest should have my s-o-b

It's not about touching, it's about submitting patches for inclusion.
The s-o-b is to say that you have the authority to submit this code.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 22b2ea3ea66f..66dee15e4fc0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2354,6 +2354,12 @@  struct drm_i915_private {
 			wait_queue_head_t poll_wq;
 			bool pollin;
 
+			/**
+			 * For rate limiting any notifications of spurious
+			 * invalid OA reports
+			 */
+			struct ratelimit_state spurious_report_rs;
+
 			bool periodic;
 			int period_exponent;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index dbed19a5a1b2..665a3c53e388 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -632,7 +632,8 @@  static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		 * copying it to userspace...
 		 */
 		if (report32[0] == 0) {
-			DRM_NOTE("Skipping spurious, invalid OA report\n");
+			if (__ratelimit(&dev_priv->perf.oa.spurious_report_rs))
+				DRM_NOTE("Skipping spurious, invalid OA report\n");
 			continue;
 		}
 
@@ -912,6 +913,11 @@  static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 		oa_put_render_ctx_id(stream);
 
 	dev_priv->perf.oa.exclusive_stream = NULL;
+
+	if (dev_priv->perf.oa.spurious_report_rs.missed) {
+		DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
+			 dev_priv->perf.oa.spurious_report_rs.missed);
+	}
 }
 
 static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
@@ -1267,6 +1273,26 @@  static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		return -EINVAL;
 	}
 
+	/* We set up some ratelimit state to potentially throttle any _NOTES
+	 * about spurious, invalid OA reports which we don't forward to
+	 * userspace.
+	 *
+	 * The initialization is associated with opening the stream (not driver
+	 * init) considering we print a _NOTE about any throttling when closing
+	 * the stream instead of waiting until driver _fini which no one would
+	 * ever see.
+	 *
+	 * Using the same limiting factors as printk_ratelimit()
+	 */
+	ratelimit_state_init(&dev_priv->perf.oa.spurious_report_rs,
+			     5 * HZ, 10);
+	/* Since we use a DRM_NOTE for spurious reports it would be
+	 * inconsistent to let __ratelimit() automatically print a warning for
+	 * throttling.
+	 */
+	ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs,
+			    RATELIMIT_MSG_ON_RELEASE);
+
 	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
 
 	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;