diff mbox

[3/3] drm/i915/perf: rate limit spurious oa report notice

Message ID 20170222152510.5039-4-robert@sixbynine.org (mailing list archive)
State New, archived
Headers show

Commit Message

Robert Bragg Feb. 22, 2017, 3:25 p.m. UTC
This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary know cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Matthew Auld Feb. 28, 2017, 1:28 p.m. UTC | #1
On 22 February 2017 at 15:25, Robert Bragg <robert@sixbynine.org> wrote:
> This change is pre-emptively aiming to avoid a potential cause of kernel
> logging noise in case some condition were to result in us seeing invalid
> OA reports.
>
> The workaround for the OA unit's tail pointer race condition is what
> avoids the primary know cause of invalid reports being seen and with
> that in place we aren't expecting to see this notice but it can't be
> entirely ruled out.
>
> Just in case some condition does lead to the notice then it's likely
> that it will be triggered repeatedly while attempting to append a
> sequence of reports and depending on the configured OA sampling
> frequency that might be a large number of repeat notices.
>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Chris Wilson Feb. 28, 2017, 1:33 p.m. UTC | #2
On Tue, Feb 28, 2017 at 01:28:13PM +0000, Matthew Auld wrote:
> On 22 February 2017 at 15:25, Robert Bragg <robert@sixbynine.org> wrote:
> > This change is pre-emptively aiming to avoid a potential cause of kernel
> > logging noise in case some condition were to result in us seeing invalid
> > OA reports.
> >
> > The workaround for the OA unit's tail pointer race condition is what
> > avoids the primary know cause of invalid reports being seen and with
> > that in place we aren't expecting to see this notice but it can't be
> > entirely ruled out.
> >
> > Just in case some condition does lead to the notice then it's likely
> > that it will be triggered repeatedly while attempting to append a
> > sequence of reports and depending on the configured OA sampling
> > frequency that might be a large number of repeat notices.
> >
> > Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

printk_ratelimits emits a WARN when triggered, defeating the purpose of
using NOTE.
-Chris
Robert Bragg March 20, 2017, 4:21 p.m. UTC | #3
On Tue, Feb 28, 2017 at 1:33 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Feb 28, 2017 at 01:28:13PM +0000, Matthew Auld wrote:
>> On 22 February 2017 at 15:25, Robert Bragg <robert@sixbynine.org> wrote:
>> > This change is pre-emptively aiming to avoid a potential cause of kernel
>> > logging noise in case some condition were to result in us seeing invalid
>> > OA reports.
>> >
>> > The workaround for the OA unit's tail pointer race condition is what
>> > avoids the primary know cause of invalid reports being seen and with
>> > that in place we aren't expecting to see this notice but it can't be
>> > entirely ruled out.
>> >
>> > Just in case some condition does lead to the notice then it's likely
>> > that it will be triggered repeatedly while attempting to append a
>> > sequence of reports and depending on the configured OA sampling
>> > frequency that might be a large number of repeat notices.
>> >
>> > Signed-off-by: Robert Bragg <robert@sixbynine.org>
>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>
> printk_ratelimits emits a WARN when triggered, defeating the purpose of
> using NOTE.

Hmm, that's a slightly awkward problem.

The warning comes from the common __ratelimit utility api that's used
for numerous things but by default will warn if the rate limiting
kicked in (once printing resumes again).

There's a RATELIMIT_MSG_ON_RELEASE flag that can be set to def the
warning until ratelimit_state_exit() is called (which looks optional
or at least the flag could also be cleared before calling it to avoid
any warnings).

In general printk_ratelimit() doesn't seem like an ideal mechanism for
throttling messages, even if they are warnings, since the rate limit
state is shared across orthogonal components, so maybe it's best
anyway to use a custom rate limit state.

Considering the _MSG_ON_RELEASE flag, I think I'll init some ratelimit
state in dev_priv->perf.oa just for this message, use
ratelimit_set_flags() to set _MSG_ON_RELEASE and have a corresponding
debug/note in i915_perf_fini after checking the rs->missed counter.

Actually at this point I'm slightly doubting whether having a warning
for throttling is that terrible in this case. Although I tend to think
it could be good to have dedicated ratelimit state here, there
conceptually is a point a which a visible warning could be appropriate
if we're really seeing a lot of these spurious reports. It's a grey
area since it's ok as note if it's rare/intermittent, but somthing's
maybe gone wrong if we see *lots* of these. hmm.

Incidentally, I suppose this same observation is relevant to many of
the DRM_*_RATELIMITED macros too - there will be an inconsistent level
used to notify about any throttling?

Br,
- Robert

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d04ebaa8406e..a901bcd80263 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -632,7 +632,8 @@  static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		 * copying it to userspace...
 		 */
 		if (report32[0] == 0) {
-			DRM_NOTE("Skipping spurious, invalid OA report\n");
+			if (printk_ratelimit())
+				DRM_NOTE("Skipping spurious, invalid OA report\n");
 			continue;
 		}