Message ID | 20180102151235.3949-14-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 02/01/2018 15:12, Chris Wilson wrote: > We have a hole in our busy-stat accounting if the pmu is enabled during > a long running batch, the pmu will not start accumulating busy-time > until the next context switch. This then fails tests that are only > sampling a single batch. Oops, something in recent perf_pmu rework uncovered this? > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > index ebdcbcbacb3c..c2f01ff96ce1 100644 > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) > spin_lock_irqsave(&engine->stats.lock, flags); > if (engine->stats.enabled == ~0) > goto busy; > - if (engine->stats.enabled++ == 0) > + if (engine->stats.enabled++ == 0) { > engine->stats.enabled_at = ktime_get(); > + > + /* XXX submission method oblivious */ You mean once busy stats support for the GuC gets added? > + engine->stats.active = port_count(&engine->execlists.port[1]); > + engine->stats.active += port_count(&engine->execlists.port[0]); Hm, wouldn't this have the potential to over-account the active counter resulting in permanent 100%? Port count is [0, 2], where 2 is re-submission of port 0, while engine->stats.active is [0, 2], where this 2 is number of ports. In other words I think the correct method would be: if (port_count(0)) active++; if (port_count(1)) active++; Have to yet think how to handle GuC for this hole. Regards, Tvrtko > + if (engine->stats.active) > + engine->stats.start = engine->stats.enabled_at; > + } > spin_unlock_irqrestore(&engine->stats.lock, flags); > > return 0; >
Quoting Tvrtko Ursulin (2018-01-08 14:18:53) > > On 02/01/2018 15:12, Chris Wilson wrote: > > We have a hole in our busy-stat accounting if the pmu is enabled during > > a long running batch, the pmu will not start accumulating busy-time > > until the next context switch. This then fails tests that are only > > sampling a single batch. > > Oops, something in recent perf_pmu rework uncovered this? Yeah, the tweaks reduced some tests to submitting a single batch started before the counters. Previously all tests started the counters then submitted the workload. I had a test planned, obviously forgot to write it. I couldn't think of a way to guarantee hitting the lite-restore case thought. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > --- > > drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > > index ebdcbcbacb3c..c2f01ff96ce1 100644 > > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > > @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) > > spin_lock_irqsave(&engine->stats.lock, flags); > > if (engine->stats.enabled == ~0) > > goto busy; > > - if (engine->stats.enabled++ == 0) > > + if (engine->stats.enabled++ == 0) { > > engine->stats.enabled_at = ktime_get(); > > + > > + /* XXX submission method oblivious */ > > You mean once busy stats support for the GuC gets added? I was just thinking that we shouldn't be poking into execlists here and that we need an agnostic method for execlists to tell busy_stats that it is starting active. Or something. > > + engine->stats.active = port_count(&engine->execlists.port[1]); > > + engine->stats.active += port_count(&engine->execlists.port[0]); > > Hm, wouldn't this have the potential to over-account the active counter > resulting in permanent 100%? Port count is [0, 2], where 2 is > re-submission of port 0, while engine->stats.active is [0, 2], where > this 2 is number of ports. In other words I think the correct method > would be: > > if (port_count(0)) > active++; > if (port_count(1)) > active++; I thought we needed for stats.active to reflect the number of pending context_out we will report. So for the lite-restore case, I thought 2 was the correct value. -Chris
On 08/01/2018 14:29, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-01-08 14:18:53) >> >> On 02/01/2018 15:12, Chris Wilson wrote: >>> We have a hole in our busy-stat accounting if the pmu is enabled during >>> a long running batch, the pmu will not start accumulating busy-time >>> until the next context switch. This then fails tests that are only >>> sampling a single batch. >> >> Oops, something in recent perf_pmu rework uncovered this? > > Yeah, the tweaks reduced some tests to submitting a single batch started > before the counters. Previously all tests started the counters then > submitted the workload. I had a test planned, obviously forgot to write > it. I couldn't think of a way to guarantee hitting the lite-restore case > thought. Spin batch from ctx 1 and another to ctx 2, after a short delay, wouldn't do it? Or at least reliably? >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> --- >>> drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++- >>> 1 file changed, 8 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c >>> index ebdcbcbacb3c..c2f01ff96ce1 100644 >>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c >>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c >>> @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) >>> spin_lock_irqsave(&engine->stats.lock, flags); >>> if (engine->stats.enabled == ~0) >>> goto busy; >>> - if (engine->stats.enabled++ == 0) >>> + if (engine->stats.enabled++ == 0) { >>> engine->stats.enabled_at = ktime_get(); >>> + >>> + /* XXX submission method oblivious */ >> >> You mean once busy stats support for the GuC gets added? > > I was just thinking that we shouldn't be poking into execlists here and > that we need an agnostic method for execlists to tell busy_stats that it > is starting active. Or something. Okay, yes. Would need to define what the API returns explicitly, perhaps a number of context submitted to the hardware. But that could be trouble for the GuC backend. Will think about it. >>> + engine->stats.active = port_count(&engine->execlists.port[1]); >>> + engine->stats.active += port_count(&engine->execlists.port[0]); >> >> Hm, wouldn't this have the potential to over-account the active counter >> resulting in permanent 100%? Port count is [0, 2], where 2 is >> re-submission of port 0, while engine->stats.active is [0, 2], where >> this 2 is number of ports. In other words I think the correct method >> would be: >> >> if (port_count(0)) >> active++; >> if (port_count(1)) >> active++; > > I thought we needed for stats.active to reflect the number of pending > context_out we will report. So for the lite-restore case, I thought 2 > was the correct value. Hm, so am I mistaken that we would get still only one context_out (port->count to zero transition) from port0 in lite-restore case? I can't spot it in the code at the moment that I am, but who knows. Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-01-08 14:44:27) > > On 08/01/2018 14:29, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-01-08 14:18:53) > >> > >> On 02/01/2018 15:12, Chris Wilson wrote: > >>> We have a hole in our busy-stat accounting if the pmu is enabled during > >>> a long running batch, the pmu will not start accumulating busy-time > >>> until the next context switch. This then fails tests that are only > >>> sampling a single batch. > >> > >> Oops, something in recent perf_pmu rework uncovered this? > > > > Yeah, the tweaks reduced some tests to submitting a single batch started > > before the counters. Previously all tests started the counters then > > submitted the workload. I had a test planned, obviously forgot to write > > it. I couldn't think of a way to guarantee hitting the lite-restore case > > thought. > > Spin batch from ctx 1 and another to ctx 2, after a short delay, > wouldn't do it? Or at least reliably? You need: spin batch ctx1 short batch ctx2 spin batch ctx2 Terminate spin_batch_ctx1 and then you have a very small window where we get the CS interrupt from ctx1 and we resubmit ctx2 with both short+spin batches, up until we then get the lite-restore CS interrupt. That's the window you have to hit, a few microseconds. > > >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >>> --- > >>> drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++- > >>> 1 file changed, 8 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > >>> index ebdcbcbacb3c..c2f01ff96ce1 100644 > >>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c > >>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > >>> @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) > >>> spin_lock_irqsave(&engine->stats.lock, flags); > >>> if (engine->stats.enabled == ~0) > >>> goto busy; > >>> - if (engine->stats.enabled++ == 0) > >>> + if (engine->stats.enabled++ == 0) { > >>> engine->stats.enabled_at = ktime_get(); > >>> + > >>> + /* XXX submission method oblivious */ > >> > >> You mean once busy stats support for the GuC gets added? > > > > I was just thinking that we shouldn't be poking into execlists here and > > that we need an agnostic method for execlists to tell busy_stats that it > > is starting active. Or something. > > Okay, yes. Would need to define what the API returns explicitly, perhaps > a number of context submitted to the hardware. But that could be trouble > for the GuC backend. Will think about it. > > >>> + engine->stats.active = port_count(&engine->execlists.port[1]); > >>> + engine->stats.active += port_count(&engine->execlists.port[0]); > >> > >> Hm, wouldn't this have the potential to over-account the active counter > >> resulting in permanent 100%? Port count is [0, 2], where 2 is > >> re-submission of port 0, while engine->stats.active is [0, 2], where > >> this 2 is number of ports. In other words I think the correct method > >> would be: > >> > >> if (port_count(0)) > >> active++; > >> if (port_count(1)) > >> active++; > > > > I thought we needed for stats.active to reflect the number of pending > > context_out we will report. So for the lite-restore case, I thought 2 > > was the correct value. > > Hm, so am I mistaken that we would get still only one context_out > (port->count to zero transition) from port0 in lite-restore case? I > can't spot it in the code at the moment that I am, but who knows. Nope, my mistake, It is just the transition to/from count==0 that are relevant. More reason to try and hit the lite-restore window :) -Chris
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index ebdcbcbacb3c..c2f01ff96ce1 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) spin_lock_irqsave(&engine->stats.lock, flags); if (engine->stats.enabled == ~0) goto busy; - if (engine->stats.enabled++ == 0) + if (engine->stats.enabled++ == 0) { engine->stats.enabled_at = ktime_get(); + + /* XXX submission method oblivious */ + engine->stats.active = port_count(&engine->execlists.port[1]); + engine->stats.active += port_count(&engine->execlists.port[0]); + if (engine->stats.active) + engine->stats.start = engine->stats.enabled_at; + } spin_unlock_irqrestore(&engine->stats.lock, flags); return 0;
We have a hole in our busy-stat accounting if the pmu is enabled during a long running batch, the pmu will not start accumulating busy-time until the next context switch. This then fails tests that are only sampling a single batch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> --- drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)