
[14/19] drm/i915: Reconstruct active state on starting busy-stats

Message ID 20180102151235.3949-14-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Jan. 2, 2018, 3:12 p.m. UTC
We have a hole in our busy-stat accounting: if the pmu is enabled during
a long running batch, the pmu will not start accumulating busy-time
until the next context switch. This then fails tests that are only
sampling a single batch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
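
The hole exists because busy time only accrues between context_in and
context_out events. A simplified sketch of that accounting (paraphrasing
the i915 busy-stats hooks of this era, with locking and enable checks
omitted; not verbatim kernel code):

	/* Busy time accrues only while stats.active > 0, and active is
	 * only raised by a context_in event, so enabling stats in the
	 * middle of a batch misses everything up to the next context
	 * switch. */
	static void context_in(struct intel_engine_cs *engine)
	{
		if (engine->stats.active++ == 0)
			engine->stats.start = ktime_get();
	}

	static void context_out(struct intel_engine_cs *engine)
	{
		if (--engine->stats.active == 0)
			engine->stats.total =
				ktime_add(engine->stats.total,
					  ktime_sub(ktime_get(),
						    engine->stats.start));
	}

The patch below closes the hole by seeding stats.active from the
execlists ports at enable time.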

Comments

Tvrtko Ursulin Jan. 8, 2018, 2:18 p.m. UTC | #1
On 02/01/2018 15:12, Chris Wilson wrote:
> We have a hole in our busy-stat accounting: if the pmu is enabled during
> a long running batch, the pmu will not start accumulating busy-time
> until the next context switch. This then fails tests that are only
> sampling a single batch.

Oops, something in recent perf_pmu rework uncovered this?

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index ebdcbcbacb3c..c2f01ff96ce1 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>   	spin_lock_irqsave(&engine->stats.lock, flags);
>   	if (engine->stats.enabled == ~0)
>   		goto busy;
> -	if (engine->stats.enabled++ == 0)
> +	if (engine->stats.enabled++ == 0) {
>   		engine->stats.enabled_at = ktime_get();
> +
> +		/* XXX submission method oblivious */

You mean once busy stats support for the GuC gets added?

> +		engine->stats.active = port_count(&engine->execlists.port[1]);
> +		engine->stats.active += port_count(&engine->execlists.port[0]);

Hm, wouldn't this have the potential to over-account the active counter,
resulting in a permanent 100%? Port count is [0, 2], where 2 is
re-submission of port 0, while engine->stats.active is [0, 2], where
this 2 is the number of ports. In other words, I think the correct
method would be:

if (port_count(0))
	active++;
if (port_count(1))
	active++;

Have yet to think about how to handle GuC for this hole.

Regards,

Tvrtko

> +		if (engine->stats.active)
> +			engine->stats.start = engine->stats.enabled_at;
> +	}
>   	spin_unlock_irqrestore(&engine->stats.lock, flags);
>   
>   	return 0;
>
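Spelled out against the execlists structures the patch touches, the
per-port counting suggested above would look something like this (a
sketch of the suggestion only, not a tested follow-up):

	/* Each port holds at most one context and so contributes at most
	 * one to stats.active; a port_count() of 2 on port0 means a
	 * lite-restore resubmission of the same context, not a second
	 * context. */
	engine->stats.active = 0;
	if (port_count(&engine->execlists.port[0]))
		engine->stats.active++;
	if (port_count(&engine->execlists.port[1]))
		engine->stats.active++;
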
Chris Wilson Jan. 8, 2018, 2:29 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-01-08 14:18:53)
> 
> On 02/01/2018 15:12, Chris Wilson wrote:
> > We have a hole in our busy-stat accounting: if the pmu is enabled during
> > a long running batch, the pmu will not start accumulating busy-time
> > until the next context switch. This then fails tests that are only
> > sampling a single batch.
> 
> Oops, something in recent perf_pmu rework uncovered this?

Yeah, the tweaks reduced some tests to submitting a single batch started
before the counters. Previously all tests started the counters then
submitted the workload. I had a test planned, but obviously forgot to
write it. I couldn't think of a way to guarantee hitting the
lite-restore case, though.
 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index ebdcbcbacb3c..c2f01ff96ce1 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
> >       spin_lock_irqsave(&engine->stats.lock, flags);
> >       if (engine->stats.enabled == ~0)
> >               goto busy;
> > -     if (engine->stats.enabled++ == 0)
> > +     if (engine->stats.enabled++ == 0) {
> >               engine->stats.enabled_at = ktime_get();
> > +
> > +             /* XXX submission method oblivious */
> 
> You mean once busy stats support for the GuC gets added?

I was just thinking that we shouldn't be poking into execlists here and
that we need an agnostic method for execlists to tell busy_stats that it
is starting active. Or something.
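
One shape such an agnostic method could take, purely as illustration
(the vfunc and its name are hypothetical; nothing like it exists at
this point):

	/* Hypothetical engine vfunc: the submission backend reports how
	 * many contexts it currently has on the hardware, so busy-stats
	 * need not peek at execlists ports directly.  An execlists
	 * backend would count its occupied ports; a GuC backend could
	 * answer from its own submission tracking. */
	unsigned int (*busy_contexts)(struct intel_engine_cs *engine);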

> > +             engine->stats.active = port_count(&engine->execlists.port[1]);
> > +             engine->stats.active += port_count(&engine->execlists.port[0]);
> 
> Hm, wouldn't this have the potential to over-account the active counter,
> resulting in a permanent 100%? Port count is [0, 2], where 2 is
> re-submission of port 0, while engine->stats.active is [0, 2], where
> this 2 is the number of ports. In other words, I think the correct
> method would be:
> 
> if (port_count(0))
>         active++;
> if (port_count(1))
>         active++;

I thought we needed stats.active to reflect the number of pending
context_out events we will report. So for the lite-restore case, I
thought 2 was the correct value.
-Chris
Tvrtko Ursulin Jan. 8, 2018, 2:44 p.m. UTC | #3
On 08/01/2018 14:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-08 14:18:53)
>>
>> On 02/01/2018 15:12, Chris Wilson wrote:
>>> We have a hole in our busy-stat accounting: if the pmu is enabled during
>>> a long running batch, the pmu will not start accumulating busy-time
>>> until the next context switch. This then fails tests that are only
>>> sampling a single batch.
>>
>> Oops, something in recent perf_pmu rework uncovered this?
> 
> Yeah, the tweaks reduced some tests to submitting a single batch started
> before the counters. Previously all tests started the counters then
> submitted the workload. I had a test planned, but obviously forgot to
> write it. I couldn't think of a way to guarantee hitting the
> lite-restore case, though.

Wouldn't a spin batch from ctx 1, and another on ctx 2 after a short
delay, do it? Or at least do it reliably?

>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++-
>>>    1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> index ebdcbcbacb3c..c2f01ff96ce1 100644
>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>>>        spin_lock_irqsave(&engine->stats.lock, flags);
>>>        if (engine->stats.enabled == ~0)
>>>                goto busy;
>>> -     if (engine->stats.enabled++ == 0)
>>> +     if (engine->stats.enabled++ == 0) {
>>>                engine->stats.enabled_at = ktime_get();
>>> +
>>> +             /* XXX submission method oblivious */
>>
>> You mean once busy stats support for the GuC gets added?
> 
> I was just thinking that we shouldn't be poking into execlists here and
> that we need an agnostic method for execlists to tell busy_stats that it
> is starting active. Or something.

Okay, yes. Would need to define what the API returns explicitly, perhaps
the number of contexts submitted to the hardware. But that could be
trouble for the GuC backend. Will think about it.

>>> +             engine->stats.active = port_count(&engine->execlists.port[1]);
>>> +             engine->stats.active += port_count(&engine->execlists.port[0]);
>>
>> Hm, wouldn't this have the potential to over-account the active counter,
>> resulting in a permanent 100%? Port count is [0, 2], where 2 is
>> re-submission of port 0, while engine->stats.active is [0, 2], where
>> this 2 is the number of ports. In other words, I think the correct
>> method would be:
>>
>> if (port_count(0))
>>          active++;
>> if (port_count(1))
>>          active++;
> 
> I thought we needed stats.active to reflect the number of pending
> context_out events we will report. So for the lite-restore case, I
> thought 2 was the correct value.

Hm, so am I mistaken that we would still get only one context_out
(port->count to zero transition) from port0 in the lite-restore case? I
can't spot it in the code at the moment, but who knows.

Regards,

Tvrtko
Chris Wilson Jan. 8, 2018, 2:52 p.m. UTC | #4
Quoting Tvrtko Ursulin (2018-01-08 14:44:27)
> 
> On 08/01/2018 14:29, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-08 14:18:53)
> >>
> >> On 02/01/2018 15:12, Chris Wilson wrote:
> >>> We have a hole in our busy-stat accounting: if the pmu is enabled during
> >>> a long running batch, the pmu will not start accumulating busy-time
> >>> until the next context switch. This then fails tests that are only
> >>> sampling a single batch.
> >>
> >> Oops, something in recent perf_pmu rework uncovered this?
> > 
> > Yeah, the tweaks reduced some tests to submitting a single batch started
> > before the counters. Previously all tests started the counters then
> > submitted the workload. I had a test planned, but obviously forgot to
> > write it. I couldn't think of a way to guarantee hitting the
> > lite-restore case, though.
> 
> Wouldn't a spin batch from ctx 1, and another on ctx 2 after a short
> delay, do it? Or at least do it reliably?

You need:

spin batch ctx1
short batch ctx2
spin batch ctx2

Terminate spin_batch_ctx1 and then you have a very small window where we
get the CS interrupt from ctx1 and resubmit ctx2 with both short+spin
batches, up until we get the lite-restore CS interrupt.

That's the window you have to hit, a few microseconds.
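
As a sketch of that sequence, with spin_new(), spin_end(),
submit_short_batch() and enable_busy_stats() all hypothetical stand-ins
for an igt-style API rather than real igt calls:

	spin1 = spin_new(fd, ctx1, engine);	/* long spin batch on ctx1 */
	submit_short_batch(fd, ctx2, engine);	/* short batch on ctx2 */
	spin2 = spin_new(fd, ctx2, engine);	/* spin batch on ctx2 */

	spin_end(spin1);	/* on ctx1's CS interrupt, ctx2 is
				 * resubmitted with both short+spin
				 * batches: lite-restore, port0 count 2 */
	/*
	 * The window is open from that resubmission until the
	 * lite-restore CS interrupt lands, a few microseconds in which
	 * to enable the busy stats and hit the reconstruction path.
	 */
	enable_busy_stats(fd, engine);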

> 
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/intel_engine_cs.c | 9 ++++++++-
> >>>    1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> >>> index ebdcbcbacb3c..c2f01ff96ce1 100644
> >>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> >>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> >>> @@ -1946,8 +1946,15 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
> >>>        spin_lock_irqsave(&engine->stats.lock, flags);
> >>>        if (engine->stats.enabled == ~0)
> >>>                goto busy;
> >>> -     if (engine->stats.enabled++ == 0)
> >>> +     if (engine->stats.enabled++ == 0) {
> >>>                engine->stats.enabled_at = ktime_get();
> >>> +
> >>> +             /* XXX submission method oblivious */
> >>
> >> You mean once busy stats support for the GuC gets added?
> > 
> > I was just thinking that we shouldn't be poking into execlists here and
> > that we need an agnostic method for execlists to tell busy_stats that it
> > is starting active. Or something.
> 
> Okay, yes. Would need to define what the API returns explicitly, perhaps
> the number of contexts submitted to the hardware. But that could be
> trouble for the GuC backend. Will think about it.
> 
> >>> +             engine->stats.active = port_count(&engine->execlists.port[1]);
> >>> +             engine->stats.active += port_count(&engine->execlists.port[0]);
> >>
> >> Hm, wouldn't this have the potential to over-account the active counter,
> >> resulting in a permanent 100%? Port count is [0, 2], where 2 is
> >> re-submission of port 0, while engine->stats.active is [0, 2], where
> >> this 2 is the number of ports. In other words, I think the correct
> >> method would be:
> >>
> >> if (port_count(0))
> >>          active++;
> >> if (port_count(1))
> >>          active++;
> > 
> > I thought we needed stats.active to reflect the number of pending
> > context_out events we will report. So for the lite-restore case, I
> > thought 2 was the correct value.
> 
> Hm, so am I mistaken that we would still get only one context_out
> (port->count to zero transition) from port0 in the lite-restore case? I
> can't spot it in the code at the moment, but who knows.

Nope, my mistake; it is just the transitions to/from count==0 that are
relevant. More reason to try and hit the lite-restore window :)
-Chris
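
In code terms, the resolution is that a context_out is only reported on
the port count's transition to zero, roughly (simplified from the
execlists completion handling, with the count manipulation elided):

	/* A lite-restored context runs its port count 2 -> 1 -> 0, and
	 * only the final transition to zero reports a context_out, so
	 * each port still contributes at most one to stats.active. */
	if (port_count(port) == 0)
		context_out(engine);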

Patch

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ebdcbcbacb3c..c2f01ff96ce1 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1946,8 +1946,15 @@  int intel_enable_engine_stats(struct intel_engine_cs *engine)
 	spin_lock_irqsave(&engine->stats.lock, flags);
 	if (engine->stats.enabled == ~0)
 		goto busy;
-	if (engine->stats.enabled++ == 0)
+	if (engine->stats.enabled++ == 0) {
 		engine->stats.enabled_at = ktime_get();
+
+		/* XXX submission method oblivious */
+		engine->stats.active = port_count(&engine->execlists.port[1]);
+		engine->stats.active += port_count(&engine->execlists.port[0]);
+		if (engine->stats.active)
+			engine->stats.start = engine->stats.enabled_at;
+	}
 	spin_unlock_irqrestore(&engine->stats.lock, flags);
 
 	return 0;