
[2/2] drm/i915/tracepoints: Remove DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option

Message ID 20180625172546.7729-2-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin June 25, 2018, 5:25 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This Kconfig option was added to protect implementation-specific
internals from user expectations, but so far it has mostly been a hassle.

Remove it so it is possible to debug request submission on any kernel
anywhere.

This adds around 4k to the default i915.ko build but should have no
performance effect, since inactive tracepoints are no-op-ed out and kept
out of line.

Users should remember that tracepoints which are close to low-level i915
implementation details are subject to change and cannot be guaranteed.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Kconfig.debug | 11 -----------
 drivers/gpu/drm/i915/i915_trace.h  | 20 --------------------
 2 files changed, 31 deletions(-)
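
For context, the guard being removed wraps the low-level request
tracepoints in i915_trace.h roughly as below - a simplified sketch of the
pattern, not the exact upstream definitions:

#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
TRACE_EVENT(i915_request_in,
	    TP_PROTO(struct i915_request *rq, unsigned int port),
	    TP_ARGS(rq, port),

	    TP_STRUCT__entry(
			     __field(u32, ctx)
			     __field(u32, seqno)
			     __field(u32, port)
			     ),

	    TP_fast_assign(
			   __entry->ctx = rq->fence.context;
			   __entry->seqno = rq->fence.seqno;
			   __entry->port = port;
			   ),

	    TP_printk("ctx=%u, seqno=%u, port=%u",
		      __entry->ctx, __entry->seqno, __entry->port)
);

/* ...similar definitions for i915_request_submit, i915_request_out and
 * intel_engine_notify... */
#else
#if !defined(TRACE_HEADER_MULTI_READ)
static inline void
trace_i915_request_in(struct i915_request *rq, unsigned int port)
{
}
/* ...matching no-op stubs for the other low-level tracepoints... */
#endif
#endif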

Comments

Chris Wilson June 25, 2018, 8:02 p.m. UTC | #1
Quoting Tvrtko Ursulin (2018-06-25 18:25:46)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> This Kconfig option was added to protect the implementation specific
> internals from user expectations but so far it was mostly hassle.
> 
> Remove it so it is possible to debug request submission on any kernel
> anywhere.

Our job is not to let bugs into the wild ;)
 
> This adds around 4k to default i915.ko build but should have no
> performance effects due inactive tracepoints being no-op-ed out and out-
> of-line.
> 
> Users should remember tracepoints which are close to low level i915
> implementation details are subject to change and cannot be guaranteed.

That's the caveat that I feel needs to be fleshed out. Burying it had the
advantage of making it quite clear that you had to opt in and pick up
the pieces when it inevitably breaks.

What is wanted and what can we reasonably provide? If the tracepoints
need to undergo major change before the next LTS, let alone over the
life of that LTS...

If we know what is wanted, can we define that better in terms of
dma_fence and leave the low-level ones for debugging (or think of how we
achieve the same with generic bpf/kprobes)? Hmm, I wonder how far we can
push that.
-Chris
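
(As an illustration of the bpf/kprobes route mentioned above, the submit
event can be hooked without any i915 tracepoint at all - a minimal sketch,
where the symbol name __i915_request_submit is an assumption based on the
i915 code of this period, and trace_printk() stands in for whatever output
a real tool would emit.)

#include <linux/module.h>
#include <linux/kprobes.h>

/* Hypothetical probe on the i915 submit path. */
static int submit_pre(struct kprobe *p, struct pt_regs *regs)
{
	trace_printk("i915 request submitted\n");
	return 0;
}

static struct kprobe submit_probe = {
	.symbol_name	= "__i915_request_submit",
	.pre_handler	= submit_pre,
};

static int __init probe_init(void)
{
	return register_kprobe(&submit_probe);
}

static void __exit probe_exit(void)
{
	unregister_kprobe(&submit_probe);
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");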
Tvrtko Ursulin June 26, 2018, 10:46 a.m. UTC | #2
On 25/06/2018 21:02, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-25 18:25:46)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> This Kconfig option was added to protect the implementation specific
>> internals from user expectations but so far it was mostly hassle.
>>
>> Remove it so it is possible to debug request submission on any kernel
>> anywhere.
> 
> Our job is not to let bugs into the wild ;)

I did not word that well - I actually meant debugging the engine 
timelines for unexpected stalls and/or dependencies. So more about 
userspace being able to analyse what's happening.

>> This adds around 4k to default i915.ko build but should have no
>> performance effects due inactive tracepoints being no-op-ed out and out-
>> of-line.
>>
>> Users should remember tracepoints which are close to low level i915
>> implementation details are subject to change and cannot be guaranteed.
> 
> That's the caveat that I feel needs fleshed out. Burying it had the
> advantage of making it quite clear that you had to opt in and pick up
> the pieces when it inevitably breaks.
> 
> What is wanted and what can we reasonable provide? If the tracepoints
> needs to undergo major change before the next LTS, let alone for the
> life of that LTS...
> 
> If we know what is wanted can we define that better in terms of
> dma_fence and leave lowlevel for debugging (or think of how we achieve
> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push
> that.

What is wanted is, for instance, to take trace.pl on any kernel anywhere 
and have it deduce/draw the exact metrics/timeline of command submission 
for a workload.

At the moment, without the low level tracepoints, and without the 
intel_engine_notify tweak, how close it can get is workload dependent.

So a set of tracepoints to allow drawing the timeline:

1. request_queue (or _add)
2. request_submit
3. intel_engine_notify
4. request_in/out

With this set the above is possible and we don't need a lot of work to 
get there.
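
(To make the intended use concrete: a tool consuming those four events
could pair them per request roughly as sketched below. This is a minimal
illustration only; the struct layout and the idea of keying records on the
(ctx, seqno) pair printed by the tracepoints are assumptions, not how
trace.pl is actually implemented.)

#include <stdint.h>

/* One record per request, keyed on the (ctx, seqno) pair printed by the
 * tracepoints; each field holds the trace timestamp of one event. */
struct request_span {
	uint64_t queue_ns;	/* request_queue/_add: handed over by userspace */
	uint64_t submit_ns;	/* request_submit: became runnable */
	uint64_t start_ns;	/* request_in: started executing on an engine */
	uint64_t end_ns;	/* request_out/intel_engine_notify: finished */
};

/* Timeline metrics then fall out as simple differences, e.g.: */
static uint64_t queue_latency(const struct request_span *s)
{
	return s->submit_ns - s->queue_ns;	/* time spent not runnable */
}

static uint64_t gpu_time(const struct request_span *s)
{
	return s->end_ns - s->start_ns;		/* time spent on the engine */
}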

And with the Virtual Engine it will become more interesting to have 
this. So if we had a bug report saying load balancing is not working 
well, we could just say "please run it via trace.pl --trace and attach 
the perf script output". That way we could easily see whether it is a 
problem in userspace behaviour or elsewhere.

Regards,

Tvrtko
Chris Wilson June 26, 2018, 10:55 a.m. UTC | #3
Quoting Tvrtko Ursulin (2018-06-26 11:46:51)
> 
> On 25/06/2018 21:02, Chris Wilson wrote:
> > If we know what is wanted can we define that better in terms of
> > dma_fence and leave lowlevel for debugging (or think of how we achieve
> > the same with generic bpf? kprobes)? Hmm, I wonder how far we can push
> > that.
> 
> What is wanted is for instance take trace.pl on any kernel anywhere and 
> it is able to deduce/draw the exact metrics/timeline of command 
> submission for an workload.
> 
> At the moment it without low level tracepoints, and without the 
> intel_engine_notify tweak, it is workload dependent on how close it 
> could get.

Interjecting what dma-fence already has (or we could use), not sure how
well userspace can actually map it to their timelines.
> 
> So a set of tracepoints to allow drawing the timeline:
> 
> 1. request_queue (or _add)
dma_fence_init

> 2. request_submit

> 3. intel_engine_notify
For obvious reasons, no match in dma_fence.

> 4. request_in
dma_fence_emit

> 5. request out
dma_fence_signal (similar, not quite, we would have to force irq
signaling).
 
> With this set the above is possible and we don't need a lot of work to 
> get there.

From a brief glance we are missing a dma_fence_queue for request_submit
replacement.

So the next question is: what information do we get from our tracepoints
(or, more precisely, which do you use) that we lack in dma_fence?
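
(For reference, the dma_fence tracepoints mentioned above all share one
event class, which carries roughly the fields below - a from-memory sketch
of include/trace/events/dma_fence.h, so treat the details as approximate
rather than authoritative.)

DECLARE_EVENT_CLASS(dma_fence,
	TP_PROTO(struct dma_fence *fence),
	TP_ARGS(fence),

	TP_STRUCT__entry(
		__string(driver, fence->ops->get_driver_name(fence))
		__string(timeline, fence->ops->get_timeline_name(fence))
		__field(unsigned int, context)
		__field(unsigned int, seqno)
	),

	TP_fast_assign(
		__assign_str(driver, fence->ops->get_driver_name(fence))
		__assign_str(timeline, fence->ops->get_timeline_name(fence))
		__entry->context = fence->context;
		__entry->seqno = fence->seqno;
	),

	TP_printk("driver=%s timeline=%s context=%u seqno=%u",
		  __get_str(driver), __get_str(timeline),
		  __entry->context, __entry->seqno)
);

/* Note: no engine, port or preemption information here, which is what the
 * following reply calls out. */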

> And with the Virtual Engine it will become more interesting to have 
> this. So if we had a bug report saying load balancing is not working 
> well, we could just say "please run it via trace.pl --trace and attach 
> perf script output". That way we could easily see whether or not is is a 
> problem in userspace behaviour or else.

And there I was wanting a script to capture the workload so that we
could replay it and dissect it. :-p
-Chris
Tvrtko Ursulin June 26, 2018, 11:24 a.m. UTC | #4
On 26/06/2018 11:55, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-26 11:46:51)
>>
>> On 25/06/2018 21:02, Chris Wilson wrote:
>>> If we know what is wanted can we define that better in terms of
>>> dma_fence and leave lowlevel for debugging (or think of how we achieve
>>> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push
>>> that.
>>
>> What is wanted is for instance take trace.pl on any kernel anywhere and
>> it is able to deduce/draw the exact metrics/timeline of command
>> submission for an workload.
>>
>> At the moment it without low level tracepoints, and without the
>> intel_engine_notify tweak, it is workload dependent on how close it
>> could get.
> 
> Interjecting what dma-fence already has (or we could use), not sure how
> well userspace can actually map it to their timelines.
>>
>> So a set of tracepoints to allow drawing the timeline:
>>
>> 1. request_queue (or _add)
> dma_fence_init
> 
>> 2. request_submit
> 
>> 3. intel_engine_notify
> For obvious reasons, no match in dma_fence.
> 
>> 4. request_in
> dma_fence_emit
> 
>> 5. request out
> dma_fence_signal (similar, not quite, we would have to force irq
> signaling).

Yes, not quite the same due to a potential time shift between the user 
interrupt and the dma_fence_signal call via different paths.

>   
>> With this set the above is possible and we don't need a lot of work to
>> get there.
> 
>  From a brief glance we are missing a dma_fence_queue for request_submit
> replacement.
> 
> So next question is what information do we get from our tracepoints (or
> more precisely do you use) that we lack in dma_fence?

Port=%u and preemption (completed=%u) come immediately to mind. A way to 
tie them to engines would be nice, or it is all abstract timelines.

Going in this direction sounds like a long detour to get where we almost 
are. I suspect you are valuing the benefit of it being generic and hence 
a parsing tool could be cross-driver. But you can also just punt the 
"abstractising" into the parsing tool.

>> And with the Virtual Engine it will become more interesting to have
>> this. So if we had a bug report saying load balancing is not working
>> well, we could just say "please run it via trace.pl --trace and attach
>> perf script output". That way we could easily see whether or not is is a
>> problem in userspace behaviour or else.
> 
> And there I was wanting a script to capture the workload so that we
> could replay it and dissect it. :-p

Depends on what level you want that. Perf script output from the above 
tracepoints would do on one level. If you wanted a higher level to 
re-exercise load balancing then it wouldn't completely be enough, or at 
least a lot of guesswork would be needed.

Regards,

Tvrtko
Chris Wilson June 26, 2018, 11:48 a.m. UTC | #5
Quoting Tvrtko Ursulin (2018-06-26 12:24:51)
> 
> On 26/06/2018 11:55, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-26 11:46:51)
> >>
> >> On 25/06/2018 21:02, Chris Wilson wrote:
> >>> If we know what is wanted can we define that better in terms of
> >>> dma_fence and leave lowlevel for debugging (or think of how we achieve
> >>> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push
> >>> that.
> >>
> >> What is wanted is for instance take trace.pl on any kernel anywhere and
> >> it is able to deduce/draw the exact metrics/timeline of command
> >> submission for an workload.
> >>
> >> At the moment it without low level tracepoints, and without the
> >> intel_engine_notify tweak, it is workload dependent on how close it
> >> could get.
> > 
> > Interjecting what dma-fence already has (or we could use), not sure how
> > well userspace can actually map it to their timelines.
> >>
> >> So a set of tracepoints to allow drawing the timeline:
> >>
> >> 1. request_queue (or _add)
> > dma_fence_init
> > 
> >> 2. request_submit
> > 
> >> 3. intel_engine_notify
> > For obvious reasons, no match in dma_fence.
> > 
> >> 4. request_in
> > dma_fence_emit
> > 
> >> 5. request out
> > dma_fence_signal (similar, not quite, we would have to force irq
> > signaling).
> 
> Yes not quite the same due potential time shift between user interrupt 
> and dma_fence_signal call via different paths.
> 
> >   
> >> With this set the above is possible and we don't need a lot of work to
> >> get there.
> > 
> >  From a brief glance we are missing a dma_fence_queue for request_submit
> > replacement.
> > 
> > So next question is what information do we get from our tracepoints (or
> > more precisely do you use) that we lack in dma_fence?
> 
> Port=%u and preemption (completed=%u) comes immediately to mind. Way to 
> tie with engines would be nice or it is all abstract timelines.
> 
> Going this direction sounds like a long detour to get where we almost 
> are. I suspect you are valuing the benefit of it being generic and hence 
> and parsing tool could be cross-driver. But you can also just punt the 
> "abstractising" into the parsing tool.

It's just that this is about the third time this has been raised in the
last couple of weeks, with the other two requests being from a generic
tooling pov (Eric Anholt for gnome-shell tweaking, and someone
else looking for a gpuvis-like tool). So it seems like there is
interest, even if I doubt that it'll help answer any questions beyond
what you can just extract from looking at userspace. (Imo, the only
people these tracepoints are useful for are people writing patches for
the driver. For everyone else, you can just observe system behaviour and
optimise your code for your workload. Otoh, can one trust a black
box, argh.)

To have a second set of nearly equivalent tracepoints, we need to have
strong justification why we couldn't just use or extend the generic set.

Plus I feel a lot more comfortable exporting a set of generic
tracepoints, than those where we may be leaking more knowledge of the HW
than we can reasonably expect to support for the indefinite future.

> >> And with the Virtual Engine it will become more interesting to have
> >> this. So if we had a bug report saying load balancing is not working
> >> well, we could just say "please run it via trace.pl --trace and attach
> >> perf script output". That way we could easily see whether or not is is a
> >> problem in userspace behaviour or else.
> > 
> > And there I was wanting a script to capture the workload so that we
> > could replay it and dissect it. :-p
> 
> Depends on what level you want that. Perf script output from the above 
> tracepoints would do on one level. If you wanted a higher level to 
> re-exercise load balancing then it wouldn't completely be enough, or at 
> least a lot of guesswork would be needed.

It all depends on what level you want to optimise, is the way I look at
it. Userspace driver, you capture the client->driver userspace API (e.g.
cairo-trace, apitrace). But for optimising scheduling layout, we just
need a workload descriptor like wsim -- with perhaps the only tweak
being able to define latency/throughput metrics relevant to that
workload, and being able to integrate with a pseudo display server. The
challenge as I see it is being able to convince the user that it is a
useful diagnosis step and being able to generate a reasonable wsim
automatically.
-Chris
Tvrtko Ursulin Aug. 8, 2018, 12:13 p.m. UTC | #6
On 26/06/2018 12:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-26 12:24:51)
>>
>> On 26/06/2018 11:55, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-06-26 11:46:51)
>>>>
>>>> On 25/06/2018 21:02, Chris Wilson wrote:
>>>>> If we know what is wanted can we define that better in terms of
>>>>> dma_fence and leave lowlevel for debugging (or think of how we achieve
>>>>> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push
>>>>> that.
>>>>
>>>> What is wanted is for instance take trace.pl on any kernel anywhere and
>>>> it is able to deduce/draw the exact metrics/timeline of command
>>>> submission for an workload.
>>>>
>>>> At the moment it without low level tracepoints, and without the
>>>> intel_engine_notify tweak, it is workload dependent on how close it
>>>> could get.
>>>
>>> Interjecting what dma-fence already has (or we could use), not sure how
>>> well userspace can actually map it to their timelines.
>>>>
>>>> So a set of tracepoints to allow drawing the timeline:
>>>>
>>>> 1. request_queue (or _add)
>>> dma_fence_init
>>>
>>>> 2. request_submit
>>>
>>>> 3. intel_engine_notify
>>> For obvious reasons, no match in dma_fence.
>>>
>>>> 4. request_in
>>> dma_fence_emit
>>>
>>>> 5. request out
>>> dma_fence_signal (similar, not quite, we would have to force irq
>>> signaling).
>>
>> Yes not quite the same due potential time shift between user interrupt
>> and dma_fence_signal call via different paths.
>>
>>>    
>>>> With this set the above is possible and we don't need a lot of work to
>>>> get there.
>>>
>>>   From a brief glance we are missing a dma_fence_queue for request_submit
>>> replacement.
>>>
>>> So next question is what information do we get from our tracepoints (or
>>> more precisely do you use) that we lack in dma_fence?
>>
>> Port=%u and preemption (completed=%u) comes immediately to mind. Way to
>> tie with engines would be nice or it is all abstract timelines.
>>
>> Going this direction sounds like a long detour to get where we almost
>> are. I suspect you are valuing the benefit of it being generic and hence
>> and parsing tool could be cross-driver. But you can also just punt the
>> "abstractising" into the parsing tool.
> 
> It's just that this about the third time this has been raised in the
> last couple of weeks with the other two requests being from a generic
> tooling pov (Eric Anholt for gnome-shell tweaking, and some one
> else looking for a gpuvis-like tool). So it seems like there is
> interest, even if I doubt that it'll help answer any questions beyond
> what you can just extract from looking at userspace. (Imo, the only
> people these tracepoints are useful for are people writing patches for
> the driver. For everyone else, you can just observe system behaviour and
> optimise your code for your workload. Otoh, can one trust a black
> box, argh.)

Some of the things might be obtainable purely from userspace via heavily 
instrumented builds, which may be in the realm of the possible during 
development, but I don't think it is feasible in general, both because it 
is too involved, and because it would preclude the existence of tools which 
can trace any random client.

> To have a second set of nearly equivalent tracepoints, we need to have
> strong justification why we couldn't just use or extend the generic set.

I was hoping that the conversation so far established that nearly 
equivalent is not close enough for the intended use cases, and that it is 
not possible to make the generic ones so.

> Plus I feel a lot more comfortable exporting a set of generic
> tracepoints, than those where we may be leaking more knowledge of the HW
> than we can reasonably expect to support for the indefinite future.

I think it is accepted that we cannot guarantee low level tracepoints will 
be supportable in the future world of GuC scheduling. (How and what we will 
do there is yet unresolved.) But at least we get much better usability 
for platforms up to that point, and for very small effort. The idea is not 
to mark these as ABI but just to improve the user experience.

You are, I suppose, worried that if these tracepoints disappeared due to 
being un-implementable someone will complain?

I just want anyone to be able to run trace.pl and see how the virtual 
engine behaves, without having to recompile the kernel. And the VTune 
people want the same for their enterprise-level customers. Both tools are 
ready to adapt should it be required. It is, I repeat, just usability and 
user experience out of the box.

> 
>>>> And with the Virtual Engine it will become more interesting to have
>>>> this. So if we had a bug report saying load balancing is not working
>>>> well, we could just say "please run it via trace.pl --trace and attach
>>>> perf script output". That way we could easily see whether or not is is a
>>>> problem in userspace behaviour or else.
>>>
>>> And there I was wanting a script to capture the workload so that we
>>> could replay it and dissect it. :-p
>>
>> Depends on what level you want that. Perf script output from the above
>> tracepoints would do on one level. If you wanted a higher level to
>> re-exercise load balancing then it wouldn't completely be enough, or at
>> least a lot of guesswork would be needed.
> 
> It all depends on what level you want to optimise, is the way I look at
> it. Userspace driver, you capture the client->driver userspace API (e.g.
> cairo-trace, apitrace). But for optimising scheduling layout, we just
> need a workload descriptor like wsim -- with perhaps the only tweak
> being able to define latency/throughput metrics relevant to that
> workload, and being able to integrate with a pseudo display server. The
> challenge as I see it is being able to convince the user that it is a
> useful diagnosis step and being able to generate a reasonable wsim
> automatically.

Deriving wsims from apitraces sounds much more challenging, but I also 
think it is orthogonal. Tracing could always be there at the low level 
whether the client is real or simulated.

Regards,

Tvrtko
Chris Wilson Aug. 8, 2018, 12:42 p.m. UTC | #7
Quoting Tvrtko Ursulin (2018-08-08 13:13:08)
> 
> On 26/06/2018 12:48, Chris Wilson wrote:
> > It's just that this about the third time this has been raised in the
> > last couple of weeks with the other two requests being from a generic
> > tooling pov (Eric Anholt for gnome-shell tweaking, and some one
> > else looking for a gpuvis-like tool). So it seems like there is
> > interest, even if I doubt that it'll help answer any questions beyond
> > what you can just extract from looking at userspace. (Imo, the only
> > people these tracepoints are useful for are people writing patches for
> > the driver. For everyone else, you can just observe system behaviour and
> > optimise your code for your workload. Otoh, can one trust a black
> > box, argh.)
> 
> Some of the things might be obtainable purely from userspace via heavily 
> instrumented builds, which may be in the realm of possible for during 
> development, but I don't think it is feasible in general both because it 
> is too involved, and because it would preclude existence of tools which 
> can trace any random client.
> 
> > To have a second set of nearly equivalent tracepoints, we need to have
> > strong justification why we couldn't just use or extend the generic set.
> 
> I was hoping that the conversation so far established that nearly 
> equivalent is not close enough for intended use cases. And that is not 
> possible to make the generic ones so.

(I just don't see the point of those use cases. I trace the kernel to
fix the kernel...)
 
> > Plus I feel a lot more comfortable exporting a set of generic
> > tracepoints, than those where we may be leaking more knowledge of the HW
> > than we can reasonably expect to support for the indefinite future.
> 
> I think it is accepted we cannot guarantee low level tracepoints will be 
> supportable in the future world of GuC scheduling. (How and what we will 
> do there is yet unresolved.) But at least we get much better usability 
> for platforms up to there, and for very small effort. The idea is not to 
> mark these as ABI but just improve user experience.
> 
> You are I suppose worried that if these tracepoints disappeared due 
> being un-implementable someone will complain?

They already do...
 
> I just want that anyone can run trace.pl and see how virtual engine 
> behaves, without having to recompile the kernel. And VTune people want 
> the same for their enterprise-level customers. Both tools are ready to 
> adapt should it be required. Its I repeat just usability and user 
> experience out of the box.

The out-of-the-box user experience should not require the use of such
tools in the first place! If they are trying to work around the kernel
(and that's the only use of this information I see) we have bugs
aplenty.

[snip because I repeated myself]

I think my issues boil down to:

 1 - people will complain no matter what (when it changes, when it is no
     longer available)

 2 - people will use it to work around, not fix; the information about kernel
     behaviour should only be used with a view to fixing that behaviour

As such, I am quite happy to have it limited to driver developers that
want to fix issues at source (OpenCL, I'm looking at you). There's tons
of other user observable information out there for tuning userspace,
why does the latency of runnable->queued matter if you will not do anything
about it? Other things like dependency graphs, if you can't keep control
of your own fences, you've already lost.

I don't see any value in giving the information away, just the cost. If
you can convince Joonas of its merit, and if we can define just exactly
what ABI it constitutes, then I'd be happy to be the one who says "I
told you so" in the future for a change.
-Chris
Tvrtko Ursulin Aug. 8, 2018, 12:56 p.m. UTC | #8
+Joonas

On 08/08/2018 13:42, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-08 13:13:08)
>>
>> On 26/06/2018 12:48, Chris Wilson wrote:
>>> It's just that this about the third time this has been raised in the
>>> last couple of weeks with the other two requests being from a generic
>>> tooling pov (Eric Anholt for gnome-shell tweaking, and some one
>>> else looking for a gpuvis-like tool). So it seems like there is
>>> interest, even if I doubt that it'll help answer any questions beyond
>>> what you can just extract from looking at userspace. (Imo, the only
>>> people these tracepoints are useful for are people writing patches for
>>> the driver. For everyone else, you can just observe system behaviour and
>>> optimise your code for your workload. Otoh, can one trust a black
>>> box, argh.)
>>
>> Some of the things might be obtainable purely from userspace via heavily
>> instrumented builds, which may be in the realm of possible for during
>> development, but I don't think it is feasible in general both because it
>> is too involved, and because it would preclude existence of tools which
>> can trace any random client.
>>
>>> To have a second set of nearly equivalent tracepoints, we need to have
>>> strong justification why we couldn't just use or extend the generic set.
>>
>> I was hoping that the conversation so far established that nearly
>> equivalent is not close enough for intended use cases. And that is not
>> possible to make the generic ones so.
> 
> (I just don't see the point of those use cases. I trace the kernel to
> fix the kernel...)

Yes, and with the virtual engine we will have a bigger reason to trace the 
kernel with a random client.

>   
>>> Plus I feel a lot more comfortable exporting a set of generic
>>> tracepoints, than those where we may be leaking more knowledge of the HW
>>> than we can reasonably expect to support for the indefinite future.
>>
>> I think it is accepted we cannot guarantee low level tracepoints will be
>> supportable in the future world of GuC scheduling. (How and what we will
>> do there is yet unresolved.) But at least we get much better usability
>> for platforms up to there, and for very small effort. The idea is not to
>> mark these as ABI but just improve user experience.
>>
>> You are I suppose worried that if these tracepoints disappeared due
>> being un-implementable someone will complain?
> 
> They already do...
>   
>> I just want that anyone can run trace.pl and see how virtual engine
>> behaves, without having to recompile the kernel. And VTune people want
>> the same for their enterprise-level customers. Both tools are ready to
>> adapt should it be required. Its I repeat just usability and user
>> experience out of the box.
> 
> The out-of-the-box user experience should not require the use of such
> tools in the first place! If they are trying to work around the kernel
> (and that's the only use of this information I see) we have bugs a
> plenty.
> 
> [snip because I repeated myself]
> 
> I think my issues boil down to:
> 
>   1 - people will complain no matter what (when it changes, when it is no
>       longer available)
> 
>   2 - people will use it to workaround not fix; the information about kernel
>       behaviour should only be used with a view to fixing that behaviour
> 
> As such, I am quite happy to have it limited to driver developers that
> want to fix issues at source (OpenCL, I'm looking at you). There's tons
> of other user observable information out there for tuning userspace,
> why does the latency of runnable->queued matter if you will not do anything
> about it? Other things like dependency graphs, if you can't keep control
> of your own fences, you've already lost.

This is true, no disagreement. My point simply was that we can provide 
this info easily to anyone. There is a little bit of analogy with perf 
scheduler tracing/map etc.

> I don't see any value in giving the information away, just the cost. If
> you can convince Joonas of its merit, and if we can define just exactly
> what ABI it constitutes, then I'd be happy to be the one who says "I
> told you so" in the future for a change.

I think Joonas was okay in principle that we soft-commit to _trying_ to 
keep _some_ tracepoint stable-ish (where it makes sense and after some 
discussion for each) if IGT also materializes which auto-pings us (via 
CI) when we break one of them. But I may be misremembering so Joonas 
please comment.

Regards,

Tvrtko
Joonas Lahtinen Aug. 13, 2018, 9:54 a.m. UTC | #9
Quoting Tvrtko Ursulin (2018-08-08 15:56:01)
> On 08/08/2018 13:42, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-08 13:13:08)
> This is true, no disagreement. My point simply was that we can provide 
> this info easily to anyone. There is a little bit of analogy with perf 
> scheduler tracing/map etc.
> 
> > I don't see any value in giving the information away, just the cost. If
> > you can convince Joonas of its merit, and if we can define just exactly
> > what ABI it constitutes, then I'd be happy to be the one who says "I
> > told you so" in the future for a change.
> 
> I think Joonas was okay in principle that we soft-commit to _trying_ to 
> keep _some_ tracepoint stable-ish (where it makes sense and after some 
> discussion for each) if IGT also materializes which auto-pings us (via 
> CI) when we break one of them. But I may be misremembering so Joonas 
> please comment.

Currently gpuvis, which uses these, seems to be packaged only in one AUR
repo, and they do make a note in the wiki about how you need to configure
the kernel for debugging. And there's been no apparent demand for them to
have it in a stock kernel.

And even when we do get demand for having gpuvis or another tool working
from a vanilla kernel, tracepoints being a rather tricky subject, I would
start the discussion by going through alternative means of providing the
information the tool needs and considering those.

So let's still keep this option as it was introduced. The whole
"tracepoints as stable uAPI" idea is a can of worms which is only dug
into when other options are exhausted.

Regards, Joonas
Kukanova, Svetlana Aug. 13, 2018, 1:44 p.m. UTC | #10
Joonas, sorry for interfering; could you please explain more regarding the options for tracing scheduling events better than tracepoints?
After scheduling moves to the GuC, tools will have to switch to something like GuC logging; but while the kmd does the scheduling, isn't kernel tracing the best solution?
I know gpuvis is not the only attempt to use tracepoints for the same purpose (there are trace.pl and S.E.A., and of course VTune, though it probably is not considered to exist as it's not open source).
And assuming this movement towards the GuC, is it not too late to invent a completely new way to provide tools with scheduling info from the kmd?
Could we just improve the existing way and let it live out its last years/months?

gpuvis works without modifying the kernel for AMDGPU, showing the HW queue and HW execution; it cosplays Microsoft GPUView, which works out-of-the-box on Windows too.
Thus it appears that Intel gfx on Linux is the most closed platform, not bothering about observability (or even bothering about how to forbid observability).

Not long ago the MediaSDK team diagnosed a problem with their workloads by looking at VTune timelines - seeing the difference between the time a request came to the kmd and the time it went runnable, and comparing the queues on 2 engines, they understood that their requests had dependencies that were definitely unexpected. MediaSDK reported the problem to the driver people and it was fixed.

I can add Dmitry Rogozhkin to the discussion if the usefulness of a scheduling timeline in tools is questionable; as far as I remember this wasn't the only use case they had, and I'm sure he can add more.

Thank you,
Svetlana

Joonas Lahtinen Aug. 21, 2018, 12:06 p.m. UTC | #11
Quoting Kukanova, Svetlana (2018-08-13 16:44:49)
> Joonas, sorry for interfering; could you please explain more regarding the
> options for tracing scheduling events better than tracepoints?
> After scheduling moves to GuC tools will have to switch to something like
> GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution?
> I know gpuvis is not the only attempt to use tracepoints for the same purpose.
> (there're trace.pl and S.E.A. and of course VTune though it probably is not
> considered to be existing as it's not open source). 
> And assuming this movement towards GuC is it not too late to invent a
> completely new way to provide tools with scheduling info from kmd? 
> Could we just improve the existing way and let it live its last years\months? 

Hi,

You actually mentioned the prime reason why we should not go and
hastily make tracepoints a stable uAPI with regards to scheduling
information.

The scheduler's nature will be evolving when some of the scheduling
decisions are moved to the GuC, and the way we get the information
will be changing at that point, so tracepoints will indeed be a
very bad mechanism for providing the information.

The kernel scheduler is definitely not going anywhere with the
introduction of more hardware scheduling capabilities, so it is a
misconception to think that the interface would need to be completely
different for when GuC is enabled.

> 
> gpuvis works w\o modifying kernel for AMDgpu showing HW queue and HW execution;
> it cosplays Microsoft GPUView which works out-of-the-box on Windows too.
> Thus it appears that intel gfx on linux is the most closed platform, not
> bothering of observability (or even bothering about how to forbid observability).

gpuvis is a developer tool. The tracepoints behind this configure
switch are way more low-level than what gpuvis seems to support
for AMDGPU *at all*. They seem to stick to the IOCTL level. So from what
I see, we should be on par with the competition even without any special
kernel configuration. So let's not get things mixed up.

And I remind you, the tool is not shipping anywhere really (except the AUR),
but is just built from source by developers in need, and they seem to be just
fine with re-compiling the kernel (as there have been no requests).

Once there is an actual request to have some metrics from vanilla
kernels through some end-user tool (not a developer tool, like here),
I'll be glad to discuss how best to provide the information for them in
a stable manner.

> Not long ago the MediaSDK team diagnosed a problem with their workloads
> looking at VTune timelines - seeing the difference between the time request
> came to kmd and time it went runnable & comparing the queues on 2 engines they
> understood that their requests have dependencies that were definitely
> unexpected. MediaSDK reported the problem to driver people and it was fixed.
> 
> I can add Dmitry Rogozhkin to discussion if the usefulness of scheduling
> timeline in tools is questionable, as far as I remember this wasn't the only
> use case they had, I'm sure he can add more.

I'm well aware of the use cases. And Dmitry is well aware of the need
for an Open Source consumer for any requested stable uAPIs. And we don't
currently have that, so there's no disconnect on information.

There's just no Open Source tool to first design and then validate the
interfaces against. There's just the debugging tool which happens to
work currently, without any guarantees that the next kernel version would
not cause a substantial rework of the interfacing code.

The interface discussion would probably start from a DRM subsystem
level, so that the tool would have an equivalent level of base
experience from all drivers.

Regards, Joonas

Tvrtko Ursulin Aug. 22, 2018, 12:49 p.m. UTC | #12
On 21/08/2018 13:06, Joonas Lahtinen wrote:
> Quoting Kukanova, Svetlana (2018-08-13 16:44:49)
>> Joonas, sorry for interfering; could you please explain more regarding the
>> options for tracing scheduling events better than tracepoints?
>> After scheduling moves to GuC tools will have to switch to something like
>> GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution?
>> I know gpuvis is not the only attempt to use tracepoints for the same purpose.
>> (there're trace.pl and S.E.A. and of course VTune though it probably is not
>> considered to be existing as it's not open source).
>> And assuming this movement towards GuC is it not too late to invent a
>> completely new way to provide tools with scheduling info from kmd?
>> Could we just improve the existing way and let it live its last years\months?
> 
> Hi,
> 
> You actually mentioned the prime reason why we should not go and
> hastily make tracepoints a stable uAPI with regards to scheduling
> information.
> 
> The scheduler's nature will be evolving when some of the scheduling
> decisions are moved to GuC and the way how we get the information
> will be changing at that point, so tracepoints will indeed be a
> very bad mechanism for providing the information.
> 
> The kernel scheduler is definitely not going anywhere with the
> introduction of more hardware scheduling capabilities, so it is a
> misconception to think that the interface would need to be completely
> different for when GuC is enabled.

On the last paragraph - even with today's GuC, i915 already loses 
visibility of the CSB interrupts. So there is already a big difference in 
the semantics of what the request_in and request_out tracepoints mean. Put 
preemption into the picture and we just don't know any more when 
something started executing on the GPU, when it got preempted, 
re-submitted, etc. So I think it is fair to say that moving more of the 
scheduling into the GuC creates a problem for tools which want to 
represent request execution timelines.

Regards,

Tvrtko
Tvrtko Ursulin Aug. 22, 2018, 1:02 p.m. UTC | #13
On 22/08/2018 13:49, Tvrtko Ursulin wrote:
> 
> On 21/08/2018 13:06, Joonas Lahtinen wrote:
>> Quoting Kukanova, Svetlana (2018-08-13 16:44:49)
>>> Joonas, sorry for interfering; could you please explain more 
>>> regarding the
>>> options for tracing scheduling events better than tracepoints?
>>> After scheduling moves to GuC tools will have to switch to something 
>>> like
>>> GuC-logging; but while kmd does scheduling isn't kernel-tracing the 
>>> best solution?
>>> I know gpuvis is not the only attempt to use tracepoints for the same 
>>> purpose.
>>> (there're trace.pl and S.E.A. and of course VTune though it probably 
>>> is not
>>> considered to be existing as it's not open source).
>>> And assuming this movement towards GuC is it not too late to invent a
>>> completely new way to provide tools with scheduling info from kmd?
>>> Could we just improve the existing way and let it live its last 
>>> years\months?
>>
>> Hi,
>>
>> You actually mentioned the prime reason why we should not go and
>> hastily make tracepoints a stable uAPI with regards to scheduling
>> information.
>>
>> The scheduler's nature will be evolving when some of the scheduling
>> decisions are moved to GuC and the way how we get the information
>> will be changing at that point, so tracepoints will indeed be a
>> very bad mechanism for providing the information.
>>
>> The kernel scheduler is definitely not going anywhere with the
>> introduction of more hardware scheduling capabilities, so it is a
>> misconception to think that the interface would need to be completely
>> different for when GuC is enabled.
> 
> On the last paragraph - even with the today's GuC i915 already loses 
> visibility of CSB interrupts. So there is already a big difference in 
> semantics of what request_in and request_out tracepoints mean. Put 
> preemption into the picture and we just don't know any more when 
> something started executing on the GPU, when it got preempted, 
> re-submitted etc. So I think it is fair to say that moving more of 
> scheduling into the GuC creates a problem for tools which want to 
> represent request execution timelines.

P.S. To clarify - which is exactly why we marked those tracepoints as low 
level and why it is problematic to rely on them.

Regards,

Tvrtko
Joonas Lahtinen Aug. 22, 2018, 1:12 p.m. UTC | #14
Quoting Tvrtko Ursulin (2018-08-22 15:49:52)
> 
> On 21/08/2018 13:06, Joonas Lahtinen wrote:
> > Quoting Kukanova, Svetlana (2018-08-13 16:44:49)
> >> Joonas, sorry for interfering; could you please explain more regarding the
> >> options for tracing scheduling events better than tracepoints?
> >> After scheduling moves to GuC tools will have to switch to something like
> >> GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution?
> >> I know gpuvis is not the only attempt to use tracepoints for the same purpose.
> >> (there're trace.pl and S.E.A. and of course VTune though it probably is not
> >> considered to be existing as it's not open source).
> >> And assuming this movement towards GuC is it not too late to invent a
> >> completely new way to provide tools with scheduling info from kmd?
> >> Could we just improve the existing way and let it live its last years\months?
> > 
> > Hi,
> > 
> > You actually mentioned the prime reason why we should not go and
> > hastily make tracepoints a stable uAPI with regards to scheduling
> > information.
> > 
> > The scheduler's nature will be evolving when some of the scheduling
> > decisions are moved to GuC and the way how we get the information
> > will be changing at that point, so tracepoints will indeed be a
> > very bad mechanism for providing the information.
> > 
> > The kernel scheduler is definitely not going anywhere with the
> > introduction of more hardware scheduling capabilities, so it is a
> > misconception to think that the interface would need to be completely
> > different for when GuC is enabled.

To clarify, I meant to underline that there is not going to be a steep
switching point where a transition from interface A to B, which Svetlana
referred to, would happen naturally.

The introduced interface will have to provide the information for years
and kernel versions to come, and we already have some data that
tracepoints may not be the format of choice due to the GuC.

> On the last paragraph - even with the today's GuC i915 already loses 
> visibility of CSB interrupts. So there is already a big difference in 
> semantics of what request_in and request_out tracepoints mean. Put 
> preemption into the picture and we just don't know any more when 
> something started executing on the GPU, when it got preempted, 
> re-submitted etc. So I think it is fair to say that moving more of 
> scheduling into the GuC creates a problem for tools which want to 
> represent request execution timelines.

Yes, for tools that depend on the tracepoints. That's why it is most
likely best to introduce the information in some other form, but I am
starting to sound like a broken record already :)

Regards, Joonas

> 
> Regards,
> 
> Tvrtko
Kukanova, Svetlana Aug. 27, 2018, 1:37 p.m. UTC | #15
> Once there is an actual request to have some metrics from vanilla kernels through some end-user tools (not a developer tool, like here), I'll be glad to discuss about how to provide the information the best for them in a stable manner.

Sorry for my ignorance, but it looks like I don't understand what developer vs. end-user means here.
With regard to GPU profiling, VTune's end-user is somebody who develops gfx or media applications based on MediaSDK, OpenCL, C for Media, etc.
Or, more often, it's an Intel application engineer working with those people's code.
The AE in his/her turn may contact e.g. Dmitry's team if, judging by the VTune data, he/she decides that the problem is on a deeper level of the gfx stack, not in the customer's code.
Then Dmitry's team would be experimenting with VTune and deciding if the problem is in their code or deeper in i915.
I don't think that i915 people use VTune (sadly :)) so here the chain is broken. Otherwise they could e.g. blame the HW based on the same data.
I'm wondering who in this chain (app developer, AE, Dmitry, i915) is an "end-user" and who's a "developer"?
Or is a "developer" a kernel developer only?
And is, e.g., Dmitry an end-user who is thus not supposed to use tools like gpuvis or VTune?
It looks like the whole chain before i915 is annoyed by the kernel-rebuilding requirement.

>The interface discussion would probably start from a DRM subsystem level, so that the tool would have an equivalent level of base experience from all drivers.

That sounds like a solution from an ideal world. I mean, if DRM had a uAPI for scheduling observability and all the drivers had to implement it. And the drivers would require info from HW like the GuC, pointing to the necessity of uAPI support...
Would be just great for all the tools (and developers and end-users).
But I have no idea what kind of impulse it would take to bring this to reality. And whether all the energy available to humankind at the given evolution point would be enough to at least start this.
Or am I just too pessimistic? Are there some simple defined steps to be done to make it? Can we build a realistic plan?

E.g. is this the first step? -
> There's just no Open Source tool to first design and then validate the interfaces against. There's just the debugging tool which happens to work currently, without any guarantees that next kernel version would not cause a substantial rework of the interfacing code.

How does it usually work? I mean, you can't have a widely shipped open-source consumer already using a non-existent feature that is yet to be requested?
And I can't imagine what kind of existing tool would suddenly decide that it needs to add GPU scheduling tracing to the list of its features.
If you want to have a new tool for a GPU scheduling timeline - and it sounds like a sane idea, it looks like we agree on the use cases etc. - how can you make it open source first and then get the API from i915 to base it on?
Or am I just missing the point completely?
If the open-sourced MediaSDK were shipped with some distro (isn't it, btw?) - would Dmitry be eligible to request observability features for tools?

Thank you,
Svetlana

Quoting Kukanova, Svetlana (2018-08-27 16:37:14)
> > Once there is an actual request to have some metrics from vanilla kernels
> > through some end-user tools (not a developer tool, like here), I'll be glad
> > to discuss about how to provide the information the best for them in a
> > stable manner.
> 
> Sorry for my ignorance, but looks like I don't understand what developer vs.
> end-user means here.
> With regard to GPU profiling VTune's end-user is somebody who develops gfx or
> media applications basing on MediaSDK, OpenCL, C for Media, etc.
> Or, more often it's an intel application engineer working with those people's
> code.
> AE in his\her turn may contact e.g. Dmitry's team if judging by VTune data
> he\she decides that the problem is on the deeper level of the gfx stack, not
> in the customer's code.
> Then Dmitry's team would be experimenting with VTune and deciding if the
> problem is in their code or it's deeper in i915.
> Don't think that i915 people use VTune (sadly:)) so here the chain is broken.
> Otherwise they could e.g. blame HW based on the same data.
> I'm wondering who in this chain (app developer, AE, Dmitry, i915) is an
> "end-user" and who's a "developer"?
> Or is a "developer" a kernel developer only?
> And e.g. Dmitry is an end-user and thus he is not supposed to use tools like
> gpuvis or VTune?
> Looks like all the chain before i915 is annoyed by the kernel-rebuilding
> requirement.

With "end-user tool" I'm referring to something that there would be interest
in having packaged and shipped by a distro.

The gpuvis team seems to be doing fine with the application being built from
source and run against a specially configured kernel for their purposes. If
there was demand, I would expect some queries about enabling the tracepoints
by default. At the same time I would expect them to try to get the
application packaged and into distros.

And then we would commence discussing how to provide the information in a stable
manner (most likely outside tracepoints). So far I'm not seeing such queries
from the gpuvis direction.

> > The interface discussion would probably start from a DRM subsystem level, so
> > that the tool would have an equivalent level of base experience from all
> > drivers.
>
> That sounds like a solution from an ideal world. I mean if DRM had a uAPI for
> scheduling observability and all the drivers had to implement this. And the
> drivers would require info from HW like GuC pointing to the necessity of uAPI
> support...
> Would be just great for all the tools (, developers and end-users).
> But I have no idea what kind of impulse should it be to bring this to reality.
> And if all the energy available to human kind at the given evolution point
> would be enough to at least start this. 
> Or am I just too pessimistic? Are there some simple defined steps to be done
> to make it? Can we build a realistic plan?

Step is "1. Have the tool" :) There seem to be three options: 1) open sourcing
VTune 2) contributing to gpuvis project to drive the project into the above
mentioned direction. 3) writing a new project from scratch (not encouraged,
unless you have something differentiating to bring to the table).

Unless somebody actively drives the feature to some Open Source userspace
consumer, there won't be an interface for the information from the kernel.
Demand from an Open Source application is a hard requirement for kickstarting
the interface discussion.

> E.g. is this the first step? -
> > There's just no Open Source tool to first design and then validate the
> > interfaces against. There's just the debugging tool which happens to work
> > currently, without any guarantees that next kernel version would not cause a
> > substantial rework of the interfacing code.
> 
> How does it usually work, I mean you can't have a widely shipped open-source
> consumer already using a non-existent feature that is to be requested? 
> And I can't imagine what kind of existing tool should it be to decide suddenly
> that it needs to add GPU scheduling tracing to the list of its features.
> If you want to have a new tool for GPU scheduling timeline - and it sounds
> like a sane idea, looks like we agree on the use cases etc. - how can you make
> it open source first and then get the API to be based on from i915?

The order is that you develop the tool and the required kernel changes
in parallel in topic branches, to demonstrate the usefulness of the tool
and the suitability of the kernel interface. Then, after all the patches are
reviewed (kernel + tool), the kernel side is merged first, and the tool
can start working from the next kernel release.

An attempt to describe this process can be found in the following
documentation chapter:

https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

> Or am I just missing the point completely?
> If the open-sourced MediaSDK was shipped with some distro (isn't it, btw?) -
> would Dmitry be eligible to request observability features for tools?

MediaSDK should not have anything to do with this, unless it will directly
consume the kernel interface under discussion.

The need is for some application/library/whatever userspace component to
demonstrate the suitability of the kernel interface and act as a counterpart
for the kernel interface that can be tested and debugged for changes.
This, too, is explained in more detail in the above linked documentation
chapter.

Regards, Joonas

> 
> Thank you,
> Svetlana
> 
> -----Original Message-----
> From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com] 
> Sent: Tuesday, August 21, 2018 3:07 PM
> To: Intel-gfx@lists.freedesktop.org; Kukanova, Svetlana <svetlana.kukanova@intel.com>; Chris Wilson <chris@chris-wilson.co.uk>; Tvrtko Ursulin <tursulin@ursulin.net>; Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Subject: RE: [Intel-gfx] [PATCH 2/2] drm/i915/tracepoints: Remove DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option
> 
> Quoting Kukanova, Svetlana (2018-08-13 16:44:49)
> > Joonas, sorry for interfering; could you please explain more regarding 
> > the options for tracing scheduling events better than tracepoints?
> > After scheduling moves to GuC tools will have to switch to something 
> > like GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution?
> > I know gpuvis is not the only attempt to use tracepoints for the same purpose.
> > (there're trace.pl and S.E.A. and of course VTune though it probably 
> > is not considered to be existing as it's not open source).
> > And assuming this movement towards GuC is it not too late to invent a 
> > completely new way to provide tools with scheduling info from kmd?
> > Could we just improve the existing way and let it live its last years\months? 
> 
> Hi,
> 
> You actually mentioned the prime reason why we should not go and hastily make tracepoints a stable uAPI with regards to scheduling information.
> 
> The scheduler's nature will be evolving when some of the scheduling decisions are moved to GuC and the way how we get the information will be changing at that point, so tracepoints will indeed be a very bad mechanism for providing the information.
> 
> The kernel scheduler is definitely not going anywhere with the introduction of more hardware scheduling capabilities, so it is a misconception to think that the interface would need to be completely different for when GuC is enabled.
> 
> > 
> > gpuvis works w\o modifying kernel for AMDgpu showing HW queue and HW 
> > execution; it cosplays Microsoft GPUView which works out-of-the-box on Windows too.
> > Thus it appears that intel gfx on linux is the most closed platform, 
> > not bothering of observability (or even bothering about how to forbid observability).
> 
> gpuvis is a developer tool. The tracepoints behind this configure switch are way more low-level than what gpuvis seems to support for AMDGPU *at all*. They seem to stick to the IOCTL level. So from what I see, we should be on-par with the competition even without any special kernel configuration. So let's not get things mixed up.
> 
> And I remind, the tool is not shipping anywhere really (except the AUR), but just built from source by developers in need, and they seem to be just fine with re-compiling the kernel (as there have been no requests).
> 
> Once there is an actual request to have some metrics from vanilla kernels through some end-user tools (not a developer tool, like here), I'll be glad to discuss about how to provide the information the best for them in a stable manner.
> 
> > Not long ago the MediaSDK team diagnosed a problem with their 
> > workloads looking at VTune timelines - seeing the difference between 
> > the time request came to kmd and time it went runnable & comparing the 
> > queues on 2 engines they understood that their requests have 
> > dependencies that were definitely unexpected. MediaSDK reported the problem to driver people and it was fixed.
> > 
> > I can add Dmitry Rogozhkin to discussion if the usefulness of 
> > scheduling timeline in tools is questionable, as far as I remember 
> > this wasn't the only use case they had, I'm sure he can add more.
> 
> I'm well aware of the use cases. And Dmitry is well aware of the need for an Open Source consumer for any requested stable uAPIs. And we don't currently have that, so there's no disconnect on information.
> 
> There's just no Open Source tool to first design and then validate the interfaces against. There's just the debugging tool which happens to work currently, without any guarantees that next kernel version would not cause a substantial rework of the interfacing code.
> 
> The interface discussion would probably start from a DRM subsystem level, so that the tool would have an equivalent level of base experience from all drivers.
> 
> Regards, Joonas
> 
> > 
> > Thank you,
> > Svetlana
> > 
> > -----Original Message-----
> > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On 
> > Behalf Of Joonas Lahtinen
> > Sent: Monday, August 13, 2018 12:55 PM
> > To: Chris Wilson <chris@chris-wilson.co.uk>; 
> > Intel-gfx@lists.freedesktop.org; Tvrtko Ursulin 
> > <tursulin@ursulin.net>; Tvrtko Ursulin 
> > <tvrtko.ursulin@linux.intel.com>
> > Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/tracepoints: Remove 
> > DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option
> > 
> > Quoting Tvrtko Ursulin (2018-08-08 15:56:01)
> > > On 08/08/2018 13:42, Chris Wilson wrote:
> > > > Quoting Tvrtko Ursulin (2018-08-08 13:13:08)
> > > This is true, no disagreement. My point simply was that we can 
> > > provide this info easily to anyone. There is a little bit of analogy 
> > > with perf scheduler tracing/map etc.
> > > 
> > > > I don't see any value in giving the information away, just the cost. 
> > > > If you can convince Joonas of its merit, and if we can define just 
> > > > exactly what ABI it constitutes, then I'd be happy to be the one 
> > > > who says "I told you so" in the future for a change.
> > > 
> > > I think Joonas was okay in principle that we soft-commit to _trying_ 
> > > to keep _some_ tracepoint stable-ish (where it makes sense and after 
> > > some discussion for each) if IGT also materializes which auto-pings 
> > > us (via
> > > CI) when we break one of them. But I may be misremembering so Joonas 
> > > please comment.
> > 
> > Currently gpuvis, using these, seems to be only packaged in one AUR repo, and they do make a note in the wiki about how you need to configure the kernel for debugging. And there's been no apparent demand for them to have it in the stock kernel.
> > 
> > And even when we do get demand for having gpuvis or another tool working from vanilla kernel, tracepoints being a rather tricky subject, I would start the discussion by going through alternative means of providing the information the tool needs and considering those.
> > 
> > So let's still keep this option as it was introduced. The whole "tracepoints as stable uAPI" idea is a can of worms which is only dug into when other options are exhausted.
> > 
> > Regards, Joonas
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Kukanova, Svetlana Sept. 3, 2018, 12:22 p.m. UTC | #17
OK, so no end-user queries matter, just the queries from the tools for end-users do, right?

And with making the open-source tool (shipped by distros, etc.) suitable for negotiations we need to hurry while at least the trace-point mechanism is not yet completely broken and can be used to show usefulness and to have at least something that can be taken to distro?

If the new tool and kernel changes required by it are developed in parallel - you don't have that "shipped by a distro" condition, BTW, right? Or in case of parallel discussion you're deciding if the suggested tool  has rights to exist?

Joonas Lahtinen Sept. 4, 2018, 8:56 a.m. UTC | #18
Quoting Kukanova, Svetlana (2018-09-03 15:22:20)
> OK, so no end-user queries matter, just the queries from the tools for
> end-users do, right?

End-user queries do matter, but we don't have the bandwidth to implement
all the tools/software in the world. So for those reasons, we need to
have the interest from a party that is ready to implement the software.

So to proceed at a technical level, interest from the developer of a tool is
needed. Simple as that.

> And with making the open-source tool (shipped by distros, etc.) suitable
> for negotiations we need to hurry while at least the trace-point mechanism
> is not yet completely broken and can be used to show usefulness and to have
> at least something that can be taken to distro?

I'm not sure I understood the question, but anything shipping to distros
as a stable tool should not depend on tracepoints. Not even initially, as
tracepoints are liable to change between kernel updates.
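
To make that concrete: a tool that wants to survive kernel updates has to
discover the event layout from the running kernel instead of hard-coding
it. A minimal sketch (assuming tracefs is mounted at /sys/kernel/tracing,
or /sys/kernel/debug/tracing on older kernels, and using i915_request_add
purely as an example event):

/*
 * Dump the format description of a tracepoint so a tool can discover
 * field names, types and offsets at runtime instead of baking them in.
 * Sketch only; path and event name are illustrative.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const char *path =
		"/sys/kernel/tracing/events/i915/i915_request_add/format";
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return EXIT_FAILURE;
	}
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return EXIT_SUCCESS;
}

Even then, only the field layout can be discovered; whether the event
exists at all, and what it means, is exactly the churn being discussed
here.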

> If the new tool and kernel changes required by it are developed in parallel -
> you don't have that "shipped by a distro" condition, BTW, right? Or in case of
> parallel discussion you're deciding if the suggested tool  has rights to
> exist?

Usually a tool/software would already be established before it requests
some kernel changes. That'd of course require it to be useful before
introducing the new interfaces.

If the tool's existence is completely reliant on some new interface provided
by the kernel (like here), then we would like to get a green light from some
distro that they are interested in packaging the suggested software to
accompany the kernel changes. It all comes down to negotiating and
collaborating with the community.

This is a pretty theoretical discussion before there is somebody stepping
up to develop and maintain the tool. So I'll stop here until that
happens.

Regards, Joonas

diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 9de8b1c51a5c..058094235329 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -132,17 +132,6 @@  config DRM_I915_SELFTEST_BROKEN
 
 	  If in doubt, say "N".
 
-config DRM_I915_LOW_LEVEL_TRACEPOINTS
-        bool "Enable low level request tracing events"
-        depends on DRM_I915
-        default n
-        help
-          Choose this option to turn on low level request tracing events.
-          This provides the ability to precisely monitor engine utilisation
-          and also analyze the request dependency resolving timeline.
-
-          If in doubt, say "N".
-
 config DRM_I915_DEBUG_VBLANK_EVADE
 	bool "Enable extra debug warnings for vblank evasion"
 	depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 4a6a15075afa..c0352a1b036c 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -679,7 +679,6 @@  DEFINE_EVENT(i915_request, i915_request_add,
 	    TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
 DEFINE_EVENT(i915_request, i915_request_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -751,25 +750,6 @@  TRACE_EVENT(i915_request_out,
 			      __entry->global_seqno, __entry->completed)
 );
 
-#else
-#if !defined(TRACE_HEADER_MULTI_READ)
-static inline void
-trace_i915_request_submit(struct i915_request *rq)
-{
-}
-
-static inline void
-trace_i915_request_in(struct i915_request *rq, unsigned int port)
-{
-}
-
-static inline void
-trace_i915_request_out(struct i915_request *rq)
-{
-}
-#endif
-#endif
-
 TRACE_EVENT(intel_engine_notify,
 	    TP_PROTO(struct intel_engine_cs *engine, bool waiters),
 	    TP_ARGS(engine, waiters),
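
For reference, a rough sketch (not part of the patch) of how the
now-unconditional submit/in/out tracepoints could be smoke-tested through
tracefs once this lands. It assumes tracefs is mounted at
/sys/kernel/tracing (older kernels use /sys/kernel/debug/tracing) and
root privileges; in practice a tool such as trace-cmd or perf would
normally be used instead.

/*
 * Enable the i915_request_submit event and stream the trace output.
 * Sketch only; paths assume the default tracefs mount point.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define TRACEFS "/sys/kernel/tracing"

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd;

	fd = open(TRACEFS "/events/i915/i915_request_submit/enable", O_WRONLY);
	if (fd < 0 || write(fd, "1", 1) != 1) {
		perror("enabling i915_request_submit");
		return 1;
	}
	close(fd);

	fd = open(TRACEFS "/trace_pipe", O_RDONLY);
	if (fd < 0) {
		perror(TRACEFS "/trace_pipe");
		return 1;
	}
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);
	close(fd);
	return 0;
}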