Message ID: 20180625172546.7729-2-tvrtko.ursulin@linux.intel.com (mailing list archive)
State: New, archived
Quoting Tvrtko Ursulin (2018-06-25 18:25:46) > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > This Kconfig option was added to protect the implementation specific > internals from user expectations but so far it was mostly hassle. > > Remove it so it is possible to debug request submission on any kernel > anywhere. Our job is not to let bugs into the wild ;) > This adds around 4k to default i915.ko build but should have no > performance effects due inactive tracepoints being no-op-ed out and out- > of-line. > > Users should remember tracepoints which are close to low level i915 > implementation details are subject to change and cannot be guaranteed. That's the caveat that I feel needs fleshing out. Burying it had the advantage of making it quite clear that you had to opt in and pick up the pieces when it inevitably breaks. What is wanted and what can we reasonably provide? If the tracepoints need to undergo major change before the next LTS, let alone for the life of that LTS... If we know what is wanted, can we define that better in terms of dma_fence and leave lowlevel for debugging (or think of how we achieve the same with generic bpf? kprobes)? Hmm, I wonder how far we can push that. -Chris
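As a rough sketch of the generic kprobes route floated above (a minimal example using perf probe on the dma-fence core; it assumes kprobes are enabled and that dma_fence_signal and dma_fence_enable_sw_signaling are not inlined on the running kernel):

    # create kprobe events on the generic fence entry points
    sudo perf probe --add dma_fence_signal
    sudo perf probe --add dma_fence_enable_sw_signaling

    # capture system-wide for ten seconds, then dump a textual timeline
    sudo perf record -e probe:dma_fence_signal -e probe:dma_fence_enable_sw_signaling -a -- sleep 10
    sudo perf script

This only shows when fences get signalled or have signaling enabled, not when requests are submitted to or retired from the hardware, which is the gap the rest of the thread keeps circling back to.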
On 25/06/2018 21:02, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-06-25 18:25:46) >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> This Kconfig option was added to protect the implementation specific >> internals from user expectations but so far it was mostly hassle. >> >> Remove it so it is possible to debug request submission on any kernel >> anywhere. > > Our job is not to let bugs into the wild ;) I did not word that well - I actually meant debugging the engine timelines for unexpected stalls and/or dependencies. So it is more about userspace being able to analyse what's happening. >> This adds around 4k to default i915.ko build but should have no >> performance effects due inactive tracepoints being no-op-ed out and out- >> of-line. >> >> Users should remember tracepoints which are close to low level i915 >> implementation details are subject to change and cannot be guaranteed. > > That's the caveat that I feel needs fleshed out. Burying it had the > advantage of making it quite clear that you had to opt in and pick up > the pieces when it inevitably breaks. > > What is wanted and what can we reasonable provide? If the tracepoints > needs to undergo major change before the next LTS, let alone for the > life of that LTS... > > If we know what is wanted can we define that better in terms of > dma_fence and leave lowlevel for debugging (or think of how we achieve > the same with generic bpf? kprobes)? Hmm, I wonder how far we can push > that. What is wanted is, for instance, to take trace.pl on any kernel anywhere and have it able to deduce/draw the exact metrics/timeline of command submission for a workload. At the moment, without low level tracepoints, and without the intel_engine_notify tweak, it is workload dependent how close it can get. So a set of tracepoints to allow drawing the timeline: 1. request_queue (or _add) 2. request_submit 3. intel_engine_notify 4. request_in/out With this set the above is possible and we don't need a lot of work to get there. And with the Virtual Engine it will become more interesting to have this. So if we had a bug report saying load balancing is not working well, we could just say "please run it via trace.pl --trace and attach perf script output". That way we could easily see whether or not it is a problem in userspace behaviour or elsewhere. Regards, Tvrtko
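A minimal capture along these lines might look like the following sketch (the i915_-prefixed event names are assumptions based on the driver's trace events of this era, and several of them only exist when the Kconfig option this patch removes is enabled, so check perf list first):

    # see which i915 events the running kernel actually exposes
    sudo perf list 'i915:*'

    # record the request lifecycle for a workload run, system-wide
    sudo perf record -a \
        -e i915:i915_request_add -e i915:i915_request_submit \
        -e i915:i915_request_in -e i915:i915_request_out \
        -e i915:intel_engine_notify \
        -- <workload>

    # text output that a timeline tool such as trace.pl would consume
    sudo perf script > trace.txt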
Quoting Tvrtko Ursulin (2018-06-26 11:46:51) > > On 25/06/2018 21:02, Chris Wilson wrote: > > If we know what is wanted can we define that better in terms of > > dma_fence and leave lowlevel for debugging (or think of how we achieve > > the same with generic bpf? kprobes)? Hmm, I wonder how far we can push > > that. > > What is wanted is for instance take trace.pl on any kernel anywhere and > it is able to deduce/draw the exact metrics/timeline of command > submission for an workload. > > At the moment it without low level tracepoints, and without the > intel_engine_notify tweak, it is workload dependent on how close it > could get. Interjecting what dma-fence already has (or we could use), not sure how well userspace can actually map it to their timelines. > > So a set of tracepoints to allow drawing the timeline: > > 1. request_queue (or _add) dma_fence_init > 2. request_submit > 3. intel_engine_notify For obvious reasons, no match in dma_fence. > 4. request_in dma_fence_emit > 5. request out dma_fence_signal (similar, not quite, we would have to force irq signaling). > With this set the above is possible and we don't need a lot of work to > get there. From a brief glance we are missing a dma_fence_queue for request_submit replacement. So next question is what information do we get from our tracepoints (or more precisely do you use) that we lack in dma_fence? > And with the Virtual Engine it will become more interesting to have > this. So if we had a bug report saying load balancing is not working > well, we could just say "please run it via trace.pl --trace and attach > perf script output". That way we could easily see whether or not is is a > problem in userspace behaviour or else. And there I was wanting a script to capture the workload so that we could replay it and dissect it. :-p -Chris
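For comparison, the generic dma-fence events mapped above can be captured on a stock kernel without any i915-specific configuration; a sketch (using a wildcard because the exact event names, e.g. dma_fence_signaled vs. dma_fence_signal, are best checked under events/dma_fence/ on the target kernel):

    # record everything the dma-fence core emits, driver-agnostic
    sudo perf record -a -e 'dma_fence:*' -- <workload>
    sudo perf script | head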
On 26/06/2018 11:55, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-06-26 11:46:51) >> >> On 25/06/2018 21:02, Chris Wilson wrote: >>> If we know what is wanted can we define that better in terms of >>> dma_fence and leave lowlevel for debugging (or think of how we achieve >>> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push >>> that. >> >> What is wanted is for instance take trace.pl on any kernel anywhere and >> it is able to deduce/draw the exact metrics/timeline of command >> submission for an workload. >> >> At the moment it without low level tracepoints, and without the >> intel_engine_notify tweak, it is workload dependent on how close it >> could get. > > Interjecting what dma-fence already has (or we could use), not sure how > well userspace can actually map it to their timelines. >> >> So a set of tracepoints to allow drawing the timeline: >> >> 1. request_queue (or _add) > dma_fence_init > >> 2. request_submit > >> 3. intel_engine_notify > For obvious reasons, no match in dma_fence. > >> 4. request_in > dma_fence_emit > >> 5. request out > dma_fence_signal (similar, not quite, we would have to force irq > signaling). Yes, not quite the same due to the potential time shift between the user interrupt and the dma_fence_signal call via different paths. > >> With this set the above is possible and we don't need a lot of work to >> get there. > > From a brief glance we are missing a dma_fence_queue for request_submit > replacement. > > So next question is what information do we get from our tracepoints (or > more precisely do you use) that we lack in dma_fence? Port=%u and preemption (completed=%u) come immediately to mind. A way to tie them to engines would be nice, or it is all abstract timelines. Going this direction sounds like a long detour to get where we almost are. I suspect you are valuing the benefit of it being generic and hence a parsing tool could be cross-driver. But you can also just punt the "abstractising" into the parsing tool. >> And with the Virtual Engine it will become more interesting to have >> this. So if we had a bug report saying load balancing is not working >> well, we could just say "please run it via trace.pl --trace and attach >> perf script output". That way we could easily see whether or not is is a >> problem in userspace behaviour or else. > > And there I was wanting a script to capture the workload so that we > could replay it and dissect it. :-p Depends on what level you want that. Perf script output from the above tracepoints would do on one level. If you wanted a higher level to re-exercise load balancing then it wouldn't completely be enough, or at least a lot of guesswork would be needed. Regards, Tvrtko
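Which fields a given event actually carries (the port and completed values mentioned above, the global seqno, and so on) can be checked from its format file in tracefs; a sketch, assuming the usual debugfs mount point and that the events are compiled in:

    cat /sys/kernel/debug/tracing/events/i915/i915_request_in/format
    cat /sys/kernel/debug/tracing/events/dma_fence/dma_fence_emit/format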
Quoting Tvrtko Ursulin (2018-06-26 12:24:51) > > On 26/06/2018 11:55, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-06-26 11:46:51) > >> > >> On 25/06/2018 21:02, Chris Wilson wrote: > >>> If we know what is wanted can we define that better in terms of > >>> dma_fence and leave lowlevel for debugging (or think of how we achieve > >>> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push > >>> that. > >> > >> What is wanted is for instance take trace.pl on any kernel anywhere and > >> it is able to deduce/draw the exact metrics/timeline of command > >> submission for an workload. > >> > >> At the moment it without low level tracepoints, and without the > >> intel_engine_notify tweak, it is workload dependent on how close it > >> could get. > > > > Interjecting what dma-fence already has (or we could use), not sure how > > well userspace can actually map it to their timelines. > >> > >> So a set of tracepoints to allow drawing the timeline: > >> > >> 1. request_queue (or _add) > > dma_fence_init > > > >> 2. request_submit > > > >> 3. intel_engine_notify > > For obvious reasons, no match in dma_fence. > > > >> 4. request_in > > dma_fence_emit > > > >> 5. request out > > dma_fence_signal (similar, not quite, we would have to force irq > > signaling). > > Yes not quite the same due potential time shift between user interrupt > and dma_fence_signal call via different paths. > > > > >> With this set the above is possible and we don't need a lot of work to > >> get there. > > > > From a brief glance we are missing a dma_fence_queue for request_submit > > replacement. > > > > So next question is what information do we get from our tracepoints (or > > more precisely do you use) that we lack in dma_fence? > > Port=%u and preemption (completed=%u) comes immediately to mind. Way to > tie with engines would be nice or it is all abstract timelines. > > Going this direction sounds like a long detour to get where we almost > are. I suspect you are valuing the benefit of it being generic and hence > and parsing tool could be cross-driver. But you can also just punt the > "abstractising" into the parsing tool. It's just that this is about the third time this has been raised in the last couple of weeks with the other two requests being from a generic tooling pov (Eric Anholt for gnome-shell tweaking, and someone else looking for a gpuvis-like tool). So it seems like there is interest, even if I doubt that it'll help answer any questions beyond what you can just extract from looking at userspace. (Imo, the only people these tracepoints are useful for are people writing patches for the driver. For everyone else, you can just observe system behaviour and optimise your code for your workload. Otoh, can one trust a black box, argh.) To have a second set of nearly equivalent tracepoints, we need to have strong justification why we couldn't just use or extend the generic set. Plus I feel a lot more comfortable exporting a set of generic tracepoints than those where we may be leaking more knowledge of the HW than we can reasonably expect to support for the indefinite future. > >> And with the Virtual Engine it will become more interesting to have > >> this. So if we had a bug report saying load balancing is not working > >> well, we could just say "please run it via trace.pl --trace and attach > >> perf script output". That way we could easily see whether or not is is a > >> problem in userspace behaviour or else. 
> > > > And there I was wanting a script to capture the workload so that we > > could replay it and dissect it. :-p > > Depends on what level you want that. Perf script output from the above > tracepoints would do on one level. If you wanted a higher level to > re-exercise load balancing then it wouldn't completely be enough, or at > least a lot of guesswork would be needed. It all depends on what level you want to optimise, is the way I look at it. Userspace driver, you capture the client->driver userspace API (e.g. cairo-trace, apitrace). But for optimising scheduling layout, we just need a workload descriptor like wsim -- with perhaps the only tweak being able to define latency/throughput metrics relevant to that workload, and being able to integrate with a pseudo display server. The challenge as I see it is being able to convince the user that it is a useful diagnosis step and being able to generate a reasonable wsim automatically. -Chris
On 26/06/2018 12:48, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-06-26 12:24:51) >> >> On 26/06/2018 11:55, Chris Wilson wrote: >>> Quoting Tvrtko Ursulin (2018-06-26 11:46:51) >>>> >>>> On 25/06/2018 21:02, Chris Wilson wrote: >>>>> If we know what is wanted can we define that better in terms of >>>>> dma_fence and leave lowlevel for debugging (or think of how we achieve >>>>> the same with generic bpf? kprobes)? Hmm, I wonder how far we can push >>>>> that. >>>> >>>> What is wanted is for instance take trace.pl on any kernel anywhere and >>>> it is able to deduce/draw the exact metrics/timeline of command >>>> submission for an workload. >>>> >>>> At the moment it without low level tracepoints, and without the >>>> intel_engine_notify tweak, it is workload dependent on how close it >>>> could get. >>> >>> Interjecting what dma-fence already has (or we could use), not sure how >>> well userspace can actually map it to their timelines. >>>> >>>> So a set of tracepoints to allow drawing the timeline: >>>> >>>> 1. request_queue (or _add) >>> dma_fence_init >>> >>>> 2. request_submit >>> >>>> 3. intel_engine_notify >>> For obvious reasons, no match in dma_fence. >>> >>>> 4. request_in >>> dma_fence_emit >>> >>>> 5. request out >>> dma_fence_signal (similar, not quite, we would have to force irq >>> signaling). >> >> Yes not quite the same due potential time shift between user interrupt >> and dma_fence_signal call via different paths. >> >>> >>>> With this set the above is possible and we don't need a lot of work to >>>> get there. >>> >>> From a brief glance we are missing a dma_fence_queue for request_submit >>> replacement. >>> >>> So next question is what information do we get from our tracepoints (or >>> more precisely do you use) that we lack in dma_fence? >> >> Port=%u and preemption (completed=%u) comes immediately to mind. Way to >> tie with engines would be nice or it is all abstract timelines. >> >> Going this direction sounds like a long detour to get where we almost >> are. I suspect you are valuing the benefit of it being generic and hence >> and parsing tool could be cross-driver. But you can also just punt the >> "abstractising" into the parsing tool. > > It's just that this about the third time this has been raised in the > last couple of weeks with the other two requests being from a generic > tooling pov (Eric Anholt for gnome-shell tweaking, and some one > else looking for a gpuvis-like tool). So it seems like there is > interest, even if I doubt that it'll help answer any questions beyond > what you can just extract from looking at userspace. (Imo, the only > people these tracepoints are useful for are people writing patches for > the driver. For everyone else, you can just observe system behaviour and > optimise your code for your workload. Otoh, can one trust a black > box, argh.) Some of the things might be obtainable purely from userspace via heavily instrumented builds, which may be in the realm of possible for during development, but I don't think it is feasible in general both because it is too involved, and because it would preclude existence of tools which can trace any random client. > To have a second set of nearly equivalent tracepoints, we need to have > strong justification why we couldn't just use or extend the generic set. I was hoping that the conversation so far established that nearly equivalent is not close enough for intended use cases. And that is not possible to make the generic ones so. 
> Plus I feel a lot more comfortable exporting a set of generic > tracepoints, than those where we may be leaking more knowledge of the HW > than we can reasonably expect to support for the indefinite future. I think it is accepted we cannot guarantee low level tracepoints will be supportable in the future world of GuC scheduling. (How and what we will do there is yet unresolved.) But at least we get much better usability for platforms up to that point, and for very little effort. The idea is not to mark these as ABI but just to improve the user experience. You are, I suppose, worried that if these tracepoints disappeared due to being un-implementable someone will complain? I just want anyone to be able to run trace.pl and see how the virtual engine behaves, without having to recompile the kernel. And VTune people want the same for their enterprise-level customers. Both tools are ready to adapt should it be required. It's, I repeat, just usability and user experience out of the box. > >>>> And with the Virtual Engine it will become more interesting to have >>>> this. So if we had a bug report saying load balancing is not working >>>> well, we could just say "please run it via trace.pl --trace and attach >>>> perf script output". That way we could easily see whether or not is is a >>>> problem in userspace behaviour or else. >>> >>> And there I was wanting a script to capture the workload so that we >>> could replay it and dissect it. :-p >> >> Depends on what level you want that. Perf script output from the above >> tracepoints would do on one level. If you wanted a higher level to >> re-exercise load balancing then it wouldn't completely be enough, or at >> least a lot of guesswork would be needed. > > It all depends on what level you want to optimise, is the way I look at > it. Userspace driver, you capture the client->driver userspace API (e.g. > cairo-trace, apitrace). But for optimising scheduling layout, we just > need a workload descriptor like wsim -- with perhaps the only tweak > being able to define latency/throughput metrics relevant to that > workload, and being able to integrate with a pseudo display server. The > challenge as I see it is being able to convince the user that it is a > useful diagnosis step and being able to generate a reasonable wsim > automatically. Deriving wsims from apitraces sounds much more challenging, but I also think it is orthogonal. Tracing could always be there at the low level, whether the client is real or simulated. Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-08-08 13:13:08) > > On 26/06/2018 12:48, Chris Wilson wrote: > > It's just that this about the third time this has been raised in the > > last couple of weeks with the other two requests being from a generic > > tooling pov (Eric Anholt for gnome-shell tweaking, and some one > > else looking for a gpuvis-like tool). So it seems like there is > > interest, even if I doubt that it'll help answer any questions beyond > > what you can just extract from looking at userspace. (Imo, the only > > people these tracepoints are useful for are people writing patches for > > the driver. For everyone else, you can just observe system behaviour and > > optimise your code for your workload. Otoh, can one trust a black > > box, argh.) > > Some of the things might be obtainable purely from userspace via heavily > instrumented builds, which may be in the realm of possible for during > development, but I don't think it is feasible in general both because it > is too involved, and because it would preclude existence of tools which > can trace any random client. > > > To have a second set of nearly equivalent tracepoints, we need to have > > strong justification why we couldn't just use or extend the generic set. > > I was hoping that the conversation so far established that nearly > equivalent is not close enough for intended use cases. And that is not > possible to make the generic ones so. (I just don't see the point of those use cases. I trace the kernel to fix the kernel...) > > Plus I feel a lot more comfortable exporting a set of generic > > tracepoints, than those where we may be leaking more knowledge of the HW > > than we can reasonably expect to support for the indefinite future. > > I think it is accepted we cannot guarantee low level tracepoints will be > supportable in the future world of GuC scheduling. (How and what we will > do there is yet unresolved.) But at least we get much better usability > for platforms up to there, and for very small effort. The idea is not to > mark these as ABI but just improve user experience. > > You are I suppose worried that if these tracepoints disappeared due > being un-implementable someone will complain? They already do... > I just want that anyone can run trace.pl and see how virtual engine > behaves, without having to recompile the kernel. And VTune people want > the same for their enterprise-level customers. Both tools are ready to > adapt should it be required. Its I repeat just usability and user > experience out of the box. The out-of-the-box user experience should not require the use of such tools in the first place! If they are trying to work around the kernel (and that's the only use of this information I see) we have bugs aplenty. [snip because I repeated myself] I think my issues boil down to: 1 - people will complain no matter what (when it changes, when it is no longer available) 2 - people will use it to work around, not fix; the information about kernel behaviour should only be used with a view to fixing that behaviour As such, I am quite happy to have it limited to driver developers that want to fix issues at source (OpenCL, I'm looking at you). 
If you can convince Joonas of its merit, and if we can define just exactly what ABI it constitutes, then I'd be happy to be the one who says "I told you so" in the future for a change. -Chris
+Joonas On 08/08/2018 13:42, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-08-08 13:13:08) >> >> On 26/06/2018 12:48, Chris Wilson wrote: >>> It's just that this about the third time this has been raised in the >>> last couple of weeks with the other two requests being from a generic >>> tooling pov (Eric Anholt for gnome-shell tweaking, and some one >>> else looking for a gpuvis-like tool). So it seems like there is >>> interest, even if I doubt that it'll help answer any questions beyond >>> what you can just extract from looking at userspace. (Imo, the only >>> people these tracepoints are useful for are people writing patches for >>> the driver. For everyone else, you can just observe system behaviour and >>> optimise your code for your workload. Otoh, can one trust a black >>> box, argh.) >> >> Some of the things might be obtainable purely from userspace via heavily >> instrumented builds, which may be in the realm of possible for during >> development, but I don't think it is feasible in general both because it >> is too involved, and because it would preclude existence of tools which >> can trace any random client. >> >>> To have a second set of nearly equivalent tracepoints, we need to have >>> strong justification why we couldn't just use or extend the generic set. >> >> I was hoping that the conversation so far established that nearly >> equivalent is not close enough for intended use cases. And that is not >> possible to make the generic ones so. > > (I just don't see the point of those use cases. I trace the kernel to > fix the kernel...) Yes and with virtual engine we will have a bigger reason to trace the kernel with a random client. > >>> Plus I feel a lot more comfortable exporting a set of generic >>> tracepoints, than those where we may be leaking more knowledge of the HW >>> than we can reasonably expect to support for the indefinite future. >> >> I think it is accepted we cannot guarantee low level tracepoints will be >> supportable in the future world of GuC scheduling. (How and what we will >> do there is yet unresolved.) But at least we get much better usability >> for platforms up to there, and for very small effort. The idea is not to >> mark these as ABI but just improve user experience. >> >> You are I suppose worried that if these tracepoints disappeared due >> being un-implementable someone will complain? > > They already do... > >> I just want that anyone can run trace.pl and see how virtual engine >> behaves, without having to recompile the kernel. And VTune people want >> the same for their enterprise-level customers. Both tools are ready to >> adapt should it be required. Its I repeat just usability and user >> experience out of the box. > > The out-of-the-box user experience should not require the use of such > tools in the first place! If they are trying to work around the kernel > (and that's the only use of this information I see) we have bugs a > plenty. > > [snip because I repeated myself] > > I think my issues boil down to: > > 1 - people will complain no matter what (when it changes, when it is no > longer available) > > 2 - people will use it to workaround not fix; the information about kernel > behaviour should only be used with a view to fixing that behaviour > > As such, I am quite happy to have it limited to driver developers that > want to fix issues at source (OpenCL, I'm looking at you). 
> There's tons > of other user observable information out there for tuning userspace, > why does the latency of runnable->queued matter if you will not do anything > about it? Other things like dependency graphs, if you can't keep control > of your own fences, you've already lost. This is true, no disagreement. My point simply was that we can provide this info easily to anyone. There is a little bit of analogy with perf scheduler tracing/map etc. > I don't see any value in giving the information away, just the cost. If > you can convince Joonas of its merit, and if we can define just exactly > what ABI it constitutes, then I'd be happy to be the one who says "I > told you so" in the future for a change. I think Joonas was okay in principle that we soft-commit to _trying_ to keep _some_ tracepoint stable-ish (where it makes sense and after some discussion for each) if IGT also materializes which auto-pings us (via CI) when we break one of them. But I may be misremembering so Joonas please comment. Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-08-08 15:56:01) > On 08/08/2018 13:42, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-08-08 13:13:08) > This is true, no disagreement. My point simply was that we can provide > this info easily to anyone. There is a little bit of analogy with perf > scheduler tracing/map etc. > > > I don't see any value in giving the information away, just the cost. If > > you can convince Joonas of its merit, and if we can define just exactly > > what ABI it constitutes, then I'd be happy to be the one who says "I > > told you so" in the future for a change. > > I think Joonas was okay in principle that we soft-commit to _trying_ to > keep _some_ tracepoint stable-ish (where it makes sense and after some > discussion for each) if IGT also materializes which auto-pings us (via > CI) when we break one of them. But I may be misremembering so Joonas > please comment. Currently gpuvis, using these, seems to be packaged only in one AUR repo, and they do make a note in the wiki about how you need to configure the kernel for debugging. And there's been no apparent demand for them to have it in a stock kernel. And even when we do get demand for having gpuvis or another tool working from a vanilla kernel, tracepoints being a rather tricky subject, I would start the discussion by going through alternative means of providing the information the tool needs and considering those. So let's still keep this option as it was introduced. The whole "tracepoints as stable uAPI" idea is a can of worms which is only dug into when other options are exhausted. Regards, Joonas
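For anyone who does need the low level events today, the opt-in referred to here is the Kconfig switch named in the patch title; a sketch of flipping it when building a kernel from source (scripts/config is the helper shipped in the kernel tree; this assumes i915 is built as a module):

    # from the top of the kernel source tree, with an existing .config
    ./scripts/config --enable DRM_I915_LOW_LEVEL_TRACEPOINTS
    make olddefconfig
    make -j"$(nproc)" modules && sudo make modules_install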
Joonas, sorry for interfering; could you please explain more regarding the options for tracing scheduling events better than tracepoints? After scheduling moves to the GuC, tools will have to switch to something like GuC logging; but while the kmd does the scheduling, isn't kernel tracing the best solution? I know gpuvis is not the only attempt to use tracepoints for the same purpose (there are trace.pl and S.E.A., and of course VTune, though it probably is not considered to exist as it's not open source). And assuming this movement towards the GuC, is it not too late to invent a completely new way to provide tools with scheduling info from the kmd? Could we just improve the existing way and let it live out its last years/months? gpuvis works without modifying the kernel for AMDGPU, showing HW queue and HW execution; it cosplays Microsoft GPUView, which works out-of-the-box on Windows too. Thus it appears that Intel gfx on Linux is the most closed platform, not bothering about observability (or even bothering about how to forbid observability). Not long ago the MediaSDK team diagnosed a problem with their workloads by looking at VTune timelines - seeing the difference between the time a request came to the kmd and the time it went runnable, and comparing the queues on 2 engines, they understood that their requests had dependencies that were definitely unexpected. MediaSDK reported the problem to the driver people and it was fixed. I can add Dmitry Rogozhkin to the discussion if the usefulness of a scheduling timeline in tools is questionable; as far as I remember this wasn't the only use case they had, and I'm sure he can add more. Thank you, Svetlana
The whole "tracepoints as stable uAPI" idea is a can of worms which is only dug into when other options are exhausted. Regards, Joonas
Quoting Kukanova, Svetlana (2018-08-13 16:44:49) > Joonas, sorry for interfering; could you please explain more regarding the > options for tracing scheduling events better than tracepoints? > After scheduling moves to GuC tools will have to switch to something like > GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution? > I know gpuvis is not the only attempt to use tracepoints for the same purpose. > (there're trace.pl and S.E.A. and of course VTune though it probably is not > considered to be existing as it's not open source). > And assuming this movement towards GuC is it not too late to invent a > completely new way to provide tools with scheduling info from kmd? > Could we just improve the existing way and let it live its last years\months? Hi, You actually mentioned the prime reason why we should not go and hastily make tracepoints a stable uAPI with regards to scheduling information. The scheduler's nature will be evolving when some of the scheduling decisions are moved to GuC and the way how we get the information will be changing at that point, so tracepoints will indeed be a very bad mechanism for providing the information. The kernel scheduler is definitely not going anywhere with the introduction of more hardware scheduling capabilities, so it is a misconception to think that the interface would need to be completely different for when GuC is enabled. > > gpuvis works w\o modifying kernel for AMDgpu showing HW queue and HW execution; > it cosplays Microsoft GPUView which works out-of-the-box on Windows too. > Thus it appears that intel gfx on linux is the most closed platform, not > bothering of observability (or even bothering about how to forbid observability). gpuvis is a developer tool. The tracepoints behind this configure switch are way more low-level than what the gpuvis seems to support for AMDGPU *at all*. They seem to stick to IOCTL level. So from what I see, we should be on-par with the competition even without any special kernel configuration. So lets not get things mixed up. And I remind, the tool is not shipping anywhere really (except the AUR), but just built from source by developers in need, and they seem to be just fine with re-compiling the kernel (as there have been no requests). Once there is an actual request to have some metrics from vanilla kernels through some end-user tools (not a developer tool, like here), I'll be glad to discuss about how to provide the information the best for them in a stable manner. > Not long ago the MediaSDK team diagnosed a problem with their workloads > looking at VTune timelines - seeing the difference between the time request > came to kmd and time it went runnable & comparing the queues on 2 engines they > understood that their requests have dependencies that were definitely > unexpected. MediaSDK reported the problem to driver people and it was fixed. > > I can add Dmitry Rogozhkin to discussion if the usefulness of scheduling > timeline in tools is questionable, as far as I remember this wasn't the only > use case they had, I'm sure he can add more. I'm well aware of the use cases. And Dmitry is well aware of the need for an Open Source consumer for any requested stable uAPIs. And we don't currently have that, so there's no disconnect on information. There's just no Open Source tool to first design and then validate the interfaces against. 
There's just the debugging tool which happens to work currently, without any guarantees that next kernel version would not cause a substantial rework of the interfacing code. The interface discussion would probably start from a DRM subsystem level, so that the tool would have an equivalent level of base experience from all drivers. Regards, Joonas
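As a point of reference for the gpuvis workflow mentioned here, such captures are typically taken with trace-cmd rather than perf; a sketch that only names event subsystems, so it records whatever events the running kernel happens to expose (which is exactly the point under discussion):

    # record all i915 and dma_fence events for ten seconds while the workload runs
    sudo trace-cmd record -e i915 -e dma_fence sleep 10

    # inspect as text, or load the resulting trace.dat into gpuvis
    trace-cmd report | less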
On 21/08/2018 13:06, Joonas Lahtinen wrote: > Quoting Kukanova, Svetlana (2018-08-13 16:44:49) >> Joonas, sorry for interfering; could you please explain more regarding the >> options for tracing scheduling events better than tracepoints? >> After scheduling moves to GuC tools will have to switch to something like >> GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution? >> I know gpuvis is not the only attempt to use tracepoints for the same purpose. >> (there're trace.pl and S.E.A. and of course VTune though it probably is not >> considered to be existing as it's not open source). >> And assuming this movement towards GuC is it not too late to invent a >> completely new way to provide tools with scheduling info from kmd? >> Could we just improve the existing way and let it live its last years\months? > > Hi, > > You actually mentioned the prime reason why we should not go and > hastily make tracepoints a stable uAPI with regards to scheduling > information. > > The scheduler's nature will be evolving when some of the scheduling > decisions are moved to GuC and the way how we get the information > will be changing at that point, so tracepoints will indeed be a > very bad mechanism for providing the information. > > The kernel scheduler is definitely not going anywhere with the > introduction of more hardware scheduling capabilities, so it is a > misconception to think that the interface would need to be completely > different for when GuC is enabled. On the last paragraph - even with the today's GuC i915 already loses visibility of CSB interrupts. So there is already a big difference in semantics of what request_in and request_out tracepoints mean. Put preemption into the picture and we just don't know any more when something started executing on the GPU, when it got preempted, re-submitted etc. So I think it is fair to say that moving more of scheduling into the GuC creates a problem for tools which want to represent request execution timelines. Regards, Tvrtko
On 22/08/2018 13:49, Tvrtko Ursulin wrote: > > On 21/08/2018 13:06, Joonas Lahtinen wrote: >> Quoting Kukanova, Svetlana (2018-08-13 16:44:49) >>> Joonas, sorry for interfering; could you please explain more >>> regarding the >>> options for tracing scheduling events better than tracepoints? >>> After scheduling moves to GuC tools will have to switch to something >>> like >>> GuC-logging; but while kmd does scheduling isn't kernel-tracing the >>> best solution? >>> I know gpuvis is not the only attempt to use tracepoints for the same >>> purpose. >>> (there're trace.pl and S.E.A. and of course VTune though it probably >>> is not >>> considered to be existing as it's not open source). >>> And assuming this movement towards GuC is it not too late to invent a >>> completely new way to provide tools with scheduling info from kmd? >>> Could we just improve the existing way and let it live its last >>> years\months? >> >> Hi, >> >> You actually mentioned the prime reason why we should not go and >> hastily make tracepoints a stable uAPI with regards to scheduling >> information. >> >> The scheduler's nature will be evolving when some of the scheduling >> decisions are moved to GuC and the way how we get the information >> will be changing at that point, so tracepoints will indeed be a >> very bad mechanism for providing the information. >> >> The kernel scheduler is definitely not going anywhere with the >> introduction of more hardware scheduling capabilities, so it is a >> misconception to think that the interface would need to be completely >> different for when GuC is enabled. > > On the last paragraph - even with the today's GuC i915 already loses > visibility of CSB interrupts. So there is already a big difference in > semantics of what request_in and request_out tracepoints mean. Put > preemption into the picture and we just don't know any more when > something started executing on the GPU, when it got preempted, > re-submitted etc. So I think it is fair to say that moving more of > scheduling into the GuC creates a problem for tools which want to > represent request execution timelines. P.S. To clarify - which is exactly why we marked those tracpoints as low level and why it is problematic to rely on them. Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-08-22 15:49:52) > > On 21/08/2018 13:06, Joonas Lahtinen wrote: > > Quoting Kukanova, Svetlana (2018-08-13 16:44:49) > >> Joonas, sorry for interfering; could you please explain more regarding the > >> options for tracing scheduling events better than tracepoints? > >> After scheduling moves to GuC tools will have to switch to something like > >> GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution? > >> I know gpuvis is not the only attempt to use tracepoints for the same purpose. > >> (there're trace.pl and S.E.A. and of course VTune though it probably is not > >> considered to be existing as it's not open source). > >> And assuming this movement towards GuC is it not too late to invent a > >> completely new way to provide tools with scheduling info from kmd? > >> Could we just improve the existing way and let it live its last years\months? > > > > Hi, > > > > You actually mentioned the prime reason why we should not go and > > hastily make tracepoints a stable uAPI with regards to scheduling > > information. > > > > The scheduler's nature will be evolving when some of the scheduling > > decisions are moved to GuC and the way how we get the information > > will be changing at that point, so tracepoints will indeed be a > > very bad mechanism for providing the information. > > > > The kernel scheduler is definitely not going anywhere with the > > introduction of more hardware scheduling capabilities, so it is a > > misconception to think that the interface would need to be completely > > different for when GuC is enabled. To clarify, I meant to underline that there is not going to be a steep switching point where a transition from interface A to B, which Svetlana referred to, would happen naturally. The introduced interface will have to provide the information for years and kernel versions to come, and we already have a some data that tracepoints may not be the format of choice due to GuC. > On the last paragraph - even with the today's GuC i915 already loses > visibility of CSB interrupts. So there is already a big difference in > semantics of what request_in and request_out tracepoints mean. Put > preemption into the picture and we just don't know any more when > something started executing on the GPU, when it got preempted, > re-submitted etc. So I think it is fair to say that moving more of > scheduling into the GuC creates a problem for tools which want to > represent request execution timelines. Yes, for tools that depend on the tracepoints. That's why it is most likely best to introduce the information in some other form, but I am starting to sound like a broken record already :) Regards, Joonas > > Regards, > > Tvrtko
> Once there is an actual request to have some metrics from vanilla kernels through some end-user tools (not a developer tool, like here), I'll be glad to discuss about how to provide the information the best for them in a stable manner. Sorry for my ignorance, but it looks like I don't understand what developer vs. end-user means here. With regard to GPU profiling, VTune's end-user is somebody who develops gfx or media applications based on MediaSDK, OpenCL, C for Media, etc. Or, more often, it's an Intel application engineer working with those people's code. The AE in his/her turn may contact e.g. Dmitry's team if, judging by VTune data, he/she decides that the problem is at a deeper level of the gfx stack, not in the customer's code. Then Dmitry's team would be experimenting with VTune and deciding if the problem is in their code or deeper in i915. I don't think that i915 people use VTune (sadly :)) so here the chain is broken. Otherwise they could e.g. blame HW based on the same data. I'm wondering who in this chain (app developer, AE, Dmitry, i915) is an "end-user" and who's a "developer"? Or is a "developer" a kernel developer only? And is e.g. Dmitry an end-user and thus not supposed to use tools like gpuvis or VTune? It looks like the whole chain before i915 is annoyed by the kernel-rebuilding requirement. >The interface discussion would probably start from a DRM subsystem level, so that the tool would have an equivalent level of base experience from all drivers. That sounds like a solution from an ideal world. I mean, if DRM had a uAPI for scheduling observability and all the drivers had to implement this. And the drivers would require info from HW like the GuC, pointing to the necessity of uAPI support... It would be just great for all the tools, developers and end-users. But I have no idea what kind of impulse it would take to bring this to reality, and whether all the energy available to humankind at this point of evolution would be enough to at least start this. Or am I just too pessimistic? Are there some simple, defined steps to be done to make it happen? Can we build a realistic plan? E.g. is this the first step? - > There's just no Open Source tool to first design and then validate the interfaces against. There's just the debugging tool which happens to work currently, without any guarantees that next kernel version would not cause a substantial rework of the interfacing code. How does it usually work? I mean, you can't have a widely shipped open-source consumer already using a non-existent feature that is yet to be requested. And I can't imagine what kind of existing tool would suddenly decide that it needs to add GPU scheduling tracing to the list of its features. If you want to have a new tool for a GPU scheduling timeline - and it sounds like a sane idea, it looks like we agree on the use cases etc. - how can you make it open source first and only then get the API from i915 for it to be based on? Or am I just missing the point completely? If the open-sourced MediaSDK was shipped with some distro (isn't it, btw?) - would Dmitry be eligible to request observability features for tools?
Thank you, Svetlana
Quoting Kukanova, Svetlana (2018-08-27 16:37:14) > > Once there is an actual request to have some metrics from vanilla kernels > > through some end-user tools (not a developer tool, like here), I'll be glad > > to discuss about how to provide the information the best for them in a > > stable manner. > > Sorry for my ignorance, but looks like I don't understand what developer vs. > end-user means here. > With regard to GPU profiling VTune's end-user is somebody who develops gfx or > media applications basing on MediaSDK, OpenCL, C for Media, etc. > Or, more often it's an intel application engineer working with those people's > code. > AE in his\her turn may contact e.g. Dmitry's team if judging by VTune data > he\she decides that the problem is on the deeper level of the gfx stack, not > in the customer's code. > Then Dmitry's team would be experimenting with VTune and deciding if the > problem is in their code or it's deeper in i915. > Don't think that i915 people use VTune (sadly:)) so here the chain is broken. > Otherwise they could e.g. blame HW based on the same data. > I'm wondering who in this chain (app developer, AE, Dmitry, i915) is an > "end-user" and who's a "developer"? > Or is a "developer" a kernel developer only? > And e.g. Dmitry is an end-user and thus he is not supposed to use tools like > gpuvis or VTune? > Looks like all the chain before i915 is annoyed by the kernel-rebuilding > requirement. With end-user tool I'm referring to something that would have interest in being packaged and shipped by a distro. gpuvis team seems to be doing fine with the application being built from source and being run against a specially configured kernel for their purposes. I would assume there to be some queries about a enabling the tracepoints by default if there was demand. At the same time I would assume them to try to get the application packaged and into distros. And then we would commence discussing how to provide the information in a stable manner (most likely outside tracepoints). So far I'm not seeing such queries from gpuvis direction. > > The interface discussion would probably start from a DRM subsystem level, so > > that the tool would have an equivalent level of base experience from all > > drivers. > > That sounds like a solution from an ideal world. I mean if DRM had a uAPI for > scheduling observability and all the drivers had to implement this. And the > drivers would require info from HW like GuC pointing to the necessity of uAPI > support... > Would be just great for all the tools (, developers and end-users). > But I have no idea what kind of impulse should it be to bring this to reality. > And if all the energy available to human kind at the given evolution point > would be enough to at least start this. > Or am I just too pessimistic? Are there some simple defined steps to be done > to make it? Can we build a realistic plan? Step is "1. Have the tool" :) There seem to be three options: 1) open sourcing VTune 2) contributing to gpuvis project to drive the project into the above mentioned direction. 3) writing a new project from scratch (not encouraged, unless you have something differentiating to bring to the table). Unless somebody actively drives the feature to some Open Source userspace consumer, there won't be an interface for the information from kernel. Demand from an Open Source application is a hard requirement for kickstarting the interface discussion. > E.g. is this the first step? 
- > > There's just no Open Source tool to first design and then validate the > > interfaces against. There's just the debugging tool which happens to work > > currently, without any guarantees that next kernel version would not cause a > > substantial rework of the interfacing code. > > How does it usually work, I mean you can't have a widely shipped open-source > consumer already using a non-existent feature that is to be requested? > And I can't imagine what kind of existing tool should it be to decide suddenly > that it needs to add GPU scheduling tracing to the list of its features. > If you want to have a new tool for GPU scheduling timeline - and it sounds > like a sane idea, looks like we agree on the use cases etc. - how can you make > it open source first and then get the API to be based on from i915? The order is that you develop the tool and the required kernel changes in parallel in topic branches, to demonstrate the usefulness of the tool and suitability of the kernel interface. Then after all the patches are reviewed (kernel + tool), kernel side is merged first, and then the tool can start working from next kernel release. This has been attempted to be described in the following documentation chapter: https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements > Or am I just missing the point completely? > If the open-sourced MediaSDK was shipped with some distro (isn't it, btw?) - > would Dmitry be eligible to request observability features for tools? MediaSDK should not have anything to do with this, unless it will directly consume the kernel interface in discussion. The need is for some application/library/whatever userspace component to demonstrate the suitability of the kernel interface and act as a counterpart for the kernel interface that can be tested and debugged for changes. This too, is explained in more detail in the above linked documentation chapter. Regards, Joonas > > Thank you, > Svetlana > > -----Original Message----- > From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com] > Sent: Tuesday, August 21, 2018 3:07 PM > To: Intel-gfx@lists.freedesktop.org; Kukanova, Svetlana <svetlana.kukanova@intel.com>; Chris Wilson <chris@chris-wilson.co.uk>; Tvrtko Ursulin <tursulin@ursulin.net>; Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > Subject: RE: [Intel-gfx] [PATCH 2/2] drm/i915/tracepoints: Remove DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option > > Quoting Kukanova, Svetlana (2018-08-13 16:44:49) > > Joonas, sorry for interfering; could you please explain more regarding > > the options for tracing scheduling events better than tracepoints? > > After scheduling moves to GuC tools will have to switch to something > > like GuC-logging; but while kmd does scheduling isn't kernel-tracing the best solution? > > I know gpuvis is not the only attempt to use tracepoints for the same purpose. > > (there're trace.pl and S.E.A. and of course VTune though it probably > > is not considered to be existing as it's not open source). > > And assuming this movement towards GuC is it not too late to invent a > > completely new way to provide tools with scheduling info from kmd? > > Could we just improve the existing way and let it live its last years\months? > > Hi, > > You actually mentioned the prime reason why we should not go and hastily make tracepoints a stable uAPI with regards to scheduling information. 
> > The scheduler's nature will be evolving when some of the scheduling decisions are moved to GuC and the way how we get the information will be changing at that point, so tracepoints will indeed be a very bad mechanism for providing the information. > > The kernel scheduler is definitely not going anywhere with the introduction of more hardware scheduling capabilities, so it is a misconception to think that the interface would need to be completely different for when GuC is enabled. > > > > > gpuvis works w\o modifying kernel for AMDgpu showing HW queue and HW > > execution; it cosplays Microsoft GPUView which works out-of-the-box on Windows too. > > Thus it appears that intel gfx on linux is the most closed platform, > > not bothering of observability (or even bothering about how to forbid observability). > > gpuvis is a developer tool. The tracepoints behind this configure switch are way more low-level than what the gpuvis seems to support for AMDGPU *at all*. They seem to stick to IOCTL level. So from what I see, we should be on-par with the competition even without any special kernel configuration. So lets not get things mixed up. > > And I remind, the tool is not shipping anywhere really (except the AUR), but just built from source by developers in need, and they seem to be just fine with re-compiling the kernel (as there have been no requests). > > Once there is an actual request to have some metrics from vanilla kernels through some end-user tools (not a developer tool, like here), I'll be glad to discuss about how to provide the information the best for them in a stable manner. > > > Not long ago the MediaSDK team diagnosed a problem with their > > workloads looking at VTune timelines - seeing the difference between > > the time request came to kmd and time it went runnable & comparing the > > queues on 2 engines they understood that their requests have > > dependencies that were definitely unexpected. MediaSDK reported the problem to driver people and it was fixed. > > > > I can add Dmitry Rogozhkin to discussion if the usefulness of > > scheduling timeline in tools is questionable, as far as I remember > > this wasn't the only use case they had, I'm sure he can add more. > > I'm well aware of the use cases. And Dmitry is well aware of the need for an Open Source consumer for any requested stable uAPIs. And we don't currently have that, so there's no disconnect on information. > > There's just no Open Source tool to first design and then validate the interfaces against. There's just the debugging tool which happens to work currently, without any guarantees that next kernel version would not cause a substantial rework of the interfacing code. > > The interface discussion would probably start from a DRM subsystem level, so that the tool would have an equivalent level of base experience from all drivers. 
> > Regards, Joonas > > > > > Thank you, > > Svetlana > > > > -----Original Message----- > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On > > Behalf Of Joonas Lahtinen > > Sent: Monday, August 13, 2018 12:55 PM > > To: Chris Wilson <chris@chris-wilson.co.uk>; > > Intel-gfx@lists.freedesktop.org; Tvrtko Ursulin > > <tursulin@ursulin.net>; Tvrtko Ursulin > > <tvrtko.ursulin@linux.intel.com> > > Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/tracepoints: Remove > > DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option > > > > Quoting Tvrtko Ursulin (2018-08-08 15:56:01) > > > On 08/08/2018 13:42, Chris Wilson wrote: > > > > Quoting Tvrtko Ursulin (2018-08-08 13:13:08) > > > This is true, no disagreement. My point simply was that we can > > > provide this info easily to anyone. There is a little bit of analogy > > > with perf scheduler tracing/map etc. > > > > > > > I don't see any value in giving the information away, just the cost. > > > > If you can convince Joonas of its merit, and if we can define just > > > > exactly what ABI it constitutes, then I'd be happy to be the one > > > > who says "I told you so" in the future for a change. > > > > > > I think Joonas was okay in principle that we soft-commit to _trying_ > > > to keep _some_ tracepoint stable-ish (where it makes sense and after > > > some discussion for each) if IGT also materializes which auto-pings > > > us (via > > > CI) when we break one of them. But I may be misremembering so Joonas > > > please comment. > > > > Currently gpuvis, using these, seems to be only packaged in one AUR repo, and they do make a not in the wiki how you need to configure kernel for debugging. And there's been no apparent demand for them to have it in stock kernel. > > > > And even when we do get demand for having gpuvis or another tool working from vanilla kernel, tracepoints being a rather tricky subject, I would start the discussion by going through alternative means of providing the information the tool needs and considering those. > > > > So lets still keep this option as it was introduced. The whole "tracepoints as stable uAPI" idea is a can of worms which is only dug into when other options are exhausted. > > > > Regards, Joonas > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
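The exchange above keeps returning to the fact that tools such as gpuvis or trace.pl have to run against a kernel built with DRM_I915_LOW_LEVEL_TRACEPOINTS before the request-level events exist at all. As a minimal sketch (not from the thread or the patch, and assuming tracefs is mounted in one of its usual locations and is readable, which normally means root), this is roughly how a tool could probe a running kernel for those events, using the event names from i915_trace.h:

# Minimal probe sketch: check whether the running kernel exposes the
# low-level i915 request tracepoints via tracefs. Paths and permissions
# are assumptions, not something specified in the thread.
import os

TRACEFS_CANDIDATES = (
    "/sys/kernel/tracing",        # tracefs mount point on newer kernels
    "/sys/kernel/debug/tracing",  # legacy debugfs location
)

LOW_LEVEL_EVENTS = (
    "i915_request_submit",
    "i915_request_in",
    "i915_request_out",
)

def tracefs_root():
    """Return the first tracefs root that exists and is readable, or None."""
    for path in TRACEFS_CANDIDATES:
        if os.path.isdir(os.path.join(path, "events")):
            return path
    return None

def missing_low_level_events(root):
    """List the low-level i915 request events absent from this kernel."""
    events_dir = os.path.join(root, "events", "i915")
    return [e for e in LOW_LEVEL_EVENTS
            if not os.path.isdir(os.path.join(events_dir, e))]

if __name__ == "__main__":
    root = tracefs_root()
    if root is None:
        print("tracefs not available (not mounted or no permission)")
    else:
        missing = missing_low_level_events(root)
        if missing:
            print("kernel built without DRM_I915_LOW_LEVEL_TRACEPOINTS, missing:",
                  ", ".join(missing))
        else:
            print("low-level i915 request tracepoints are available")

If the probe fails, the tool can report that the running kernel was built without the option instead of silently producing an incomplete timeline.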
OK, so no end-user queries matter, just the queries from the tools for end-users do, right? And with making the open-source tool (shipped by distros, etc.) suitable for negotiations we need to hurry while at least the trace-point mechanism is not yet completely broken and can be used to show usefulness and to have at least something that can be taken to distro? If the new tool and kernel changes required by it are developed in parallel - you don't have that "shipped by a distro" condition, BTW, right? Or in case of parallel discussion you're deciding if the suggested tool has rights to exist?
Quoting Kukanova, Svetlana (2018-09-03 15:22:20) > OK, so no end-user queries matter, just the queries from the tools for > end-users do, right?

End-user queries do matter, but we don't have the bandwidth to implement all the tools/software in the world. For those reasons, we need to have the interest from a party that is ready to implement the software. So to proceed at a technical level, interest from the developer of a tool is needed. Simple as that.

> And with making the open-source tool (shipped by distros, etc.) suitable > for negotiations we need to hurry while at least the trace-point mechanism > is not yet completely broken and can be used to show usefulness and to have > at least something that can be taken to distro?

I'm not sure I understood the question, but anything shipping to distros as a stable tool should not depend on tracepoints. Not even initially, as tracepoints are liable to change between kernel updates.

> If the new tool and kernel changes required by it are developed in parallel - > you don't have that "shipped by a distro" condition, BTW, right? Or in case of > parallel discussion you're deciding if the suggested tool has rights to > exist?

Usually a tool/software would already be established before it requests some kernel changes. That would of course require it to be useful before introducing the new interfaces. If the tool's existence is completely reliant on some new interface provided by the kernel (like here), then we would like to get a green light from some distro that they are interested in packaging the suggested software to accompany the kernel changes. It all comes down to negotiating and collaborating with the community.

This is a pretty theoretical discussion before there is somebody stepping up to develop and maintain the tool. So I'll stop here until that happens.

Regards, Joonas
diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 9de8b1c51a5c..058094235329 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -132,17 +132,6 @@ config DRM_I915_SELFTEST_BROKEN
 
 	  If in doubt, say "N".
 
-config DRM_I915_LOW_LEVEL_TRACEPOINTS
-	bool "Enable low level request tracing events"
-	depends on DRM_I915
-	default n
-	help
-	  Choose this option to turn on low level request tracing events.
-	  This provides the ability to precisely monitor engine utilisation
-	  and also analyze the request dependency resolving timeline.
-
-	  If in doubt, say "N".
-
 config DRM_I915_DEBUG_VBLANK_EVADE
 	bool "Enable extra debug warnings for vblank evasion"
 	depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 4a6a15075afa..c0352a1b036c 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -679,7 +679,6 @@ DEFINE_EVENT(i915_request, i915_request_add,
 	    TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
 DEFINE_EVENT(i915_request, i915_request_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -751,25 +750,6 @@ TRACE_EVENT(i915_request_out,
 		      __entry->global_seqno, __entry->completed)
 );
 
-#else
-#if !defined(TRACE_HEADER_MULTI_READ)
-static inline void
-trace_i915_request_submit(struct i915_request *rq)
-{
-}
-
-static inline void
-trace_i915_request_in(struct i915_request *rq, unsigned int port)
-{
-}
-
-static inline void
-trace_i915_request_out(struct i915_request *rq)
-{
-}
-#endif
-#endif
-
 TRACE_EVENT(intel_engine_notify,
 	TP_PROTO(struct intel_engine_cs *engine, bool waiters),
 	TP_ARGS(engine, waiters),
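With the guard removed by the hunks above, i915_request_submit, i915_request_in and i915_request_out are always compiled in, but they remain no-ops until a consumer enables them. As an illustrative sketch only (not part of the patch; it assumes root privileges and tracefs mounted at /sys/kernel/tracing), this is roughly how a timeline tool could enable the request tracepoints and stream the raw trace output:

# Illustrative sketch: enable the i915 request tracepoints that the patch
# makes unconditional and stream the raw ftrace output. The tracefs path
# and required privileges are assumptions, not dictated by the patch.
import os

TRACEFS = "/sys/kernel/tracing"  # older setups may use /sys/kernel/debug/tracing
EVENTS = (
    "i915/i915_request_add",
    "i915/i915_request_submit",
    "i915/i915_request_in",
    "i915/i915_request_out",
    "i915/intel_engine_notify",
)

def write(path, value):
    """Write a control value into a tracefs file."""
    with open(os.path.join(TRACEFS, path), "w") as f:
        f.write(value)

def main():
    for event in EVENTS:
        write(os.path.join("events", event, "enable"), "1")
    write("tracing_on", "1")
    try:
        # trace_pipe blocks and yields events as they are emitted.
        with open(os.path.join(TRACEFS, "trace_pipe")) as pipe:
            for line in pipe:
                print(line, end="")
    except KeyboardInterrupt:
        pass
    finally:
        write("tracing_on", "0")
        for event in EVENTS:
            write(os.path.join("events", event, "enable"), "0")

if __name__ == "__main__":
    main()

A real consumer such as trace.pl would parse the per-event fields (for example global_seqno) rather than printing raw lines, but the enable-and-stream mechanics would stay the same.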