[RFC,0/8] GPU memory tracepoints

Message ID: 20211021031027.537-1-gurchetansingh@chromium.org

Message

Gurchetan Singh Oct. 21, 2021, 3:10 a.m. UTC
This is the latest iteration of GPU memory tracepoints [1].

In the past, there were questions about the "big picture" of memory  
accounting [2], especially given related work on dma-buf heaps and DRM
cgroups [3].  Also, there was a desire for a non-driver specific solution.

The great news is that the dma-buf heaps work has recently landed [4].  It uses
sysfs, and the plan is to use it in conjunction with the tracepoint
solution [5].  We're aiming for the GPU tracepoint to calculate totals
per DRM instance (a proxy for per-process on Android) and per DRM device.
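To make the per-DRM-instance and per-DRM-device totals concrete, here is a minimal userspace C sketch of the bookkeeping. Everything in it is a hypothetical illustration and not the fields or helpers added by this series. That includes `struct gpu_dev_acct`, `struct gpu_file_acct`, `gpu_mem_add()`, `gpu_mem_sub()`, and the convention of using pid 0 for the device-wide total. `emit_total()` only prints at the point where kernel code would fire a tracepoint.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-device state (not the drm_device fields from the series). */
struct gpu_dev_acct {
	uint32_t gpu_id;
	uint64_t total; /* bytes attributed to the whole device */
};

/* Hypothetical per-DRM-file state; on Android a DRM file roughly maps to a process. */
struct gpu_file_acct {
	struct gpu_dev_acct *dev;
	uint32_t pid;
	uint64_t total; /* bytes attributed to this DRM file */
};

/* Stand-in for firing a tracepoint: print one "total changed" record. */
static void emit_total(uint32_t gpu_id, uint32_t pid, uint64_t size)
{
	printf("gpu_mem_total: gpu_id=%" PRIu32 " pid=%" PRIu32 " size=%" PRIu64 "\n",
	       gpu_id, pid, size);
}

/* Report both totals after every change; pid 0 stands for the device-wide
 * total (an assumption of this sketch). */
static void report(const struct gpu_file_acct *f)
{
	emit_total(f->dev->gpu_id, f->pid, f->total);
	emit_total(f->dev->gpu_id, 0, f->dev->total);
}

/* Called on allocation: grow the per-file and per-device totals together. */
static void gpu_mem_add(struct gpu_file_acct *f, uint64_t size)
{
	f->total += size;
	f->dev->total += size;
	report(f);
}

/* Called on free: shrink both totals, refusing to underflow. */
static void gpu_mem_sub(struct gpu_file_acct *f, uint64_t size)
{
	assert(f->total >= size && f->dev->total >= size);
	f->total -= size;
	f->dev->total -= size;
	report(f);
}
```

Because every change emits the new totals, a trace consumer can reconstruct both the per-instance and per-device curves without polling.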

The cgroups work looks terrific too, and hopefully we can deduplicate code once
that's merged.  That's a bit of an implementation detail, though, so long as
the "GPU tracepoints" + "dma-buf heap stats" plan sounds good for Android.

This series modifies the GPU memory tracepoint API in a non-breaking fashion
(patch 1) and adds accounting via the GEM subsystem (patches 2 to 7). Given
the multiple places where memory events happen, there are a bunch of trace events
scattered in various places.  The hardest part is allocation, where each driver
has its own API.  If there's a better way, do say so.

The last patch is incomplete; we would like general feedback before proceeding
further.

[1] https://lore.kernel.org/lkml/20200302235044.59163-1-zzyiwei@google.com/
[2] https://lists.freedesktop.org/archives/dri-devel/2021-January/295120.html
[3] https://www.spinics.net/lists/cgroups/msg27867.html
[4] https://www.spinics.net/lists/linux-doc/msg97788.html
[5] https://source.android.com/devices/graphics/implement-dma-buf-gpu-mem

Gurchetan Singh (8):
  tracing/gpu: modify gpu_mem_total
  drm: add new tracepoint fields to drm_device and drm_file
  drm: add helper functions for gpu_mem_total and gpu_mem_instance
  drm: start using drm_gem_trace_gpu_mem_total
  drm: start using drm_gem_trace_gpu_mem_instance
  drm: track real and fake imports in drm_prime_member
  drm: trace memory import per DRM file
  drm: trace memory import per DRM device

 drivers/gpu/drm/Kconfig        |  1 +
 drivers/gpu/drm/drm_gem.c      | 65 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/drm_internal.h |  4 +--
 drivers/gpu/drm/drm_prime.c    | 22 +++++++++---
 include/drm/drm_device.h       | 16 +++++++++
 include/drm/drm_file.h         | 16 +++++++++
 include/drm/drm_gem.h          |  7 ++++
 include/trace/events/gpu_mem.h | 61 +++++++++++++++++++++----------
 8 files changed, 166 insertions(+), 26 deletions(-)

Comments

Daniel Vetter Oct. 21, 2021, noon UTC | #1
On Wed, Oct 20, 2021 at 08:10:19PM -0700, Gurchetan Singh wrote:
> The cgroups work looks terrific too, and hopefully we can deduplicate code once
> that's merged.  That's a bit of an implementation detail, though, so long as
> the "GPU tracepoints" + "dma-buf heap stats" plan sounds good for Android.

Can we please start out with deduplicated code, and integrate this with
cgroups?

The problem with GPU memory accounting is that everyone wants their own
thing, they're all slightly different, and each is supported by a different
subset of drivers. That doesn't make sense to support in upstream at all.

Please huddle together so that there's one set of "track gpu memory"
calls, and that does cgroups, tracepoints and everything else that an OS
might want to have.

Also, ideally this thing works for both integrated SoC GPUs (including an
answer for special memory pools like CMA) _and_ discrete GPUs using TTM.
Or at least has an answer to both, because again if we end up with totally
different tracking for the SoC vs. the discrete GPU world, we've lost.
-Daniel
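One way to read the request for a single set of "track GPU memory" calls is this: drivers call one entry point, and tracing, cgroup-style charging, and any other consumer hang off it. The C sketch below assumes that shape. All of its names are hypothetical and do not correspond to an existing kernel API. That covers `gpu_mem_track()`, `tracker_register()`, and the two example consumers. In the sketch, a consumer may veto a charge, as a cgroup limit would, and consumers that already accepted it are rolled back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A consumer returns false to reject a charge; uncharges must always succeed. */
typedef bool (*gpu_mem_consumer_fn)(void *ctx, uint64_t size, bool charge);

struct gpu_mem_consumer {
	gpu_mem_consumer_fn fn;
	void *ctx;
};

#define MAX_CONSUMERS 4

struct gpu_mem_tracker {
	struct gpu_mem_consumer consumers[MAX_CONSUMERS];
	size_t n;
};

static bool tracker_register(struct gpu_mem_tracker *t, gpu_mem_consumer_fn fn, void *ctx)
{
	if (t->n == MAX_CONSUMERS)
		return false;
	t->consumers[t->n].fn = fn;
	t->consumers[t->n].ctx = ctx;
	t->n++;
	return true;
}

/* The single hook drivers would call: charge on alloc/import, uncharge on free. */
static bool gpu_mem_track(struct gpu_mem_tracker *t, uint64_t size, bool charge)
{
	for (size_t i = 0; i < t->n; i++) {
		if (!t->consumers[i].fn(t->consumers[i].ctx, size, charge)) {
			/* Undo the charge in every consumer that already accepted it. */
			while (i-- > 0)
				t->consumers[i].fn(t->consumers[i].ctx, size, false);
			return false;
		}
	}
	return true;
}

/* Example consumer: a running total, standing in for a tracepoint. */
struct counter { uint64_t bytes; };

static bool count_consumer(void *ctx, uint64_t size, bool charge)
{
	struct counter *c = ctx;
	if (charge)
		c->bytes += size;
	else
		c->bytes -= size;
	return true;
}

/* Example consumer: a hard limit, standing in for cgroup-style charging. */
struct limit { uint64_t used, max; };

static bool limit_consumer(void *ctx, uint64_t size, bool charge)
{
	struct limit *l = ctx;
	if (!charge) {
		l->used -= size;
		return true;
	}
	if (l->used + size > l->max)
		return false;
	l->used += size;
	return true;
}
```

The design point is that drivers instrument one call site per event. Adding a new consumer does not require touching every driver again.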

Kalesh Singh Oct. 21, 2021, 10:38 p.m. UTC | #2
On Thu, Oct 21, 2021 at 5:00 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> Can we please start out with deduplicated code, and integrate this with
> cgroups?

Thanks for the comments, Dan.

The cgroups work is currently targeting allocator attribution, so it
wouldn't give insight into shared / imported memory; this is included
as part of the totals in the tracepoint. We will start a separate
discussion with the GPU community on including imported memory in the
cgroups design. Who would you recommend we include (in case we
don't already know all the interested parties)?
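The difference between allocator attribution and importer-inclusive totals can be sketched in a few lines of C. This model is an assumption for illustration only, and `struct usage`, `on_alloc()` and `on_import()` are hypothetical names. The allocator is charged once. An import leaves the charge where it is but adds the buffer to the importer's visible total.

```c
#include <assert.h>
#include <stdint.h>

#define NPROC 2

/* Hypothetical buffer: its size and the index of the process that allocated it. */
struct buffer {
	uint64_t size;
	int allocator;
};

struct usage {
	uint64_t charged[NPROC]; /* allocator attribution (cgroups-style) */
	uint64_t visible[NPROC]; /* allocations + imports (tracepoint-style totals) */
};

static void on_alloc(struct usage *u, const struct buffer *b)
{
	assert(b->allocator >= 0 && b->allocator < NPROC);
	u->charged[b->allocator] += b->size;
	u->visible[b->allocator] += b->size;
}

/* Importing a shared buffer does not change who is charged for it,
 * but it does show up in the importer's total. */
static void on_import(struct usage *u, const struct buffer *b, int importer)
{
	assert(importer >= 0 && importer < NPROC);
	u->visible[importer] += b->size;
}
```

In this sketch, summing `visible` across processes counts a shared buffer once per holder. A device-wide figure built this way therefore needs separate handling of imports.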

The current tracepoint and the cgroups are not conflicting designs but
rather complementary ones. These are some of the gaps that the tracepoint
helps to cover:
1. Imported GPU memory accounting.
2. The tracepoint can be used to track GPU memory usage over time
(useful to detect memory usage spikes, for example), while cgroups can
be used to view usage as a more granular and static state.
3. For systems where cgroups aren't enabled, the tracepoint data can be
a good alternative for identifying memory issues.
4. Non-DRM devices can make use of the tracepoint for reporting.
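Point 2 above (usage over time) can be illustrated by comparing an event stream with periodic sampling. In this hypothetical C sketch, each `struct total_event` is the new total reported by one trace event. `peak_from_polling()` models a poller that reads the current total every `period_ms`. A short spike between two polls is visible in the events but not in the samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical trace record: the new total at a given timestamp. */
struct total_event {
	uint64_t ts_ms;
	uint64_t total;
};

/* Peak total seen across the whole event stream. */
static uint64_t peak_from_events(const struct total_event *ev, size_t n)
{
	uint64_t peak = 0;
	for (size_t i = 0; i < n; i++)
		if (ev[i].total > peak)
			peak = ev[i].total;
	return peak;
}

/* Value a poller reads at time t: the latest total at or before t
 * (0 before the first event). Events must be sorted by ts_ms. */
static uint64_t total_at(const struct total_event *ev, size_t n, uint64_t t)
{
	uint64_t cur = 0;
	for (size_t i = 0; i < n && ev[i].ts_ms <= t; i++)
		cur = ev[i].total;
	return cur;
}

/* Peak observed by polling every period_ms from 0 to end_ms inclusive. */
static uint64_t peak_from_polling(const struct total_event *ev, size_t n,
				  uint64_t period_ms, uint64_t end_ms)
{
	uint64_t peak = 0;
	assert(period_ms > 0);
	for (uint64_t t = 0; t <= end_ms; t += period_ms) {
		uint64_t v = total_at(ev, n, t);
		if (v > peak)
			peak = v;
	}
	return peak;
}
```

Consider these events: 4096 bytes at 0 ms, 65536 at 120 ms, 4096 at 180 ms, and 8192 at 400 ms. The event stream peaks at 65536 bytes. A 100 ms poll running up to 500 ms only ever sees 8192.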

It would be great if we could also keep the tracepoint, as we don't have
another alternative that provides everything it offers (cgroups can certainly
be extended to cover some of these), and it's currently being used by
all Android devices.

Thanks,
Kalesh

>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Kalesh Singh Nov. 1, 2021, 6:54 p.m. UTC | #3
On Thu, Oct 21, 2021 at 3:38 PM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> It would be great if we could also keep the tracepoint, as we don't have
> another alternative that provides everything it offers (cgroups can certainly
> be extended to cover some of these), and it's currently being used by
> all Android devices.

Hi Daniel,

We had a follow-up discussion with Kenny on using DRM cgroups. In
summary, we think that the tracepoints and cgroups here are orthogonal
and should not block each other. We would appreciate any advice you have
on moving this forward.

Thanks,
Kalesh

Kalesh Singh Nov. 17, 2021, 6:06 p.m. UTC | #4
On Mon, Nov 1, 2021 at 11:54 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Hi Daniel,
>
> We had a follow-up discussion with Kenny on using DRM cgroups. In
> summary, we think that the tracepoints and cgroups here are orthogonal
> and should not block each other. We would appreciate any advice you have
> on moving this forward.

Hi Daniel,

Friendly ping on this. After discussion with Kenny, we think the
tracepoint and cgroups are complementary accounting mechanisms. One of
the main use cases for the tracepoint in Android is profiling GPU
memory with tools like Perfetto [1] instead of periodic
polling. Are there still objections to this? Please advise.

[1] https://perfetto.dev/docs/quickstart/android-tracing

Thanks,
Kalesh
