[RFC,3/3] drm/i915/gt: Export device and per-process runtimes via procfs

Message ID 20210204121121.2660-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [RFC,1/3] proc: Show GPU runtimes

Commit Message

Chris Wilson Feb. 4, 2021, 12:11 p.m. UTC
Register with /proc/gpu to provide the client runtimes for generic
top-like overview, e.g. gnome-system-monitor can use this information to
show the per-process multi-GPU usage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile            |  1 +
 drivers/gpu/drm/i915/gt/intel_gt.c       |  5 ++
 drivers/gpu/drm/i915/gt/intel_gt_proc.c  | 66 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_proc.h  | 14 +++++
 drivers/gpu/drm/i915/gt/intel_gt_types.h |  3 ++
 5 files changed, 89 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_proc.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_proc.h

Comments

Emil Velikov Feb. 12, 2021, 2:57 p.m. UTC | #1
Hi Chris,

On Thu, 4 Feb 2021 at 12:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Register with /proc/gpu to provide the client runtimes for generic
> top-like overview, e.g. gnome-system-monitor can use this information to
> show the per-process multi-GPU usage.
>
Exposing this information to userspace sounds great IMHO, and I like the
proposed "channels" for the device engines.
If it were me, I would have the channel names a) exposed to userspace
and b) be a "fixed set".

By a "fixed set" I mean that these should be akin to the KMS UAPI
properties, where core helpers expose prop X/Y and there are no
driver-specific ones.
This would allow for consistent and deterministic userspace handling,
even if some hardware/drivers do not have all engines - say no copy
engine.


> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_proc.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: MIT
Thanks for making these available under MIT.

> +/*
> + * Copyright © 2020 Intel Corporation

Might want to make this 2021 in the next revision.

HTH
Emil
Chris Wilson Feb. 12, 2021, 3:16 p.m. UTC | #2
Quoting Emil Velikov (2021-02-12 14:57:56)
> Hi Chris,
> 
> On Thu, 4 Feb 2021 at 12:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Register with /proc/gpu to provide the client runtimes for generic
> > top-like overview, e.g. gnome-system-monitor can use this information to
> > show the per-process multi-GPU usage.
> >
> Exposing this information to userspace sounds great IMHO and like the
> proposed "channels" for the device engines.
> If it were me, I would have the channel names a) exposed to userspace
> and b) be a "fixed set".

- Total
- Graphics
- Compute
- Unified
- Video
- Copy
- Display
- Other

Enough versatility for the foreseeable future?
But plan for extension.

The other aspect then is the capacity of each channel. We can keep it
simple as the union/average (whichever the driver has to hand) runtime in
nanoseconds over all IP blocks within a channel.
-Chris
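
For illustration only, the fixed channel set and the "average over all IP
blocks" idea above could look roughly like the sketch below. The enum, the
name table and gpu_channel_avg_ns() are hypothetical names invented for this
example; they are not part of the posted series. Only the average variant is
shown, because a true union needs per-engine busy intervals that the summed
counters do not carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed channel set, mirroring the list proposed above. */
enum gpu_channel {
	GPU_CHANNEL_TOTAL,
	GPU_CHANNEL_GRAPHICS,
	GPU_CHANNEL_COMPUTE,
	GPU_CHANNEL_UNIFIED,
	GPU_CHANNEL_VIDEO,
	GPU_CHANNEL_COPY,
	GPU_CHANNEL_DISPLAY,
	GPU_CHANNEL_OTHER,
	GPU_CHANNEL_COUNT, /* keep last; extensions are appended before it */
};

/* Names every driver would report, whether or not it has the engine. */
static const char * const gpu_channel_names[GPU_CHANNEL_COUNT] = {
	[GPU_CHANNEL_TOTAL]    = "Total",
	[GPU_CHANNEL_GRAPHICS] = "Graphics",
	[GPU_CHANNEL_COMPUTE]  = "Compute",
	[GPU_CHANNEL_UNIFIED]  = "Unified",
	[GPU_CHANNEL_VIDEO]    = "Video",
	[GPU_CHANNEL_COPY]     = "Copy",
	[GPU_CHANNEL_DISPLAY]  = "Display",
	[GPU_CHANNEL_OTHER]    = "Other",
};

/*
 * Average busy time, in nanoseconds, over all engines within one channel.
 * A channel with no engines reports 0.
 */
static uint64_t gpu_channel_avg_ns(const uint64_t *engine_ns, size_t count)
{
	uint64_t sum = 0;
	size_t i;

	assert(engine_ns || !count);

	if (!count)
		return 0;

	for (i = 0; i < count; i++)
		sum += engine_ns[i];

	return sum / count;
}
```

With three copy engines busy for 300, 100 and 200 ns, the Copy channel would
report 200 ns under this scheme.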
Emil Velikov Feb. 12, 2021, 3:45 p.m. UTC | #3
On Fri, 12 Feb 2021 at 15:16, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Emil Velikov (2021-02-12 14:57:56)
> > Hi Chris,
> >
> > On Thu, 4 Feb 2021 at 12:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >
> > > Register with /proc/gpu to provide the client runtimes for generic
> > > top-like overview, e.g. gnome-system-monitor can use this information to
> > > show the per-process multi-GPU usage.
> > >
> > Exposing this information to userspace sounds great IMHO and like the
> > proposed "channels" for the device engines.
> > If it were me, I would have the channel names a) exposed to userspace
> > and b) be a "fixed set".
>
> - Total
> - Graphics
> - Compute
> - Unified
> - Video
> - Copy
> - Display
> - Other
>
> Enough versatility for the foreseeable future?
> But plan for extension.
>
With a bit of documentation about "unified" (is it a metric also
counted towards any of the rest?) it would be perfect.
For future extension one might consider splitting video into
encoder/decoder/post-processing.

> The other aspect then is the capacity of each channel. We can keep it
> simple as the union/average (whichever the driver has to hand) runtime in
> nanoseconds over all IP blocks within a channel.

Not sure what you mean by capacity. Are you referring to having
multiple instances of the same engine (say 3 separate copy engines)?
Personally I'm inclined to keep these as separate entries, since some
hardware can have multiple ones.

For example - before the latest changes nouveau had 8 copy engines and
3+3 'generic' video (enc,dec)oder engines, amongst others.

Thanks
Emil
Chris Wilson Feb. 12, 2021, 4:07 p.m. UTC | #4
Quoting Emil Velikov (2021-02-12 15:45:04)
> On Fri, 12 Feb 2021 at 15:16, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Quoting Emil Velikov (2021-02-12 14:57:56)
> > > Hi Chris,
> > >
> > > On Thu, 4 Feb 2021 at 12:11, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > >
> > > > Register with /proc/gpu to provide the client runtimes for generic
> > > > top-like overview, e.g. gnome-system-monitor can use this information to
> > > > show the per-process multi-GPU usage.
> > > >
> > > Exposing this information to userspace sounds great IMHO and like the
> > > proposed "channels" for the device engines.
> > > If it were me, I would have the channel names a) exposed to userspace
> > > and b) be a "fixed set".
> >
> > - Total
> > - Graphics
> > - Compute
> > - Unified
> > - Video
> > - Copy
> > - Display
> > - Other
> >
> > Enough versatility for the foreseeable future?
> > But plan for extension.
> >
> With a bit of documentation about "unified" (is it a metric also
> counted towards any of the rest) it would be perfect.

With unified I was trying to find a place for things that are neither
wholly graphics nor compute, as some may prefer not to categorise
themselves as one or the other. There is also the question of whether
some cores are more compute than others (so should there be an
AI/RT/ALU channel?).

> For future extension one might consider splitting video into
> encoder/decoder/post-processing.

Ok, I wasn't sure how commonly those functions were split on different
HW.

> > The other aspect then is the capacity of each channel. We can keep it
> > simple as the union/average (whichever the driver has to hand) runtime in
> > nanoseconds over all IP blocks within a channel.
> 
> Not sure what you mean with capacity. Are you referring to having
> multiple instances of the same engine (say 3 separate copy engines)?
> Personally I'm inclined to keep these separate entries, since some
> hardware can have multiple ones.
> 
> For example - before the latest changes nouveau had 8 copy engines,
> 3+3 video 'generic' video (enc,dec)oder engines, amongst others.

Yes, most HW has multiple engines within a family. Trying to keep it
simple, I thought of presenting just one runtime metric for the whole
channel. Especially for the single-line per device format I had picked :)

If we switch to a more extensible format,

	-'$device0' : 
		-$channel0 : {
			Total : $total # avg/union over all engines
			Engines : [ $0, $1, ... ]
		}
		...

	-'$device1' : 
		...

Using the same fixed channel names, and dev_name(), pesky concerns such
as keeping it as a simple scanf can be forgotten.
-Chris
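
As a rough sketch of what emitting the extensible layout above might involve,
the snippet below formats one device with its channels into a caller-supplied
buffer. It is plain userspace C for illustration; struct channel_report,
emit() and report_device() are hypothetical names, the indentation is
simplified, and none of this is the interface of the proposed
<linux/proc_gpu.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-channel record: one total plus per-engine runtimes. */
struct channel_report {
	const char *name;          /* one of the fixed channel names */
	uint64_t total_ns;         /* avg/union over all engines */
	const uint64_t *engine_ns; /* runtime of each engine instance */
	size_t nengines;
};

/* Append formatted text at *pos; returns -1 if it would not fit. */
static int emit(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int len;

	assert(*pos <= size);

	va_start(ap, fmt);
	len = vsnprintf(buf + *pos, size - *pos, fmt, ap);
	va_end(ap);

	if (len < 0 || (size_t)len >= size - *pos)
		return -1;

	*pos += (size_t)len;
	return 0;
}

/* Write one device entry; returns the length written or -1 on truncation. */
static int report_device(char *buf, size_t size, const char *dev,
			 const struct channel_report *ch, size_t nch)
{
	size_t pos = 0, i, j;

	if (emit(buf, size, &pos, "-'%s' :\n", dev))
		return -1;

	for (i = 0; i < nch; i++) {
		if (emit(buf, size, &pos,
			 "\t-%s : {\n\t\tTotal : %" PRIu64 "\n\t\tEngines : [",
			 ch[i].name, ch[i].total_ns))
			return -1;

		for (j = 0; j < ch[i].nengines; j++)
			if (emit(buf, size, &pos, "%s%" PRIu64,
				 j ? ", " : " ", ch[i].engine_ns[j]))
				return -1;

		if (emit(buf, size, &pos, " ]\n\t}\n"))
			return -1;
	}

	return (int)pos;
}
```

Keeping a per-channel Total alongside the Engines array would let a simple
consumer read one number per channel, while a more detailed tool could still
look at individual engine instances.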
Patch

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index ce01634d4ea7..16171f65f5d1 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -104,6 +104,7 @@  gt-y += \
 	gt/intel_gt_irq.o \
 	gt/intel_gt_pm.o \
 	gt/intel_gt_pm_irq.o \
+	gt/intel_gt_proc.o \
 	gt/intel_gt_requests.o \
 	gt/intel_gtt.o \
 	gt/intel_llc.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index ca76f93bc03d..72199c13330d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@ 
 #include "intel_gt_buffer_pool.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_pm.h"
+#include "intel_gt_proc.h"
 #include "intel_gt_requests.h"
 #include "intel_mocs.h"
 #include "intel_rc6.h"
@@ -373,6 +374,8 @@  void intel_gt_driver_register(struct intel_gt *gt)
 	intel_rps_driver_register(&gt->rps);
 
 	debugfs_gt_register(gt);
+
+	intel_gt_driver_register__proc(gt);
 }
 
 static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
@@ -656,6 +659,8 @@  void intel_gt_driver_unregister(struct intel_gt *gt)
 {
 	intel_wakeref_t wakeref;
 
+	intel_gt_driver_unregister__proc(gt);
+
 	intel_rps_driver_unregister(&gt->rps);
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_proc.c b/drivers/gpu/drm/i915/gt/intel_gt_proc.c
new file mode 100644
index 000000000000..42db22326c7c
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_proc.c
@@ -0,0 +1,66 @@ 
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include <linux/proc_gpu.h>
+
+#include "i915_drm_client.h"
+#include "i915_drv.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_proc.h"
+
+static void proc_runtime_pid(struct intel_gt *gt,
+			     struct pid *pid,
+			     struct proc_gpu_runtime *rt)
+{
+	struct i915_drm_clients *clients = &gt->i915->clients;
+
+	BUILD_BUG_ON(MAX_ENGINE_CLASS >= ARRAY_SIZE(rt->channel));
+
+	rt->device = i915_drm_clients_get_runtime(clients, pid, rt->channel);
+	rt->nchannel = MAX_ENGINE_CLASS + 1;
+}
+
+static void proc_runtime_device(struct intel_gt *gt,
+				struct pid *pid,
+				struct proc_gpu_runtime *rt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	ktime_t dummy;
+
+	rt->nchannel = 0;
+	for_each_engine(engine, gt, id) {
+		rt->channel[rt->nchannel++] =
+			intel_engine_get_busy_time(engine, &dummy);
+		if (rt->nchannel == ARRAY_SIZE(rt->channel))
+			break;
+	}
+	rt->device = intel_gt_get_awake_time(gt);
+}
+
+static void proc_runtime(struct proc_gpu *pg,
+			 struct pid *pid,
+			 struct proc_gpu_runtime *rt)
+{
+	struct intel_gt *gt = container_of(pg, typeof(*gt), proc);
+
+	strscpy(rt->name, dev_name(gt->i915->drm.dev), sizeof(rt->name));
+	if (pid)
+		proc_runtime_pid(gt, pid, rt);
+	else
+		proc_runtime_device(gt, pid, rt);
+}
+
+void intel_gt_driver_register__proc(struct intel_gt *gt)
+{
+	gt->proc.fn = proc_runtime;
+	proc_gpu_register(&gt->proc);
+}
+
+void intel_gt_driver_unregister__proc(struct intel_gt *gt)
+{
+	proc_gpu_unregister(&gt->proc);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_proc.h b/drivers/gpu/drm/i915/gt/intel_gt_proc.h
new file mode 100644
index 000000000000..7a9bff0fb020
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_proc.h
@@ -0,0 +1,14 @@ 
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef INTEL_GT_PROC_H
+#define INTEL_GT_PROC_H
+
+struct intel_gt;
+
+void intel_gt_driver_register__proc(struct intel_gt *gt);
+void intel_gt_driver_unregister__proc(struct intel_gt *gt);
+
+#endif /* INTEL_GT_PROC_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 626af37c7790..3fc6d9741764 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -10,6 +10,7 @@ 
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/proc_gpu.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
@@ -135,6 +136,8 @@  struct intel_gt {
 
 	struct i915_vma *scratch;
 
+	struct proc_gpu proc;
+
 	struct intel_gt_info {
 		intel_engine_mask_t engine_mask;
 		u8 num_engines;