[0/5] Per client engine busyness

Message ID 20191216120704.958-1-tvrtko.ursulin@linux.intel.com (mailing list archive)

Message

Tvrtko Ursulin Dec. 16, 2019, 12:06 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Another re-spin of the per-client engine busyness series.

Review feedback from last round has been addressed* and the tracking simplified.

(*Apart from re-using the ctx->idr_lock for the global toggle; I kept using a
struct mutex for that.)

Internally we track time spent on engines for each struct intel_context. This
can serve as a building block for several features from the want list:
smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
wanted by some customers, cgroups controller, dynamic SSEU tuning,...
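
As an aside, a minimal userspace model of the accounting idea, not the actual
i915 code (names like ctx_stats and ctx_sched_in/out are made up purely for
illustration): a context accumulates busy nanoseconds while it is active on an
engine and folds the interval in when it is switched out.

	/*
	 * Minimal userspace model of the accounting idea, not the actual
	 * i915 code.  The names are illustrative only.
	 */
	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>

	struct ctx_stats {
		uint64_t total_ns;	/* accumulated busy time */
		uint64_t start_ns;	/* timestamp of last schedule-in */
	};

	static uint64_t now_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
	}

	/* Context starts executing on an engine. */
	static void ctx_sched_in(struct ctx_stats *s)
	{
		s->start_ns = now_ns();
	}

	/* Context stops executing; fold the active interval into the total. */
	static void ctx_sched_out(struct ctx_stats *s)
	{
		s->total_ns += now_ns() - s->start_ns;
		s->start_ns = 0;
	}

	int main(void)
	{
		struct ctx_stats s = { 0 };

		ctx_sched_in(&s);
		/* ... GPU work would run here ... */
		ctx_sched_out(&s);
		printf("busy: %llu ns\n", (unsigned long long)s.total_ns);
		return 0;
	}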

Externally, in sysfs, we expose time spent on GPU per client and per engine
class.

There is also a global toggle to enable this extra tracking, although it is an
open question whether it is warranted or whether we should just always track.
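
For illustration, a sketch of flipping that toggle from userspace; the path is
the one from the sysfs tree further down, while the exact write semantics
(writing "1" to enable) are assumed here.

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* Path as in the sysfs tree below; "1" to enable is assumed. */
		int fd = open("/sys/class/drm/card0/clients/enable_stats",
			      O_WRONLY);

		if (fd < 0) {
			perror("open enable_stats");
			return 1;
		}
		if (write(fd, "1", 1) != 1)
			perror("write");
		close(fd);
		return 0;
	}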

The sysfs interface enables us to implement a "top-like" tool for GPU tasks.
Here is a "screenshot" of it in action:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s

      IMC reads:     4414 MiB/s
     IMC writes:     3805 MiB/s

          ENGINE      BUSY                                      MI_SEMA MI_WAIT
     Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
       Blitter/0    0.00% |                                   |      0%      0%
         Video/0    0.00% |                                   |      0%      0%
  VideoEnhance/0    0.00% |                                   |      0%      0%

  PID            NAME  Render/3D      Blitter        Video      VideoEnhance
 2733       neverball |██████▌     ||            ||            ||            |
 2047            Xorg |███▊        ||            ||            ||            |
 2737        glxgears |█▍          ||            ||            ||            |
 2128           xfwm4 |            ||            ||            ||            |
 2047            Xorg |            ||            ||            ||            |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Implementation-wise we add a bunch of files in sysfs, like:

	# cd /sys/class/drm/card0/clients/
	# tree
	.
	├── 7
	│   ├── busy
	│   │   ├── 0
	│   │   ├── 1
	│   │   ├── 2
	│   │   └── 3
	│   ├── name
	│   └── pid
	├── 8
	│   ├── busy
	│   │   ├── 0
	│   │   ├── 1
	│   │   ├── 2
	│   │   └── 3
	│   ├── name
	│   └── pid
	├── 9
	│   ├── busy
	│   │   ├── 0
	│   │   ├── 1
	│   │   ├── 2
	│   │   └── 3
	│   ├── name
	│   └── pid
	└── enable_stats

Files in the 'busy' directories are numbered using the engine class ABI values
and contain the accumulated nanoseconds each client spent on engines of the
respective class.
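
As a rough sketch of how a tool like intel_gpu_top could consume these files,
the snippet below samples the accumulated nanoseconds for one client/class pair
twice and divides the delta by the wall-clock interval. The client id (7) and
class (0, i.e. Render/3D) are just examples taken from the tree above, and a
plain decimal content format is assumed.

	#include <inttypes.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Read one 'busy' file; assumed to hold a decimal nanosecond count. */
	static uint64_t read_busy_ns(const char *path)
	{
		uint64_t val = 0;
		FILE *f = fopen(path, "r");

		if (f) {
			if (fscanf(f, "%" SCNu64, &val) != 1)
				val = 0;
			fclose(f);
		}
		return val;
	}

	int main(void)
	{
		/* Example client id and engine class from the tree above. */
		const char *path = "/sys/class/drm/card0/clients/7/busy/0";
		uint64_t t0 = read_busy_ns(path);
		uint64_t t1;

		sleep(1);	/* one second sampling period */
		t1 = read_busy_ns(path);

		printf("Render/3D: %.2f%%\n",
		       (double)(t1 - t0) / 1e9 * 100.0);
		return 0;
	}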

I will post the corresponding patch to intel_gpu_top for reference as well.

Tvrtko Ursulin (5):
  drm/i915: Track per-context engine busyness
  drm/i915: Expose list of clients in sysfs
  drm/i915: Update client name on context create
  drm/i915: Expose per-engine client busyness
  drm/i915: Add sysfs toggle to enable per-client engine stats

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  24 +-
 drivers/gpu/drm/i915/gt/intel_context.c       |  20 ++
 drivers/gpu/drm/i915/gt/intel_context.h       |  11 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  16 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  47 +++-
 drivers/gpu/drm/i915/i915_drv.h               |  41 +++
 drivers/gpu/drm/i915/i915_gem.c               | 234 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_sysfs.c             |  84 +++++++
 9 files changed, 465 insertions(+), 21 deletions(-)

Comments

Chris Wilson Dec. 16, 2019, 1:09 p.m. UTC | #1
Quoting Tvrtko Ursulin (2019-12-16 12:06:59)
> Implementation wise we add a a bunch of files in sysfs like:
> 
>         # cd /sys/class/drm/card0/clients/
>         # tree
>         .
>         ├── 7
>         │   ├── busy
>         │   │   ├── 0

Prefer '0' over rcs?

> I will post the corresponding patch to intel_gpu_top for reference as well.

The other requirement is that we need to at least prove the sysfs
interface exists in gt. perf_sysfs?

Quick list,
- check igt_spin_t responses (pretty much verbatim of perf_pmu.c)
- check the client name is correct around fd passing
- check interactions with ctx->engines[]
-Chris
Tvrtko Ursulin Dec. 16, 2019, 1:20 p.m. UTC | #2
On 16/12/2019 13:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-12-16 12:06:59)
>> Implementation wise we add a a bunch of files in sysfs like:
>>
>>          # cd /sys/class/drm/card0/clients/
>>          # tree
>>          .
>>          ├── 7
>>          │   ├── busy
>>          │   │   ├── 0
> 
> Prefer '0' over rcs?

I think so, saves userspace keeping a map of names to class enum. Or 
maybe it doesn't, depends. Saves us having to come up with ABI names. 
But I think I could be easily convinced either way.

>> I will post the corresponding patch to intel_gpu_top for reference as well.
> 
> The other requirement is that we need to at least prove the sysfs
> interface exists in gt. perf_sysfs?
> 
> Quick list,
> - check igt_spin_t responses (pretty much verbatim of perf_pmu.c)
> - check the client name is correct around fd passing
> - check interactions with ctx->engines[]

Yep, I know it will be needed. But I haven't bothered with it yet since the
series has been in a hopeless state for, what, two years or so. I forgot to
name it RFC this time round.. :)

Regards,

Tvrtko