
[RFC,00/13] 21st century intel_gpu_top

Message ID: 20181003120406.6784-1-tvrtko.ursulin@linux.intel.com

Message

Tvrtko Ursulin Oct. 3, 2018, 12:03 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A collection of patches which I have sent before, sometimes together and
sometimes separately, which enable intel_gpu_top to report queue depths (which
also translate into an overall GPU load average) and per-DRM-client, per-engine
busyness.
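
The load average here is just the queue depth counter folded through the same
exponentially damped moving average the kernel uses for the classic CPU load
averages. A minimal userspace sketch of the idea (the sampling period and
window constants are illustrative, not something this series defines):

#include <math.h>

/*
 * Exponentially damped moving average of sampled GPU queue depth,
 * analogous to the kernel's 1/5/15 minute CPU load averages.
 */
struct load_avg {
	double val;
	double decay;	/* exp(-period / window) */
};

static void load_avg_init(struct load_avg *la, double period_s, double window_s)
{
	la->val = 0.0;
	la->decay = exp(-period_s / window_s);
}

/* Call once every period_s seconds with the sampled queue depth. */
static void load_avg_update(struct load_avg *la, unsigned int queue_depth)
{
	la->val = la->val * la->decay + queue_depth * (1.0 - la->decay);
}

Three instances with 60, 300 and 900 second windows produce the three numbers
in the header line below.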

This enables a fancy intel_gpu_top which looks like this (a picture is worth a
thousand words):

intel-gpu-top - load avg  3.30,  1.51,  0.08;  949/ 949 MHz;    0% RC6;  14.66 Watts;     3605 irqs/s

      IMC reads:     4651 MiB/s
     IMC writes:       25 MiB/s

          ENGINE      BUSY                                                                                Q   r   R MI_SEMA MI_WAIT
     Render/3D/0    61.51% |█████████████████████████████████████████████▌                            |   3   0   1      0%      0%
       Blitter/0     0.00% |                                                                          |   0   0   0      0%      0%
         Video/0    60.86% |█████████████████████████████████████████████                             |   1   0   1      0%      0%
         Video/1    59.04% |███████████████████████████████████████████▋                              |   1   0   1      0%      0%
  VideoEnhance/0     0.00% |                                                                          |   0   0   0      0%      0%

  PID            NAME     Render/3D/0            Blitter/0              Video/0               Video/1            VideoEnhance/0
23373        gem_wsim |█████▎              ||                    ||████████▍           ||█████▎              ||                    |
23374        gem_wsim |███▉                ||                    ||██▏                 ||███                 ||                    |
23375        gem_wsim |███                 ||                    ||█▍                  ||███▌                ||                    |
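
In the per-engine table the Q, r and R columns are the queued, runnable and
running counts from the new PMU counters, alongside the existing busyness.
All of them are read via the normal perf interface; the rough sketch below
uses the existing I915_PMU_ENGINE_BUSY event for illustration (the new
counters would be read the same way), assumes recent kernel uapi headers, and
trims error handling for brevity:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <drm/i915_drm.h>

/* The dynamically assigned PMU type is published in sysfs. */
static int i915_pmu_type(void)
{
	char buf[16] = {};
	int fd, type = -1;

	fd = open("/sys/bus/event_source/devices/i915/type", O_RDONLY);
	if (fd < 0)
		return -1;
	if (read(fd, buf, sizeof(buf) - 1) > 0)
		type = atoi(buf);
	close(fd);

	return type;
}

int main(void)
{
	struct perf_event_attr attr = {};
	uint64_t before, after;
	int fd;

	attr.type = i915_pmu_type();
	attr.size = sizeof(attr);
	/* Busy time, in nanoseconds, of the render engine. */
	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0);

	/* i915 events are system-wide: pid == -1, a single cpu. */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0)
		return 1;

	read(fd, &before, sizeof(before));
	sleep(1);
	read(fd, &after, sizeof(after));
	printf("Render/3D/0 busy: %.2f%%\n", (after - before) / 1e9 * 100.0);

	close(fd);
	return 0;
}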

All of this work actually came about via different feature requests which did
not directly ask for it, such as an engine queue depth query and a per-context
engine busyness ioctl. Those bits need userspace which does not exist yet, so I
have removed them from this posting to avoid confusion.

What remains is a set of patches which add some PMU counters and a completely
new sysfs interface to enable intel_gpu_top to read the per-client stats.
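
For the per-client side, assuming for illustration a layout along the lines of
/sys/class/drm/card0/clients/<id>/name and .../pid (the exact layout is
whatever the sysfs patches define), enumerating clients from userspace is just
a directory walk:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read one line from a sysfs attribute, stripping the newline. */
static int read_str(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, len, f))
		buf[0] = '\0';
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';

	return 0;
}

int main(void)
{
	const char *root = "/sys/class/drm/card0/clients";
	struct dirent *de;
	DIR *d = opendir(root);

	if (!d)
		return 1;

	while ((de = readdir(d))) {
		char path[256], name[64], pid[16];

		if (de->d_name[0] == '.')
			continue;

		snprintf(path, sizeof(path), "%s/%s/name", root, de->d_name);
		if (read_str(path, name, sizeof(name)))
			continue;
		snprintf(path, sizeof(path), "%s/%s/pid", root, de->d_name);
		if (read_str(path, pid, sizeof(pid)))
			continue;

		printf("%5s %s\n", pid, name);
	}

	closedir(d);
	return 0;
}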

The IGT counterpart will be sent separately.

Tvrtko Ursulin (13):
  drm/i915/pmu: Fix enable count array size and bounds checking
  drm/i915: Keep a count of requests waiting for a slot on GPU
  drm/i915: Keep a count of requests submitted from userspace
  drm/i915/pmu: Add queued counter
  drm/i915/pmu: Add runnable counter
  drm/i915/pmu: Add running counter
  drm/i915: Store engine backpointer in the intel_context
  drm/i915: Move intel_engine_context_in/out into intel_lrc.c
  drm/i915: Track per-context engine busyness
  drm/i915: Expose list of clients in sysfs
  drm/i915: Update client name on context create
  drm/i915: Expose per-engine client busyness
  drm/i915: Add sysfs toggle to enable per-client engine stats

 drivers/gpu/drm/i915/i915_drv.h         |  39 +++++
 drivers/gpu/drm/i915/i915_gem.c         | 197 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.c |  18 ++-
 drivers/gpu/drm/i915/i915_gem_context.h |  18 +++
 drivers/gpu/drm/i915/i915_pmu.c         | 103 +++++++++++--
 drivers/gpu/drm/i915/i915_request.c     |  10 ++
 drivers/gpu/drm/i915/i915_sysfs.c       |  81 ++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |  33 +++-
 drivers/gpu/drm/i915/intel_lrc.c        | 109 ++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  76 +++------
 include/uapi/drm/i915_drm.h             |  19 ++-
 11 files changed, 614 insertions(+), 89 deletions(-)

Comments

Chris Wilson Oct. 3, 2018, 12:36 p.m. UTC | #1
Quoting Tvrtko Ursulin (2018-10-03 13:03:53)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A collection of patches which I have sent before, sometimes together and
> sometimes separately, which enable intel_gpu_top to report queue depths (which
> also translate into an overall GPU load average) and per-DRM-client, per-engine
> busyness.

Queued falls apart with the virtual engine and I don't have a good suggestion
for a remedy. :(
-Chris
Tvrtko Ursulin Oct. 3, 2018, 12:57 p.m. UTC | #2
On 03/10/2018 13:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-10-03 13:03:53)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> A collection of patches which I have sent before, sometimes together and
>> sometimes separately, which enable intel_gpu_top to report queue depths (which
>> also translate into an overall GPU load average) and per-DRM-client, per-engine
>> busyness.
> 
> Queued falls apart with the virtual engine and I don't have a good suggestion
> for a remedy. :(

Indeed, I forgot about it. I have now even found a few-months-old branch
with queued and runnable already removed.

I think we also talked about the option of exposing aggregate engine-class
counters, but that also has problems.

We could go global and not expose this per engine, but that wouldn't make
pre-Gen11 users happy.

Regards,

Tvrtko
Tvrtko Ursulin Oct. 10, 2018, 11:49 a.m. UTC | #3
On 03/10/2018 13:03, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A collection of patches which I have sent before, sometimes together and
> sometimes separately, which enable intel_gpu_top to report queue depths (which
> also translate into an overall GPU load average) and per-DRM-client, per-engine
> busyness.
> 
> [snip]

FWIW, at least one more person thinks this would be a nice-to-have feature:
https://twitter.com/IntelGraphics/status/1047991913972826112. But it sure feels
weird to cross-link Twitter to intel-gfx! Sign of the times... :)

Regards,

Tvrtko
