[03/14] drm/i915: Framework for capturing command stream based OA reports.

From: Sourab Gupta <sourab.gupta@intel.com>

This patch introduces a framework to capture OA counter reports associated
with the render command stream. Reports captured through this mechanism
can then be associated with their corresponding context IDs. The mechanism
can be further extended to associate other metadata with the samples,
since inserting the capture commands into the render command stream gives
us the opportunity to record that information at the same time.

The OA reports generated in this way are associated with a corresponding
workload, and thus can be used to delimit the workload (i.e. sample the
counters at the workload boundaries) within an ongoing stream of periodic
counter snapshots.

There are use cases that need more than the periodic OA capture mode
that is currently supported. The command stream based mode primarily
serves two of them:
    - Ability to capture system wide metrics, along with the ability to map
      the reports back to individual contexts (particularly for HSW).
    - Ability to inject tags for work into the reports. This provides
      visibility into the multiple stages of work within a single context.

Userspace can distinguish between periodic and CS based OA reports by
means of the source_info sample field.

The MI_REPORT_PERF_COUNT command can be used to capture snapshots of the
OA counters, and is inserted at batch buffer (BB) boundaries.
The data thus captured is stored in a separate buffer, distinct from the
buffer used for the periodic OA capture mode.
The metadata for each snapshot is maintained in a list, which also holds
the offset of each captured snapshot within the gem buffer object.
To track whether the GPU has finished processing a node, each node carries
a field for the corresponding gem request, which is checked for
completion of the command.
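
As an illustration of the bookkeeping described above, here is a minimal
userspace sketch of a per-snapshot node list with completion tracking. The
names cs_sample_node, seqno_passed and count_ready are hypothetical and do
not come from this patch; the node is reduced to a buffer offset and the
seqno of the request that writes the report, and completion is modelled as
a wrap-safe seqno comparison. The driver's real structures and locking are
not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-snapshot metadata node (not the driver's layout). */
struct cs_sample_node {
	uint32_t offset;              /* byte offset of the report in the capture buffer */
	uint32_t seqno;               /* seqno of the request that writes the report */
	struct cs_sample_node *next;  /* next node, in submission order */
};

/* Wrap-safe check: has the GPU's completed seqno reached 'seqno'? */
static bool seqno_passed(uint32_t completed, uint32_t seqno)
{
	return (int32_t)(completed - seqno) >= 0;
}

/*
 * Count the leading nodes whose requests have completed. Only these
 * snapshots are safe to read from the capture buffer; the walk stops
 * at the first node whose request is still outstanding.
 */
static size_t count_ready(const struct cs_sample_node *head, uint32_t completed)
{
	size_t n = 0;

	for (; head && seqno_passed(completed, head->seqno); head = head->next)
		n++;
	return n;
}
```

With three nodes carrying seqnos 1, 2 and 3 and a completed seqno of 2,
count_ready() reports two readable snapshots.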

Both periodic and CS based reports are associated with a single stream
(corresponding to the render engine), and userspace expects the samples
in sequential order according to their timestamps. Since these reports
are collected in separate buffers, they are merge sorted when they are
forwarded to userspace during the read call.
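
The merge step can be pictured with the following minimal sketch. It is
not the code from this patch: struct sample and merge_samples are
hypothetical names, each report is reduced to a timestamp plus a source
flag, and timestamp wraparound is ignored. It assumes each input is
already ordered by timestamp, as reports within a single buffer are.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified report record (not the driver's layout). */
struct sample {
	uint64_t ts;   /* timestamp, arbitrary ticks; wraparound ignored */
	int from_cs;   /* 1 = command stream report, 0 = periodic report */
};

/*
 * Merge two timestamp-ordered arrays into out[], which must have room
 * for np + nc entries. On equal timestamps the periodic report is
 * emitted first. Returns the number of entries written.
 */
static size_t merge_samples(const struct sample *periodic, size_t np,
			    const struct sample *cs, size_t nc,
			    struct sample *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < np && j < nc) {
		if (cs[j].ts < periodic[i].ts)
			out[k++] = cs[j++];
		else
			out[k++] = periodic[i++];
	}
	/* One input is exhausted; drain whatever remains of the other. */
	while (i < np)
		out[k++] = periodic[i++];
	while (j < nc)
		out[k++] = cs[j++];

	return k;
}
```

Because both inputs are already sorted, a single linear pass is enough and
no general sort is needed at read time.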

v2: Aligned with the non-perf interface (custom drm ioctl based). Also
squashed a few related patches together for better readability.

v3: Updated perf sample capture emit hook name. Reserving space upfront
in the ring for emitting sample capture commands and using
req->fence.seqno for tracking samples. Added SRCU protection for streams.
Changed the stream last_request tracking to resv object. (Chris)
Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
stream to global per-engine structure. (Sagar)
Update unpin and put in the free routines to i915_vma_unpin_and_release.
Making use of perf stream cs_buffer vma resv instead of separate resv obj.
Pruned perf stream vma resv during gem_idle. (Chris)
Changed payload field ctx_id to u64 to keep all sample data aligned at 8
bytes. (Lionel)
Stall/flush prior to sample capture is not added. Do we need to give the
user control over whether to stall/flush at each sample?

v4: Removed state enum and now relying on bool state. Kept single srcu as
currently only RCS stream is opened. Removed ctx id support, will be added
in the next patch in series. Polling workqueue related structures are kept
singleton too for now. (Lionel)
Moved the CS samples allocation from request emission to submission as the
backend __i915_gem_request_submit determines the order of execution on the
hardware. Changed assigning the offset from cs_buffer vma to sample based
to make sure we reuse the samples non-linearly in case some samples are
discarded due to preemption later.
Added a sample identifier to record where in the request execution the
sample is captured. This can be useful to associate OA reports with the
beginning and end of request execution.

v5: Removed periodic check from stream_wait_ioctl to allow opening a
stream without OA data but with other CS properties. Added comment about
handling preemption while reading cs_samples.

Testcase: igt/intel_perf_dapc
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |   94 ++-
 drivers/gpu/drm/i915/i915_gem.c            |    1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
 drivers/gpu/drm/i915/i915_gem_request.c    |    2 +
 drivers/gpu/drm/i915/i915_gem_request.h    |    3 +
 drivers/gpu/drm/i915/i915_perf.c           | 1147 ++++++++++++++++++++++------
 include/uapi/drm/i915_drm.h                |    8 +
 7 files changed, 1040 insertions(+), 223 deletions(-)
