
[RFC,7/8] drm/i915/tracepoints: Add backend level request in and out tracepoints

Message ID 1485518487-4464-8-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin Jan. 27, 2017, 12:01 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Two new tracepoints placed at the call sites where requests are
actually passed to the GPU enable userspace to track engine
utilisation.

These tracepoints are only enabled when the
DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c |  2 ++
 drivers/gpu/drm/i915/i915_trace.h          | 49 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c           |  4 +++
 3 files changed, 55 insertions(+)
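
For illustration, a minimal userspace sketch (not part of the patch) of how
the two new events could be consumed: enable them through tracefs and stream
the trace pipe. It assumes tracefs is mounted at /sys/kernel/tracing (older
setups expose it under /sys/kernel/debug/tracing) and that the kernel was
built with DRM_I915_LOW_LEVEL_TRACEPOINTS enabled.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TRACEFS "/sys/kernel/tracing"

static void write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return;
	if (write(fd, val, strlen(val)) < 0)
		perror(path);
	close(fd);
}

int main(void)
{
	char buf[4096];
	ssize_t len;
	int fd;

	/* Enable the request_in/request_out pair added by this patch. */
	write_str(TRACEFS "/events/i915/i915_gem_request_in/enable", "1");
	write_str(TRACEFS "/events/i915/i915_gem_request_out/enable", "1");

	/* Each line carries the dev/ring/ctx/seqno/global_seqno/port fields. */
	fd = open(TRACEFS "/trace_pipe", O_RDONLY);
	if (fd < 0)
		return 1;
	while ((len = read(fd, buf, sizeof(buf) - 1)) > 0) {
		buf[len] = '\0';
		fputs(buf, stdout);
	}
	close(fd);
	return 0;
}

Pairing each request_in with the matching request_out by ring, ctx and seqno
then yields the time each request spent on the hardware, i.e. per-engine
utilisation.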

Comments

Chris Wilson Jan. 27, 2017, 12:27 p.m. UTC | #1
On Fri, Jan 27, 2017 at 12:01:26PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Two new tracepoints placed at the call sites where requests are
> actually passed to the GPU enable userspace to track engine
> utilisation.
> 
> These tracepoints are only enabled when the
> DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_guc_submission.c |  2 ++
>  drivers/gpu/drm/i915/i915_trace.h          | 49 ++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c           |  4 +++
>  3 files changed, 55 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 8ced9e26f075..beec88a30347 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -518,6 +518,8 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
>  	if (i915_vma_is_map_and_fenceable(rq->ring->vma))
>  		POSTING_READ_FW(GUC_STATUS);
>  
> +	trace_i915_gem_request_in(rq, 0);

But how to get out for guc? We could do a similar in for ringbuffer.
The original way I used it was in == dispatch, out == ring-notify, which
is why trace_i915_gem_ring_dispatch enabled the signaling.

Hmm. Following on from that we can add the out tracepoint as a
fence-callback. For the moment, I'd drop guc/legacy
trace_i915_gem_request_in and we will try something more magical. Though
once we do enable the fake GuC scheduler we will have an appropriate
place to drop a trace_i915_gem_request_out. Just leaving ringbuffer as
the odd one out, but for which arguably the in/out logic is not as
important.
-Chris
Tvrtko Ursulin Jan. 27, 2017, 1:59 p.m. UTC | #2
On 27/01/2017 12:27, Chris Wilson wrote:
> On Fri, Jan 27, 2017 at 12:01:26PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Two new tracepoints placed at the call sites where requests are
>> actually passed to the GPU enable userspace to track engine
>> utilisation.
>>
>> These tracepoints are only enabled when the
>> DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_guc_submission.c |  2 ++
>>  drivers/gpu/drm/i915/i915_trace.h          | 49 ++++++++++++++++++++++++++++++
>>  drivers/gpu/drm/i915/intel_lrc.c           |  4 +++
>>  3 files changed, 55 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 8ced9e26f075..beec88a30347 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -518,6 +518,8 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
>>  	if (i915_vma_is_map_and_fenceable(rq->ring->vma))
>>  		POSTING_READ_FW(GUC_STATUS);
>>
>> +	trace_i915_gem_request_in(rq, 0);
>
> But how to get out for guc? We could do a similar in for ringbuffer.
> The original way I used it was in == dispatch, out == ring-notify, which
> is why trace_i915_gem_ring_dispatch enabled the signaling.

That was my thinking as well.

> Hmm. Following on from that we can add the out tracepoint as a
> fence-callback. For the moment, I'd drop guc/legacy
> trace_i915_gem_request_in and we will try something more magical. Though
> once we do enable the fake GuC scheduler we will have an appropriate
> place to drop a trace_i915_gem_request_out. Just leaving ringbuffer as
> the odd one out, but for which arguably the in/out logic is not as
> important.

Fence signalled is very lazy if no one is listening, no? So until the 
GUC backend I thought request_in and deriving the _out from _notify in 
userspace would be OK. Meaning enable signalling stays until then.

Regards,

Tvrtko
Chris Wilson Jan. 27, 2017, 2:07 p.m. UTC | #3
On Fri, Jan 27, 2017 at 01:59:54PM +0000, Tvrtko Ursulin wrote:
> On 27/01/2017 12:27, Chris Wilson wrote:
> >Hmm. Following on from that we can add the out tracepoint as a
> >fence-callback. For the moment, I'd drop guc/legacy
> >trace_i915_gem_request_in and we will try something more magical. Though
> >once we do enable the fake GuC scheduler we will have an appropriate
> >place to drop a trace_i915_gem_request_out. Just leaving ringbuffer as
> >the odd one out, but for which arguably the in/out logic is not as
> >important.
> 
> Fence signalled is very lazy if no one is listening, no? So until
> the GUC backend I thought request_in and deriving the _out from
> _notify in userspace would be OK. Meaning enable signalling stays
> until then.

It's lazy unless we use fence_add_callback(). I was thinking of some
magic inside the trace_request_in macro to add something like
fence_add_callback(this_fence, __trace_request_out);

It still has the problem request_out doesn't work unless request_in is
also being watched, but it has the advantage of being the same for all
generators (i.e. more convenient for userspace).

But as it seems limited to ringbuffer, we may consider it no longer
required?
-Chris
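
As a rough sketch of that fence-callback idea (again not part of the posted
patch): the out tracepoint could be armed whenever the in tracepoint fires.
The out_cb field and the __trace_request_out / trace_request_in_with_out
names below are made up for illustration, and the callback API is spelled
dma_fence_add_callback() after the fence -> dma_fence rename.

#include <linux/dma-fence.h>

#include "i915_drv.h"
#include "i915_trace.h"

static void __trace_request_out(struct dma_fence *fence,
				struct dma_fence_cb *cb)
{
	struct drm_i915_gem_request *rq =
		container_of(fence, struct drm_i915_gem_request, fence);

	trace_i915_gem_request_out(rq, 0);
}

static void trace_request_in_with_out(struct drm_i915_gem_request *rq,
				      unsigned int port)
{
	trace_i915_gem_request_in(rq, port);

	/*
	 * out_cb would be a struct dma_fence_cb added to
	 * struct drm_i915_gem_request for this purpose.
	 */
	if (dma_fence_add_callback(&rq->fence, &rq->out_cb,
				   __trace_request_out))
		/* Fence already signaled; emit the out event directly. */
		trace_i915_gem_request_out(rq, 0);
}

This keeps the in/out pair identical for every submission backend, with the
limitation noted above that request_out is only emitted while request_in is
being traced.
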
Tvrtko Ursulin Jan. 27, 2017, 2:18 p.m. UTC | #4
On 27/01/2017 14:07, Chris Wilson wrote:
> On Fri, Jan 27, 2017 at 01:59:54PM +0000, Tvrtko Ursulin wrote:
>> On 27/01/2017 12:27, Chris Wilson wrote:
>>> Hmm. Following on from that we can add the out tracepoint as a
>>> fence-callback. For the moment, I'd drop guc/legacy
>>> trace_i915_gem_request_in and we will try something more magical. Though
>>> once we do enable the fake GuC scheduler we will have an appropriate
>>> place to drop a trace_i915_gem_request_out. Just leaving ringbuffer as
>>> the odd one out, but for which arguably the in/out logic is not as
>>> important.
>>
>> Fence signalled is very lazy if no one is listening, no? So until
>> the GUC backend I thought request_in and deriving the _out from
>> _notify in userspace would be OK. Meaning enable signalling stays
>> until then.
>
> It's lazy unless we use fence_add_callback(). I was thinking of some
> magic inside the trace_request_in macro to add something like
> fence_add_callback(this_fence, __trace_request_out);
>
> It still has the problem request_out doesn't work unless request_in is
> also being watched, but it has the advantage of being the same for all
> generators (i.e. more convenient for userspace).
>
> But as it seems limited to ringbuffer, we may consider it no longer
> required?

You mean bank on the GUC backend getting in soon? Yeah, that sounds 
reasonable. I'll drop the sw signalling from notify then.

Regards,

Tvrtko
Chris Wilson Jan. 27, 2017, 2:29 p.m. UTC | #5
On Fri, Jan 27, 2017 at 02:18:06PM +0000, Tvrtko Ursulin wrote:
> 
> On 27/01/2017 14:07, Chris Wilson wrote:
> >On Fri, Jan 27, 2017 at 01:59:54PM +0000, Tvrtko Ursulin wrote:
> >>On 27/01/2017 12:27, Chris Wilson wrote:
> >>>Hmm. Following on from that we can add the out tracepoint as a
> >>>fence-callback. For the moment, I'd drop guc/legacy
> >>>trace_i915_gem_request_in and we will try something more magical. Though
> >>>once we do enable the fake GuC scheduler we will have an appropriate
> >>>place to drop a trace_i915_gem_request_out. Just leaving ringbuffer as
> >>>the odd one out, but for which arguably the in/out logic is not as
> >>>important.
> >>
> >>Fence signalled is very lazy if no one is listening, no? So until
> >>the GUC backend I thought request_in and deriving the _out from
> >>_notify in userspace would be OK. Meaning enable signalling stays
> >>until then.
> >
> >It's lazy unless we use fence_add_callback(). I was thinking of some
> >magic inside the trace_request_in macro to add something like
> >fence_add_callback(this_fence, __trace_request_out);
> >
> >It still has the problem request_out doesn't work unless request_in is
> >also being watched, but it has the advantage of being the same for all
> >generators (i.e. more convenient for userspace).
> >
> >But as it seems limited to ringbuffer, we may consider it no longer
> >required?
> 
> You mean bank on the GUC backend getting in soon? Yeah, that sounds
> reasonable. I'll drop the sw signalling from notify then.

Considering Joonas asks quite regularly "are we nearly there yet", I
think so.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 8ced9e26f075..beec88a30347 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -518,6 +518,8 @@  static void __i915_guc_submit(struct drm_i915_gem_request *rq)
 	if (i915_vma_is_map_and_fenceable(rq->ring->vma))
 		POSTING_READ_FW(GUC_STATUS);
 
+	trace_i915_gem_request_in(rq, 0);
+
 	b_ret = guc_ring_doorbell(client);
 
 	client->submissions[engine_id] += 1;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index d24b89d0e3ab..18dd21653a80 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -582,6 +582,35 @@  DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 	    TP_ARGS(req)
 );
 
+DECLARE_EVENT_CLASS(i915_gem_request_hw,
+		    TP_PROTO(struct drm_i915_gem_request *req,
+			     unsigned int port),
+		    TP_ARGS(req, port),
+
+		    TP_STRUCT__entry(
+				     __field(u32, dev)
+				     __field(u32, ring)
+				     __field(u32, seqno)
+				     __field(u32, global_seqno)
+				     __field(u32, ctx)
+				     __field(u32, port)
+				    ),
+
+		    TP_fast_assign(
+			           __entry->dev = req->i915->drm.primary->index;
+			           __entry->ring = req->engine->id;
+			           __entry->ctx = req->ctx->hw_id;
+			           __entry->seqno = req->fence.seqno;
+			           __entry->global_seqno = req->global_seqno;
+			           __entry->port = port;
+			          ),
+
+		    TP_printk("dev=%u, ring=%u, ctx=%u, seqno=%u, global_seqno=%u, port=%u",
+			      __entry->dev, __entry->ring, __entry->ctx,
+			      __entry->seqno, __entry->global_seqno,
+			      __entry->port)
+);
+
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
 DEFINE_EVENT(i915_gem_request, i915_gem_request_submit,
 	     TP_PROTO(struct drm_i915_gem_request *req),
@@ -592,6 +621,16 @@  DEFINE_EVENT(i915_gem_request, i915_gem_request_execute,
 	     TP_PROTO(struct drm_i915_gem_request *req),
 	     TP_ARGS(req)
 );
+
+DEFINE_EVENT(i915_gem_request_hw, i915_gem_request_in,
+	     TP_PROTO(struct drm_i915_gem_request *req, unsigned int port),
+	     TP_ARGS(req, port)
+);
+
+DEFINE_EVENT(i915_gem_request_hw, i915_gem_request_out,
+	     TP_PROTO(struct drm_i915_gem_request *req, unsigned int port),
+	     TP_ARGS(req, port)
+);
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
 static inline void
@@ -603,6 +642,16 @@  static inline void
 trace_i915_gem_request_execute(struct drm_i915_gem_request *req)
 {
 }
+
+static inline void
+trace_i915_gem_request_in(struct drm_i915_gem_request *req, unsigned int port)
+{
+}
+
+static inline void
+trace_i915_gem_request_out(struct drm_i915_gem_request *req, unsigned int port)
+{
+}
 #endif
 #endif
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bee9d565b8f3..85581a05657b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -505,6 +505,7 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 		cursor->priotree.priority = INT_MAX;
 
 		__i915_gem_request_submit(cursor);
+		trace_i915_gem_request_in(cursor, port - engine->execlist_port);
 		last = cursor;
 		submit = true;
 	}
@@ -561,6 +562,7 @@  static void intel_lrc_irq_handler(unsigned long data)
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
 	struct execlist_port *port = engine->execlist_port;
 	struct drm_i915_private *dev_priv = engine->i915;
+	unsigned int portidx = 0;
 
 	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
 
@@ -597,6 +599,8 @@  static void intel_lrc_irq_handler(unsigned long data)
 				execlists_context_status_change(port[0].request,
 								INTEL_CONTEXT_SCHEDULE_OUT);
 
+				trace_i915_gem_request_out(port[0].request,
+							   portidx++);
 				i915_gem_request_put(port[0].request);
 				port[0] = port[1];
 				memset(&port[1], 0, sizeof(port[1]));