[RFC,4/4] drm/i915/perf: Send system clock monotonic time in perf samples
diff mbox

Message ID 1510748034-14034-5-git-send-email-sagar.a.kamble@intel.com
State New
Headers show

Commit Message

sagar.a.kamble@intel.com Nov. 15, 2017, 12:13 p.m. UTC
From: Sourab Gupta <sourab.gupta@intel.com>

Currently, we have the ability to only forward the GPU timestamps in the
samples (which are generated via OA reports). This limits the ability to
correlate these samples with the system events.

An ability is therefore needed to report timestamps in different clock
domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
practical use to the userspace. This ability becomes important
when we want to correlate/plot GPU events/samples with other system events
on the same timeline (e.g. vblank events, or timestamps when work was
submitted to kernel, etc.)

The patch here proposes a mechanism to achieve this. The correlation
between gpu time and system time is established using the timestamp clock
associated with the command stream, abstracted as timecounter/cyclecounter
to retrieve gpu/system time correlated values.

v2: Added i915_driver_init_late() function to capture the new late init
phase for perf (Chris)

v3: Removed cross-timestamp changes.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sourab Gupta <sourab.gupta@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h      |  7 +++++++
 2 files changed, 34 insertions(+)

Comments

Chris Wilson Nov. 15, 2017, 12:31 p.m. UTC | #1
Quoting Sagar Arun Kamble (2017-11-15 12:13:54)
> From: Sourab Gupta <sourab.gupta@intel.com>
> 
> Currently, we have the ability to only forward the GPU timestamps in the
> samples (which are generated via OA reports). This limits the ability to
> correlate these samples with the system events.
> 
> An ability is therefore needed to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
> practical use to the userspace. This ability becomes important
> when we want to correlate/plot GPU events/samples with other system events
> on the same timeline (e.g. vblank events, or timestamps when work was
> submitted to kernel, etc.)
> 
> The patch here proposes a mechanism to achieve this. The correlation
> between gpu time and system time is established using the timestamp clock
> associated with the command stream, abstracted as timecounter/cyclecounter
> to retrieve gpu/system time correlated values.
> 
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
> 
> v3: Removed cross-timestamp changes.
> 
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h      |  7 +++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>  
>  #define SAMPLE_OA_REPORT       BIT(0)
>  #define SAMPLE_GPU_TS          BIT(1)
> +#define SAMPLE_SYSTEM_TS       BIT(2)
>  
>  /**
>   * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>         struct drm_i915_perf_record_header header;
>         u32 sample_flags = stream->sample_flags;
>         u64 gpu_ts = 0;
> +       u64 system_ts = 0;
>  
>         header.type = DRM_I915_PERF_RECORD_SAMPLE;
>         header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>  
>                 if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>                         return -EFAULT;
> +               buf += I915_PERF_TS_SAMPLE_SIZE;
> +       }
> +
> +       if (sample_flags & SAMPLE_SYSTEM_TS) {
> +               gpu_ts = get_gpu_ts_from_oa_report(stream, report);

Scope your variables. Stops us from being confused as to where else
gpu_ts or sys_ts may be reused. For instance I first thought you were
using SAMPLE_GPU_TS to initialise gpu_ts

> +               /*
> +                * XXX: timecounter_cyc2time considers time backwards if delta
> +                * timestamp is more than half the max ns time covered by
> +                * counter. It will be ~35min for 36 bit counter. If this much
> +                * sampling duration is needed we will have to update tc->nsec
> +                * by explicitly reading the timecounter (timecounter_read)
> +                * before this duration.
> +                */
> +               system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> +               if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> +                       return -EFAULT;

Advance buf.
sagar.a.kamble@intel.com Nov. 15, 2017, 4:51 p.m. UTC | #2
On 11/15/2017 6:01 PM, Chris Wilson wrote:
> Quoting Sagar Arun Kamble (2017-11-15 12:13:54)
>> From: Sourab Gupta <sourab.gupta@intel.com>
>>
>> Currently, we have the ability to only forward the GPU timestamps in the
>> samples (which are generated via OA reports). This limits the ability to
>> correlate these samples with the system events.
>>
>> An ability is therefore needed to report timestamps in different clock
>> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
>> practical use to the userspace. This ability becomes important
>> when we want to correlate/plot GPU events/samples with other system events
>> on the same timeline (e.g. vblank events, or timestamps when work was
>> submitted to kernel, etc.)
>>
>> The patch here proposes a mechanism to achieve this. The correlation
>> between gpu time and system time is established using the timestamp clock
>> associated with the command stream, abstracted as timecounter/cyclecounter
>> to retrieve gpu/system time correlated values.
>>
>> v2: Added i915_driver_init_late() function to capture the new late init
>> phase for perf (Chris)
>>
>> v3: Removed cross-timestamp changes.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h      |  7 +++++++
>>   2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 3b721d7..94ee924 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -336,6 +336,7 @@
>>   
>>   #define SAMPLE_OA_REPORT       BIT(0)
>>   #define SAMPLE_GPU_TS          BIT(1)
>> +#define SAMPLE_SYSTEM_TS       BIT(2)
>>   
>>   /**
>>    * struct perf_open_properties - for validated properties given to open a stream
>> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>>          struct drm_i915_perf_record_header header;
>>          u32 sample_flags = stream->sample_flags;
>>          u64 gpu_ts = 0;
>> +       u64 system_ts = 0;
>>   
>>          header.type = DRM_I915_PERF_RECORD_SAMPLE;
>>          header.pad = 0;
>> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>>   
>>                  if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>>                          return -EFAULT;
>> +               buf += I915_PERF_TS_SAMPLE_SIZE;
>> +       }
>> +
>> +       if (sample_flags & SAMPLE_SYSTEM_TS) {
>> +               gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> Scope your variables. Stops us from being confused as to where else
> gpu_ts or sys_ts may be reused. For instance I first thought you were
> using SAMPLE_GPU_TS to initialise gpu_ts
Yes
>> +               /*
>> +                * XXX: timecounter_cyc2time considers time backwards if delta
>> +                * timestamp is more than half the max ns time covered by
>> +                * counter. It will be ~35min for 36 bit counter. If this much
>> +                * sampling duration is needed we will have to update tc->nsec
>> +                * by explicitly reading the timecounter (timecounter_read)
>> +                * before this duration.
>> +                */
>> +               system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
>> +
>> +               if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
>> +                       return -EFAULT;
> Advance buf.
Had kept advance logic only if there is new field to be added as this 
advance is missing for
SAMPLE_OA_REPORT currently in drm-tip. Will fix.
Thanks for the review :)
sagar.a.kamble@intel.com Nov. 15, 2017, 5:54 p.m. UTC | #3
On 11/15/2017 5:43 PM, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> Currently, we have the ability to only forward the GPU timestamps in the
> samples (which are generated via OA reports). This limits the ability to
> correlate these samples with the system events.
>
> An ability is therefore needed to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
> practical use to the userspace. This ability becomes important
> when we want to correlate/plot GPU events/samples with other system events
> on the same timeline (e.g. vblank events, or timestamps when work was
> submitted to kernel, etc.)
>
> The patch here proposes a mechanism to achieve this. The correlation
> between gpu time and system time is established using the timestamp clock
> associated with the command stream, abstracted as timecounter/cyclecounter
> to retrieve gpu/system time correlated values.
>
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
>
> v3: Removed cross-timestamp changes.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h      |  7 +++++++
>   2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>   
>   #define SAMPLE_OA_REPORT	BIT(0)
>   #define SAMPLE_GPU_TS		BIT(1)
> +#define SAMPLE_SYSTEM_TS	BIT(2)
>   
>   /**
>    * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	struct drm_i915_perf_record_header header;
>   	u32 sample_flags = stream->sample_flags;
>   	u64 gpu_ts = 0;
> +	u64 system_ts = 0;
>   
>   	header.type = DRM_I915_PERF_RECORD_SAMPLE;
>   	header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   
>   		if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>   			return -EFAULT;
> +		buf += I915_PERF_TS_SAMPLE_SIZE;
> +	}
> +
> +	if (sample_flags & SAMPLE_SYSTEM_TS) {
> +		gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> +		/*
> +		 * XXX: timecounter_cyc2time considers time backwards if delta
> +		 * timestamp is more than half the max ns time covered by
> +		 * counter. It will be ~35min for 36 bit counter. If this much
> +		 * sampling duration is needed we will have to update tc->nsec
> +		 * by explicitly reading the timecounter (timecounter_read)
> +		 * before this duration.
This is implemented as overflow watchdog in mlx5e_timestamp_init 
(drivers/net/ethernet/mellanox).
Will need similar mechanism here.
> +		 */
> +		system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> +		if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> +			return -EFAULT;
>   	}
>   
>   	(*offset) += header.size;
> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>   	}
>   
> +	if (props->sample_flags & SAMPLE_SYSTEM_TS) {
> +		stream->sample_flags |= SAMPLE_SYSTEM_TS;
> +		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> +	}
> +
>   	dev_priv->perf.oa.oa_buffer.format_size = format_size;
>   	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>   		return -EINVAL;
> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   		case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>   			props->sample_flags |= SAMPLE_GPU_TS;
>   			break;
> +		case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
> +			props->sample_flags |= SAMPLE_SYSTEM_TS;
> +			break;
>   		case DRM_I915_PERF_PROP_OA_METRICS_SET:
>   			if (value == 0) {
>   				DRM_DEBUG("Unknown OA metric set ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0b9249e..283859c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>   	DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>   
>   	/**
> +	 * This property requests inclusion of CLOCK_MONOTONIC system time in
> +	 * the perf sample data.
> +	 */
> +	DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
> +
> +	/**
>   	 * The value specifies which set of OA unit metrics should be
>   	 * be configured, defining the contents of any OA unit reports.
>   	 */
> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>   	 *
>   	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>   	 *     { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
> +	 *     { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
>   	 * };
>   	 */
>   	DRM_I915_PERF_RECORD_SAMPLE = 1,
Lionel Landwerlin Dec. 5, 2017, 2:22 p.m. UTC | #4
On 15/11/17 12:13, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> Currently, we have the ability to only forward the GPU timestamps in the
> samples (which are generated via OA reports). This limits the ability to
> correlate these samples with the system events.
>
> An ability is therefore needed to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
> practical use to the userspace. This ability becomes important
> when we want to correlate/plot GPU events/samples with other system events
> on the same timeline (e.g. vblank events, or timestamps when work was
> submitted to kernel, etc.)
>
> The patch here proposes a mechanism to achieve this. The correlation
> between gpu time and system time is established using the timestamp clock
> associated with the command stream, abstracted as timecounter/cyclecounter
> to retrieve gpu/system time correlated values.
>
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
>
> v3: Removed cross-timestamp changes.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h      |  7 +++++++
>   2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>   
>   #define SAMPLE_OA_REPORT	BIT(0)
>   #define SAMPLE_GPU_TS		BIT(1)
> +#define SAMPLE_SYSTEM_TS	BIT(2)
>   
>   /**
>    * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	struct drm_i915_perf_record_header header;
>   	u32 sample_flags = stream->sample_flags;
>   	u64 gpu_ts = 0;
> +	u64 system_ts = 0;
>   
>   	header.type = DRM_I915_PERF_RECORD_SAMPLE;
>   	header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   
>   		if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>   			return -EFAULT;
> +		buf += I915_PERF_TS_SAMPLE_SIZE;

This is a ridiculous nit, but I think using sizeof(u64) would be more 
readable than this I915_PERF_TS_SAMPLE_SIZE define.

> +	}
> +
> +	if (sample_flags & SAMPLE_SYSTEM_TS) {
> +		gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> +		/*
> +		 * XXX: timecounter_cyc2time considers time backwards if delta
> +		 * timestamp is more than half the max ns time covered by
> +		 * counter. It will be ~35min for 36 bit counter. If this much
> +		 * sampling duration is needed we will have to update tc->nsec
> +		 * by explicitly reading the timecounter (timecounter_read)
> +		 * before this duration.
> +		 */
> +		system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> +		if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> +			return -EFAULT;
>   	}
>   
>   	(*offset) += header.size;
> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>   	}
>   
> +	if (props->sample_flags & SAMPLE_SYSTEM_TS) {
> +		stream->sample_flags |= SAMPLE_SYSTEM_TS;
> +		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> +	}
> +
>   	dev_priv->perf.oa.oa_buffer.format_size = format_size;
>   	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>   		return -EINVAL;
> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>   		case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>   			props->sample_flags |= SAMPLE_GPU_TS;
>   			break;
> +		case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
> +			props->sample_flags |= SAMPLE_SYSTEM_TS;
> +			break;
>   		case DRM_I915_PERF_PROP_OA_METRICS_SET:
>   			if (value == 0) {
>   				DRM_DEBUG("Unknown OA metric set ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0b9249e..283859c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>   	DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>   
>   	/**
> +	 * This property requests inclusion of CLOCK_MONOTONIC system time in
> +	 * the perf sample data.
> +	 */
> +	DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
> +
> +	/**
>   	 * The value specifies which set of OA unit metrics should be
>   	 * be configured, defining the contents of any OA unit reports.
>   	 */
> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>   	 *
>   	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>   	 *     { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
> +	 *     { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS

I would just add those u64 fields before oa_report.
Since the report sizes are dictated by the hardware, I'm afraid that one 
day someone might come up with a non 64bit aligned format (however 
unlikely).
And since the new properties mean you need to be aware of the potential 
new offsets, it's not breaking existing userspace.

>   	 * };
>   	 */
>   	DRM_I915_PERF_RECORD_SAMPLE = 1,
sagar.a.kamble@intel.com Dec. 6, 2017, 8:31 a.m. UTC | #5
On 12/5/2017 7:52 PM, Lionel Landwerlin wrote:
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> From: Sourab Gupta <sourab.gupta@intel.com>
>>
>> Currently, we have the ability to only forward the GPU timestamps in the
>> samples (which are generated via OA reports). This limits the ability to
>> correlate these samples with the system events.
>>
>> An ability is therefore needed to report timestamps in different clock
>> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
>> practical use to the userspace. This ability becomes important
>> when we want to correlate/plot GPU events/samples with other system 
>> events
>> on the same timeline (e.g. vblank events, or timestamps when work was
>> submitted to kernel, etc.)
>>
>> The patch here proposes a mechanism to achieve this. The correlation
>> between gpu time and system time is established using the timestamp 
>> clock
>> associated with the command stream, abstracted as 
>> timecounter/cyclecounter
>> to retrieve gpu/system time correlated values.
>>
>> v2: Added i915_driver_init_late() function to capture the new late init
>> phase for perf (Chris)
>>
>> v3: Removed cross-timestamp changes.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>>   include/uapi/drm/i915_drm.h      |  7 +++++++
>>   2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index 3b721d7..94ee924 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -336,6 +336,7 @@
>>     #define SAMPLE_OA_REPORT    BIT(0)
>>   #define SAMPLE_GPU_TS        BIT(1)
>> +#define SAMPLE_SYSTEM_TS    BIT(2)
>>     /**
>>    * struct perf_open_properties - for validated properties given to 
>> open a stream
>> @@ -622,6 +623,7 @@ static int append_oa_sample(struct 
>> i915_perf_stream *stream,
>>       struct drm_i915_perf_record_header header;
>>       u32 sample_flags = stream->sample_flags;
>>       u64 gpu_ts = 0;
>> +    u64 system_ts = 0;
>>         header.type = DRM_I915_PERF_RECORD_SAMPLE;
>>       header.pad = 0;
>> @@ -647,6 +649,23 @@ static int append_oa_sample(struct 
>> i915_perf_stream *stream,
>>             if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>>               return -EFAULT;
>> +        buf += I915_PERF_TS_SAMPLE_SIZE;
>
> This is a ridiculous nit, but I think using sizeof(u64) would be more 
> readable than this I915_PERF_TS_SAMPLE_SIZE define.

Will update. Thanks.

>
>> +    }
>> +
>> +    if (sample_flags & SAMPLE_SYSTEM_TS) {
>> +        gpu_ts = get_gpu_ts_from_oa_report(stream, report);
>> +        /*
>> +         * XXX: timecounter_cyc2time considers time backwards if delta
>> +         * timestamp is more than half the max ns time covered by
>> +         * counter. It will be ~35min for 36 bit counter. If this much
>> +         * sampling duration is needed we will have to update tc->nsec
>> +         * by explicitly reading the timecounter (timecounter_read)
>> +         * before this duration.
>> +         */
>> +        system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
>> +
>> +        if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
>> +            return -EFAULT;
>>       }
>>         (*offset) += header.size;
>> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct 
>> i915_perf_stream *stream,
>>           stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>>       }
>>   +    if (props->sample_flags & SAMPLE_SYSTEM_TS) {
>> +        stream->sample_flags |= SAMPLE_SYSTEM_TS;
>> +        stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>> +    }
>> +
>>       dev_priv->perf.oa.oa_buffer.format_size = format_size;
>>       if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>>           return -EINVAL;
>> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct 
>> drm_i915_private *dev_priv,
>>           case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>>               props->sample_flags |= SAMPLE_GPU_TS;
>>               break;
>> +        case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
>> +            props->sample_flags |= SAMPLE_SYSTEM_TS;
>> +            break;
>>           case DRM_I915_PERF_PROP_OA_METRICS_SET:
>>               if (value == 0) {
>>                   DRM_DEBUG("Unknown OA metric set ID\n");
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 0b9249e..283859c 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>>       DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>>         /**
>> +     * This property requests inclusion of CLOCK_MONOTONIC system 
>> time in
>> +     * the perf sample data.
>> +     */
>> +    DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
>> +
>> +    /**
>>        * The value specifies which set of OA unit metrics should be
>>        * be configured, defining the contents of any OA unit reports.
>>        */
>> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>>        *
>>        *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>>        *     { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
>> +     *     { u64 system_timestamp; } && 
>> DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
>
> I would just add those u64 fields before oa_report.
> Since the report sizes are dictated by the hardware, I'm afraid that 
> one day someone might come up with a non 64bit aligned format (however 
> unlikely).
> And since the new properties mean you need to be aware of the 
> potential new offsets, it's not breaking existing userspace.

Sure. Will update.

>
>>        * };
>>        */
>>       DRM_I915_PERF_RECORD_SAMPLE = 1,
>
>

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3b721d7..94ee924 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -336,6 +336,7 @@ 
 
 #define SAMPLE_OA_REPORT	BIT(0)
 #define SAMPLE_GPU_TS		BIT(1)
+#define SAMPLE_SYSTEM_TS	BIT(2)
 
 /**
  * struct perf_open_properties - for validated properties given to open a stream
@@ -622,6 +623,7 @@  static int append_oa_sample(struct i915_perf_stream *stream,
 	struct drm_i915_perf_record_header header;
 	u32 sample_flags = stream->sample_flags;
 	u64 gpu_ts = 0;
+	u64 system_ts = 0;
 
 	header.type = DRM_I915_PERF_RECORD_SAMPLE;
 	header.pad = 0;
@@ -647,6 +649,23 @@  static int append_oa_sample(struct i915_perf_stream *stream,
 
 		if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
 			return -EFAULT;
+		buf += I915_PERF_TS_SAMPLE_SIZE;
+	}
+
+	if (sample_flags & SAMPLE_SYSTEM_TS) {
+		gpu_ts = get_gpu_ts_from_oa_report(stream, report);
+		/*
+		 * XXX: timecounter_cyc2time considers time backwards if delta
+		 * timestamp is more than half the max ns time covered by
+		 * counter. It will be ~35min for 36 bit counter. If this much
+		 * sampling duration is needed we will have to update tc->nsec
+		 * by explicitly reading the timecounter (timecounter_read)
+		 * before this duration.
+		 */
+		system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
+
+		if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
+			return -EFAULT;
 	}
 
 	(*offset) += header.size;
@@ -2137,6 +2156,11 @@  static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
 	}
 
+	if (props->sample_flags & SAMPLE_SYSTEM_TS) {
+		stream->sample_flags |= SAMPLE_SYSTEM_TS;
+		stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
+	}
+
 	dev_priv->perf.oa.oa_buffer.format_size = format_size;
 	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
 		return -EINVAL;
@@ -2857,6 +2881,9 @@  static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 		case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
 			props->sample_flags |= SAMPLE_GPU_TS;
 			break;
+		case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
+			props->sample_flags |= SAMPLE_SYSTEM_TS;
+			break;
 		case DRM_I915_PERF_PROP_OA_METRICS_SET:
 			if (value == 0) {
 				DRM_DEBUG("Unknown OA metric set ID\n");
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0b9249e..283859c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1453,6 +1453,12 @@  enum drm_i915_perf_property_id {
 	DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
 
 	/**
+	 * This property requests inclusion of CLOCK_MONOTONIC system time in
+	 * the perf sample data.
+	 */
+	DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
+
+	/**
 	 * The value specifies which set of OA unit metrics should be
 	 * be configured, defining the contents of any OA unit reports.
 	 */
@@ -1539,6 +1545,7 @@  enum drm_i915_perf_record_type {
 	 *
 	 *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
 	 *     { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
+	 *     { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
 	 * };
 	 */
 	DRM_I915_PERF_RECORD_SAMPLE = 1,