diff mbox series

[16/19] drm/i915/perf: Apply Wa_18013179988

Message ID 20220823204155.8178-17-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Headers show
Series Add DG2 OA support | expand

Commit Message

Umesh Nerlige Ramappa Aug. 23, 2022, 8:41 p.m. UTC
OA reports in the OA buffer contain an OA timestamp field that helps
user calculate delta between 2 OA reports. The calculation relies on the
CS timestamp frequency to convert the timestamp value to nanoseconds.
The CS timestamp frequency is a function of the CTC_SHIFT value in
RPM_CONFIG0.

In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
actual value from RPM_CONFIG0. At the user level, this results in an
error in calculating delta between 2 OA reports since the OA timestamp
is not shifted in the same manner as CS timestamp.

To resolve this, return actual OA timestamp frequency to the user in
i915_getparam_ioctl.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_getparam.c |  3 +++
 drivers/gpu/drm/i915/i915_perf.c     | 30 ++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_perf.h     |  2 ++
 include/uapi/drm/i915_drm.h          |  6 ++++++
 4 files changed, 39 insertions(+), 2 deletions(-)

Comments

Ashutosh Dixit Sept. 16, 2022, 5:16 a.m. UTC | #1
On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> OA reports in the OA buffer contain an OA timestamp field that helps
> user calculate delta between 2 OA reports. The calculation relies on the
> CS timestamp frequency to convert the timestamp value to nanoseconds.
> The CS timestamp frequency is a function of the CTC_SHIFT value in
> RPM_CONFIG0.
>
> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> actual value from RPM_CONFIG0. At the user level, this results in an
> error in calculating delta between 2 OA reports since the OA timestamp
> is not shifted in the same manner as CS timestamp.
>
> To resolve this, return actual OA timestamp frequency to the user in
> i915_getparam_ioctl.

Rather than exposing actual OA timestamp frequency to userspace (with the
corresponding uapi change, specially if it's only DG2 and not all future
products) questions about a couple of other options:

Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
          same as OA freq :-)

   The HSD seems to mention this:
   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
   Note: Changing the shift setting on live driver may break apps that are
   currently running (including desktop manager).

Option 2. Is it possible to correct the timestamps in OA report headers to
          compensate for the difference between OA and GT frequencies (say when
          copying OA data to userspace)?

	  Though not sure if this is preferable to having userspace do this.

A couple of minor optional nits on that patch below too.

> +u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
> +{
> +	/* Wa_18013179988:dg2 */
> +	if (IS_DG2(i915)) {
> +		intel_wakeref_t wakeref;
> +		u32 reg, shift;
> +
> +		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
> +			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
> +
> +		shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
> +			 GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;

This can be:
		shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);

>  static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
>  {
> -	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
> -					     2ULL << exponent);
> +	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
> +	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
> +
> +	return div_u64(nom + den - 1, den);

div_u64_roundup?

Thanks.
--
Ashutosh
Ashutosh Dixit Sept. 16, 2022, 3:22 p.m. UTC | #2
On Thu, 15 Sep 2022 22:16:30 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >
>
> Hi Umesh,
>
> > OA reports in the OA buffer contain an OA timestamp field that helps
> > user calculate delta between 2 OA reports. The calculation relies on the
> > CS timestamp frequency to convert the timestamp value to nanoseconds.
> > The CS timestamp frequency is a function of the CTC_SHIFT value in
> > RPM_CONFIG0.
> >
> > In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> > actual value from RPM_CONFIG0. At the user level, this results in an
> > error in calculating delta between 2 OA reports since the OA timestamp
> > is not shifted in the same manner as CS timestamp.
> >
> > To resolve this, return actual OA timestamp frequency to the user in
> > i915_getparam_ioctl.
>
> Rather than exposing actual OA timestamp frequency to userspace (with the
> corresponding uapi change, specially if it's only DG2 and not all future
> products) questions about a couple of other options:
>
> Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>           same as OA freq :-)
>
>    The HSD seems to mention this:
>    Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>    Note: Changing the shift setting on live driver may break apps that are
>    currently running (including desktop manager).
>
> Option 2. Is it possible to correct the timestamps in OA report headers to
>           compensate for the difference between OA and GT frequencies (say when
>           copying OA data to userspace)?
>
>	  Though not sure if this is preferable to having userspace do this.

Also do we need input from userland on this patch? UMD's might need to
assess the impact of having different GT and OA frequencies at their end
since they consume OA data?

Thanks.
--
Ashutosh
Umesh Nerlige Ramappa Sept. 16, 2022, 6:56 p.m. UTC | #3
On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
>On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> OA reports in the OA buffer contain an OA timestamp field that helps
>> user calculate delta between 2 OA reports. The calculation relies on the
>> CS timestamp frequency to convert the timestamp value to nanoseconds.
>> The CS timestamp frequency is a function of the CTC_SHIFT value in
>> RPM_CONFIG0.
>>
>> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> actual value from RPM_CONFIG0. At the user level, this results in an
>> error in calculating delta between 2 OA reports since the OA timestamp
>> is not shifted in the same manner as CS timestamp.
>>
>> To resolve this, return actual OA timestamp frequency to the user in
>> i915_getparam_ioctl.
>
>Rather than exposing actual OA timestamp frequency to userspace (with the
>corresponding uapi change, specially if it's only DG2 and not all future
>products) questions about a couple of other options:
>
>Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>          same as OA freq :-)
>
>   The HSD seems to mention this:
>   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>   Note: Changing the shift setting on live driver may break apps that are
>   currently running (including desktop manager).
>
>Option 2. Is it possible to correct the timestamps in OA report headers to
>          compensate for the difference between OA and GT frequencies (say when
>          copying OA data to userspace)?
>
>	  Though not sure if this is preferable to having userspace do this.

It does affect other platforms too. There's no guarantee on what the 
CTC_SHIFT value would be for different platforms, so user would have to 
at least query that somehow (maybe from i915). It's simpler for user to 
use the exported OA frequency since it is also backwards compatible.

https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is 
consumed by GPUvis. That reminds me, I should include the UMD links for 
the patches with uapi changes.

>
>A couple of minor optional nits on that patch below too.
>
>> +u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
>> +{
>> +	/* Wa_18013179988:dg2 */
>> +	if (IS_DG2(i915)) {
>> +		intel_wakeref_t wakeref;
>> +		u32 reg, shift;
>> +
>> +		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
>> +			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
>> +
>> +		shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
>> +			 GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;
>
>This can be:
>		shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);

sure, will change

>
>>  static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
>>  {
>> -	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
>> -					     2ULL << exponent);
>> +	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
>> +	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
>> +
>> +	return div_u64(nom + den - 1, den);
>
>div_u64_roundup?

true, but that is statically defined within intel_gt_clock_utils.c. I 
didn't think there are enough users to export it outside.

Thanks,
Umesh


>
>Thanks.
>--
>Ashutosh
Umesh Nerlige Ramappa Sept. 16, 2022, 7:04 p.m. UTC | #4
On Fri, Sep 16, 2022 at 08:22:40AM -0700, Dixit, Ashutosh wrote:
>On Thu, 15 Sep 2022 22:16:30 -0700, Dixit, Ashutosh wrote:
>>
>> On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>> >
>>
>> Hi Umesh,
>>
>> > OA reports in the OA buffer contain an OA timestamp field that helps
>> > user calculate delta between 2 OA reports. The calculation relies on the
>> > CS timestamp frequency to convert the timestamp value to nanoseconds.
>> > The CS timestamp frequency is a function of the CTC_SHIFT value in
>> > RPM_CONFIG0.
>> >
>> > In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> > actual value from RPM_CONFIG0. At the user level, this results in an
>> > error in calculating delta between 2 OA reports since the OA timestamp
>> > is not shifted in the same manner as CS timestamp.
>> >
>> > To resolve this, return actual OA timestamp frequency to the user in
>> > i915_getparam_ioctl.
>>
>> Rather than exposing actual OA timestamp frequency to userspace (with the
>> corresponding uapi change, specially if it's only DG2 and not all future
>> products) questions about a couple of other options:
>>
>> Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>>           same as OA freq :-)
>>
>>    The HSD seems to mention this:
>>    Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>>    Note: Changing the shift setting on live driver may break apps that are
>>    currently running (including desktop manager).
>>
>> Option 2. Is it possible to correct the timestamps in OA report headers to
>>           compensate for the difference between OA and GT frequencies (say when
>>           copying OA data to userspace)?
>>
>>	  Though not sure if this is preferable to having userspace do this.
>
>Also do we need input from userland on this patch? UMD's might need to
>assess the impact of having different GT and OA frequencies at their end
>since they consume OA data?

Lionel is aware of the change and I believe he has some patches to 
consume this API for the GPUvis support, but we need an Ack from 
Joonas/maintainer to merge any uapi changes. I will add them to the next 
revision.

Thanks,
Umesh

>
>Thanks.
>--
>Ashutosh
Ashutosh Dixit Sept. 16, 2022, 7:57 p.m. UTC | #5
On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
>
> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > Hi Umesh,
> >
> >> OA reports in the OA buffer contain an OA timestamp field that helps
> >> user calculate delta between 2 OA reports. The calculation relies on the
> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
> >> RPM_CONFIG0.
> >>
> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> >> actual value from RPM_CONFIG0. At the user level, this results in an
> >> error in calculating delta between 2 OA reports since the OA timestamp
> >> is not shifted in the same manner as CS timestamp.
> >>
> >> To resolve this, return actual OA timestamp frequency to the user in
> >> i915_getparam_ioctl.
> >
> > Rather than exposing actual OA timestamp frequency to userspace (with the
> > corresponding uapi change, specially if it's only DG2 and not all future
> > products) questions about a couple of other options:
> >
> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
> >          same as OA freq :-)
> >
> >   The HSD seems to mention this:
> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
> >   Note: Changing the shift setting on live driver may break apps that are
> >   currently running (including desktop manager).
> >
> > Option 2. Is it possible to correct the timestamps in OA report headers to
> >          compensate for the difference between OA and GT frequencies (say when
> >          copying OA data to userspace)?
> >
> >	  Though not sure if this is preferable to having userspace do this.
>
> It does affect other platforms too. There's no guarantee on what the
> CTC_SHIFT value would be for different platforms, so user would have to at
> least query that somehow (maybe from i915). It's simpler for user to use
> the exported OA frequency since it is also backwards compatible.

Is Option 2 above feasible since it would stop propagating the change to
various UMD's?

> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
> consumed by GPUvis. That reminds me, I should include the UMD links for the
> patches with uapi changes.

I was thinking more about UMD's which analayze OA data and who till now are
probably assuming OA freq == GT freq and will now have to drop that
assumption. So not sure how widespread would be these changes in
the (multiple different?) UMD(s).

Thanks.
--
Ashutosh
Umesh Nerlige Ramappa Sept. 16, 2022, 8:25 p.m. UTC | #6
On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
>On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
>> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>> >>
>> >
>> > Hi Umesh,
>> >
>> >> OA reports in the OA buffer contain an OA timestamp field that helps
>> >> user calculate delta between 2 OA reports. The calculation relies on the
>> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
>> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
>> >> RPM_CONFIG0.
>> >>
>> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> >> actual value from RPM_CONFIG0. At the user level, this results in an
>> >> error in calculating delta between 2 OA reports since the OA timestamp
>> >> is not shifted in the same manner as CS timestamp.
>> >>
>> >> To resolve this, return actual OA timestamp frequency to the user in
>> >> i915_getparam_ioctl.
>> >
>> > Rather than exposing actual OA timestamp frequency to userspace (with the
>> > corresponding uapi change, specially if it's only DG2 and not all future
>> > products) questions about a couple of other options:
>> >
>> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>> >          same as OA freq :-)
>> >
>> >   The HSD seems to mention this:
>> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>> >   Note: Changing the shift setting on live driver may break apps that are
>> >   currently running (including desktop manager).
>> >
>> > Option 2. Is it possible to correct the timestamps in OA report headers to
>> >          compensate for the difference between OA and GT frequencies (say when
>> >          copying OA data to userspace)?
>> >
>> >	  Though not sure if this is preferable to having userspace do this.
>>
>> It does affect other platforms too. There's no guarantee on what the
>> CTC_SHIFT value would be for different platforms, so user would have to at
>> least query that somehow (maybe from i915). It's simpler for user to use
>> the exported OA frequency since it is also backwards compatible.
>
>Is Option 2 above feasible since it would stop propagating the change to
>various UMD's?

Hmm, there is logic today that squashes context ids when doing oa buffer
filtering, but it does that on selective reports (i.e. if a gem_context 
is passed).

For this issue: for a 16MB OA buffer with 256 byte reports, that would 
be an additional write of 262144 in the kmd (to smem). For 20us sampled 
OA reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2 
concerns:

- the mmapped use case may break, but I don't see that being upstreamed.  
   We may have divergent solutions for upstream and internal.
- blocking/polling tests in IGT will be sensitive to this change on some 
   platforms and may need to be bolstered.

I will give it a shot and get back,

Thanks,
Umesh

>
>> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
>> consumed by GPUvis. That reminds me, I should include the UMD links for the
>> patches with uapi changes.
>
>I was thinking more about UMD's which analayze OA data and who till now are
>probably assuming OA freq == GT freq and will now have to drop that
>assumption. So not sure how widespread would be these changes in
>the (multiple different?) UMD(s).
>
>Thanks.
>--
>Ashutosh
Ashutosh Dixit Sept. 16, 2022, 9 p.m. UTC | #7
On Fri, 16 Sep 2022 13:25:17 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
> >> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >> >>
> >> >
> >> > Hi Umesh,
> >> >
> >> >> OA reports in the OA buffer contain an OA timestamp field that helps
> >> >> user calculate delta between 2 OA reports. The calculation relies on the
> >> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
> >> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
> >> >> RPM_CONFIG0.
> >> >>
> >> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> >> >> actual value from RPM_CONFIG0. At the user level, this results in an
> >> >> error in calculating delta between 2 OA reports since the OA timestamp
> >> >> is not shifted in the same manner as CS timestamp.
> >> >>
> >> >> To resolve this, return actual OA timestamp frequency to the user in
> >> >> i915_getparam_ioctl.
> >> >
> >> > Rather than exposing actual OA timestamp frequency to userspace (with the
> >> > corresponding uapi change, specially if it's only DG2 and not all future
> >> > products) questions about a couple of other options:
> >> >
> >> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
> >> >          same as OA freq :-)
> >> >
> >> >   The HSD seems to mention this:
> >> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
> >> >   Note: Changing the shift setting on live driver may break apps that are
> >> >   currently running (including desktop manager).
> >> >
> >> > Option 2. Is it possible to correct the timestamps in OA report headers to
> >> >          compensate for the difference between OA and GT frequencies (say when
> >> >          copying OA data to userspace)?
> >> >
> >> >	  Though not sure if this is preferable to having userspace do this.
> >>
> >> It does affect other platforms too. There's no guarantee on what the
> >> CTC_SHIFT value would be for different platforms, so user would have to at
> >> least query that somehow (maybe from i915). It's simpler for user to use
> >> the exported OA frequency since it is also backwards compatible.
> >
> > Is Option 2 above feasible since it would stop propagating the change to
> > various UMD's?
>
> Hmm, there is logic today that squashes context ids when doing oa buffer
> filtering, but it does that on selective reports (i.e. if a gem_context is
> passed).
>
> For this issue: for a 16MB OA buffer with 256 byte reports, that would be
> an additional write of 262144 in the kmd (to smem). For 20us sampled OA
> reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2
> concerns:
>
> - the mmapped use case may break, but I don't see that being upstreamed.
> We may have divergent solutions for upstream and internal.
> - blocking/polling tests in IGT will be sensitive to this change on some
> platforms and may need to be bolstered.

If this correction/compensation in the kernel works out, even for internal
too we could do the following:

* For non-mmaped case, do the correction in the kernel and expose OA freq
  == GT freq (in the getparam ioctl)
* For mmaped case expose the actual OA freq (!= GT freq)

This will restrict the divergence only to the mmaped case (which we will
probably not be able to upstream).

>
> I will give it a shot and get back,
>
> Thanks,
> Umesh
>
> >
> >> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
> >> consumed by GPUvis. That reminds me, I should include the UMD links for the
> >> patches with uapi changes.
> >
> > I was thinking more about UMD's which analayze OA data and who till now are
> > probably assuming OA freq == GT freq and will now have to drop that
> > assumption. So not sure how widespread would be these changes in
> > the (multiple different?) UMD(s).
> >
> > Thanks.
> > --
> > Ashutosh
Umesh Nerlige Ramappa Sept. 19, 2022, 9:21 p.m. UTC | #8
On Fri, Sep 16, 2022 at 02:00:19PM -0700, Dixit, Ashutosh wrote:
>On Fri, 16 Sep 2022 13:25:17 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
>> > On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
>> >>
>> >> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
>> >> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
>> >> >>
>> >> >
>> >> > Hi Umesh,
>> >> >
>> >> >> OA reports in the OA buffer contain an OA timestamp field that helps
>> >> >> user calculate delta between 2 OA reports. The calculation relies on the
>> >> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
>> >> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
>> >> >> RPM_CONFIG0.
>> >> >>
>> >> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
>> >> >> actual value from RPM_CONFIG0. At the user level, this results in an
>> >> >> error in calculating delta between 2 OA reports since the OA timestamp
>> >> >> is not shifted in the same manner as CS timestamp.
>> >> >>
>> >> >> To resolve this, return actual OA timestamp frequency to the user in
>> >> >> i915_getparam_ioctl.
>> >> >
>> >> > Rather than exposing actual OA timestamp frequency to userspace (with the
>> >> > corresponding uapi change, specially if it's only DG2 and not all future
>> >> > products) questions about a couple of other options:
>> >> >
>> >> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
>> >> >          same as OA freq :-)
>> >> >
>> >> >   The HSD seems to mention this:
>> >> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
>> >> >   Note: Changing the shift setting on live driver may break apps that are
>> >> >   currently running (including desktop manager).
>> >> >
>> >> > Option 2. Is it possible to correct the timestamps in OA report headers to
>> >> >          compensate for the difference between OA and GT frequencies (say when
>> >> >          copying OA data to userspace)?
>> >> >
>> >> >	  Though not sure if this is preferable to having userspace do this.
>> >>
>> >> It does affect other platforms too. There's no guarantee on what the
>> >> CTC_SHIFT value would be for different platforms, so user would have to at
>> >> least query that somehow (maybe from i915). It's simpler for user to use
>> >> the exported OA frequency since it is also backwards compatible.
>> >
>> > Is Option 2 above feasible since it would stop propagating the change to
>> > various UMD's?
>>
>> Hmm, there is logic today that squashes context ids when doing oa buffer
>> filtering, but it does that on selective reports (i.e. if a gem_context is
>> passed).
>>
>> For this issue: for a 16MB OA buffer with 256 byte reports, that would be
>> an additional write of 262144 in the kmd (to smem). For 20us sampled OA
>> reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2
>> concerns:
>>
>> - the mmapped use case may break, but I don't see that being upstreamed.
>> We may have divergent solutions for upstream and internal.
>> - blocking/polling tests in IGT will be sensitive to this change on some
>> platforms and may need to be bolstered.
>
>If this correction/compensation in the kernel works out, even for internal
>too we could do the following:
>
>* For non-mmaped case, do the correction in the kernel and expose OA freq
>  == GT freq (in the getparam ioctl)
>* For mmaped case expose the actual OA freq (!= GT freq)
>
>This will restrict the divergence only to the mmaped case (which we will
>probably not be able to upstream).
>
>>
>> I will give it a shot and get back,

We cannot tweak this in the OA report header since that will be out of 
sync with the counters in the report. The other issue here is that the 
bug also applies to MI_REPORT_PERF_COUNT, and KMD cannot do anything to 
fix that. I would think this interface is the clean way to do this.

Thanks,
Umesh

>>
>> Thanks,
>> Umesh
>>
>> >
>> >> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
>> >> consumed by GPUvis. That reminds me, I should include the UMD links for the
>> >> patches with uapi changes.
>> >
>> > I was thinking more about UMD's which analayze OA data and who till now are
>> > probably assuming OA freq == GT freq and will now have to drop that
>> > assumption. So not sure how widespread would be these changes in
>> > the (multiple different?) UMD(s).
>> >
>> > Thanks.
>> > --
>> > Ashutosh
Ashutosh Dixit Sept. 20, 2022, 1:24 a.m. UTC | #9
On Mon, 19 Sep 2022 14:21:07 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Sep 16, 2022 at 02:00:19PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 16 Sep 2022 13:25:17 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Fri, Sep 16, 2022 at 12:57:19PM -0700, Dixit, Ashutosh wrote:
> >> > On Fri, 16 Sep 2022 11:56:04 -0700, Umesh Nerlige Ramappa wrote:
> >> >>
> >> >> On Thu, Sep 15, 2022 at 10:16:30PM -0700, Dixit, Ashutosh wrote:
> >> >> > On Tue, 23 Aug 2022 13:41:52 -0700, Umesh Nerlige Ramappa wrote:
> >> >> >>
> >> >> >
> >> >> > Hi Umesh,
> >> >> >
> >> >> >> OA reports in the OA buffer contain an OA timestamp field that helps
> >> >> >> user calculate delta between 2 OA reports. The calculation relies on the
> >> >> >> CS timestamp frequency to convert the timestamp value to nanoseconds.
> >> >> >> The CS timestamp frequency is a function of the CTC_SHIFT value in
> >> >> >> RPM_CONFIG0.
> >> >> >>
> >> >> >> In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
> >> >> >> actual value from RPM_CONFIG0. At the user level, this results in an
> >> >> >> error in calculating delta between 2 OA reports since the OA timestamp
> >> >> >> is not shifted in the same manner as CS timestamp.
> >> >> >>
> >> >> >> To resolve this, return actual OA timestamp frequency to the user in
> >> >> >> i915_getparam_ioctl.
> >> >> >
> >> >> > Rather than exposing actual OA timestamp frequency to userspace (with the
> >> >> > corresponding uapi change, specially if it's only DG2 and not all future
> >> >> > products) questions about a couple of other options:
> >> >> >
> >> >> > Option 1. Can we set CTC_SHIFT in RPM_CONFIG0 to 3, so change GT freq to be the
> >> >> >          same as OA freq :-)
> >> >> >
> >> >> >   The HSD seems to mention this:
> >> >> >   Is setting CTC SHIFT to 0b11 on driver init an acceptable W/A?
> >> >> >   Note: Changing the shift setting on live driver may break apps that are
> >> >> >   currently running (including desktop manager).
> >> >> >
> >> >> > Option 2. Is it possible to correct the timestamps in OA report headers to
> >> >> >          compensate for the difference between OA and GT frequencies (say when
> >> >> >          copying OA data to userspace)?
> >> >> >
> >> >> >	  Though not sure if this is preferable to having userspace do this.
> >> >>
> >> >> It does affect other platforms too. There's no guarantee on what the
> >> >> CTC_SHIFT value would be for different platforms, so user would have to at
> >> >> least query that somehow (maybe from i915). It's simpler for user to use
> >> >> the exported OA frequency since it is also backwards compatible.
> >> >
> >> > Is Option 2 above feasible since it would stop propagating the change to
> >> > various UMD's?
> >>
> >> Hmm, there is logic today that squashes context ids when doing oa buffer
> >> filtering, but it does that on selective reports (i.e. if a gem_context is
> >> passed).
> >>
> >> For this issue: for a 16MB OA buffer with 256 byte reports, that would be
> >> an additional write of 262144 in the kmd (to smem). For 20us sampled OA
> >> reports, it would be approx. 195 KB/s. Shouldn't be too much. Only 2
> >> concerns:
> >>
> >> - the mmapped use case may break, but I don't see that being upstreamed.
> >> We may have divergent solutions for upstream and internal.
> >> - blocking/polling tests in IGT will be sensitive to this change on some
> >> platforms and may need to be bolstered.
> >
> > If this correction/compensation in the kernel works out, even for internal
> > too we could do the following:
> >
> > * For non-mmaped case, do the correction in the kernel and expose OA freq
> >  == GT freq (in the getparam ioctl)
> > * For mmaped case expose the actual OA freq (!= GT freq)
> >
> > This will restrict the divergence only to the mmaped case (which we will
> > probably not be able to upstream).
> >
> >>
> >> I will give it a shot and get back,
>
> We cannot tweak this in the OA report header since that will be out of sync
> with the counters in the report. The other issue here is that the bug also
> applies to MI_REPORT_PERF_COUNT, and KMD cannot do anything to fix that. I
> would think this interface is the clean way to do this.

In that case this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

May still need a uapi change ack but that's separate.

>
> Thanks,
> Umesh
>
> >>
> >> Thanks,
> >> Umesh
> >>
> >> >
> >> >> https://patchwork.freedesktop.org/patch/498917/?series=107633&rev=3 is
> >> >> consumed by GPUvis. That reminds me, I should include the UMD links for the
> >> >> patches with uapi changes.
> >> >
> >> > I was thinking more about UMD's which analayze OA data and who till now are
> >> > probably assuming OA freq == GT freq and will now have to drop that
> >> > assumption. So not sure how widespread would be these changes in
> >> > the (multiple different?) UMD(s).
> >> >
> >> > Thanks.
> >> > --
> >> > Ashutosh
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index 6fd15b39570c..cdb2208ecabd 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -175,6 +175,9 @@  int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_PERF_REVISION:
 		value = i915_perf_ioctl_version();
 		break;
+	case I915_PARAM_OA_TIMESTAMP_FREQUENCY:
+		value = i915_perf_oa_timestamp_frequency(i915);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index efdd16edf8f3..132c2ce8b33b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3180,6 +3180,30 @@  get_sseu_config(struct intel_sseu *out_sseu,
 	return i915_gem_user_to_context_sseu(engine->gt, drm_sseu, out_sseu);
 }
 
+/*
+ * OA timestamp frequency = CS timestamp frequency in most platforms. On some
+ * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
+ * cases, return the adjusted CS timestamp frequency to the user.
+ */
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915)
+{
+	/* Wa_18013179988:dg2 */
+	if (IS_DG2(i915)) {
+		intel_wakeref_t wakeref;
+		u32 reg, shift;
+
+		with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref)
+			reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0);
+
+		shift = (reg & GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK) >>
+			 GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_SHIFT;
+
+		return to_gt(i915)->clock_frequency << (3 - shift);
+	}
+
+	return to_gt(i915)->clock_frequency;
+}
+
 /**
  * i915_oa_stream_init - validate combined props for OA stream and init
  * @stream: An i915 perf stream
@@ -3904,8 +3928,10 @@  i915_perf_open_ioctl_locked(struct i915_perf *perf,
 
 static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
 {
-	return intel_gt_clock_interval_to_ns(to_gt(perf->i915),
-					     2ULL << exponent);
+	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
+	u32 den = i915_perf_oa_timestamp_frequency(perf->i915);
+
+	return div_u64(nom + den - 1, den);
 }
 
 static __always_inline bool
diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h
index 1d1329e5af3a..f96e09a4af04 100644
--- a/drivers/gpu/drm/i915/i915_perf.h
+++ b/drivers/gpu/drm/i915/i915_perf.h
@@ -57,4 +57,6 @@  static inline void i915_oa_config_put(struct i915_oa_config *oa_config)
 	kref_put(&oa_config->ref, i915_oa_config_release);
 }
 
+u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915);
+
 #endif /* __I915_PERF_H__ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index d20d723925b5..5e42e94ea534 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -749,6 +749,12 @@  typedef struct drm_i915_irq_wait {
 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */
 #define I915_PARAM_HAS_USERPTR_PROBE 56
 
+/*
+ * Frequency of the timestamps in OA reports. This used to be the same as the CS
+ * timestamp frequency, but differs on some platforms.
+ */
+#define I915_PARAM_OA_TIMESTAMP_FREQUENCY 57
+
 /* Must be kept compact -- no holes and well documented */
 
 /**