[3/4] drm/i915/perf: Whitelist OA counter and buffer registers

Message ID 20200724001901.35662-4-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Series Allow privileged user to map the OA buffer

Commit Message

Umesh Nerlige Ramappa July 24, 2020, 12:19 a.m. UTC
From: Piotr Maciejewski <piotr.maciejewski@intel.com>

It is useful to have markers in the OA reports to identify triggered
reports. Whitelist some OA counters that can be used as markers.

A triggered report can be found faster if we can sample the HW tail and
head registers when the report was triggered. Whitelist OA buffer
specific registers.

v2:
- Bump up the perf revision (Lionel)
- Use indexing for counters (Lionel)
- Fix selftest for oa ticking register (Umesh)

v3: Pardon whitelisted registers for selftest (Umesh)

v4:
- Document whitelisted registers (Lionel)
- Fix live isolated whitelist for OA regs (Umesh)

Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   | 34 +++++++++++++++++++
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 30 +++++++++++++++-
 drivers/gpu/drm/i915/i915_perf.c              |  8 ++++-
 drivers/gpu/drm/i915/i915_reg.h               | 10 ++++++
 4 files changed, 80 insertions(+), 2 deletions(-)

Comments

Chris Wilson July 24, 2020, 8:55 a.m. UTC | #1
Quoting Umesh Nerlige Ramappa (2020-07-24 01:19:00)
> From: Piotr Maciejewski <piotr.maciejewski@intel.com>
> 
> It is useful to have markers in the OA reports to identify triggered
> reports. Whitelist some OA counters that can be used as markers.
> 
> A triggered report can be found faster if we can sample the HW tail and
> head registers when the report was triggered. Whitelist OA buffer
> specific registers.
> 
> v2:
> - Bump up the perf revision (Lionel)
> - Use indexing for counters (Lionel)
> - Fix selftest for oa ticking register (Umesh)
> 
> v3: Pardon whitelisted registers for selftest (Umesh)
> 
> v4:
> - Document whitelisted registers (Lionel)
> - Fix live isolated whitelist for OA regs (Umesh)
> 
> Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 34 +++++++++++++++++++
>  .../gpu/drm/i915/gt/selftest_workarounds.c    | 30 +++++++++++++++-
>  drivers/gpu/drm/i915/i915_perf.c              |  8 ++++-
>  drivers/gpu/drm/i915/i915_reg.h               | 10 ++++++
>  4 files changed, 80 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index a72ebfd115e5..c950d07beec3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -1392,6 +1392,23 @@ static void gen9_whitelist_build_performance_counters(struct i915_wa_list *w)
>         /* OA buffer trigger report 2/6 used by performance query */
>         whitelist_reg(w, OAREPORTTRIG2);
>         whitelist_reg(w, OAREPORTTRIG6);
> +
> +       /* Performance counters A18-20 used by tbs marker query */
> +       whitelist_reg_ext(w, OA_PERF_COUNTER_A(18),
> +                         RING_FORCE_TO_NONPRIV_ACCESS_RW |
> +                         RING_FORCE_TO_NONPRIV_RANGE_4);
> +
> +       whitelist_reg(w, OA_PERF_COUNTER_A(20));
> +       whitelist_reg(w, OA_PERF_COUNTER_A_UPPER(20));
> +
> +       /* Read access to gpu ticks */
> +       whitelist_reg_ext(w, GEN8_GPU_TICKS,
> +                         RING_FORCE_TO_NONPRIV_ACCESS_RD);
> +
> +       /* Read access to: oa status, head, tail, buffer settings */
> +       whitelist_reg_ext(w, GEN8_OASTATUS,
> +                         RING_FORCE_TO_NONPRIV_ACCESS_RD |
> +                         RING_FORCE_TO_NONPRIV_RANGE_4);

Great. This completely fills RING_MAX_NONPRIV_SLOTS, with over half the
slots going to OA. That does not seem sustainable.

I did not think the extended whitelist settings were available before
cml.
-Chris
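
For readers following the review, here is a minimal sketch of what one "extended" whitelist entry covers. It assumes the access and range flags are packed into the same dword as the register offset, and that a RANGE_4 entry starts at the programmed offset and spans four consecutive dwords, which is what the patch comments imply. The flag values mirror my understanding of i915_reg.h and should be treated as assumptions; `nonpriv_slot()` and the `nonpriv_slot_*()` helpers are hypothetical names, not driver functions. The counter offsets come from the OA_PERF_COUNTER_A() macros added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout of a RING_FORCE_TO_NONPRIV slot (not taken from this patch). */
#define NONPRIV_RANGE_1      (0u << 0)
#define NONPRIV_RANGE_4      (1u << 0)
#define NONPRIV_RANGE_MASK   (3u << 0)
#define NONPRIV_ACCESS_RW    (0u << 28)
#define NONPRIV_ACCESS_RD    (1u << 28)
#define NONPRIV_ACCESS_MASK  (3u << 28)
#define NONPRIV_ADDRESS_MASK 0x03fffffcu /* bits 25:2 */

/* Offsets as defined by the patch: OA_PERF_COUNTER_A(idx) and its upper dword. */
static uint32_t oa_perf_counter_a(unsigned int idx)
{
	return 0x2800 + 8 * idx;
}

static uint32_t oa_perf_counter_a_upper(unsigned int idx)
{
	return 0x2800 + 8 * idx + 4;
}

/* Hypothetical helper: pack a dword-aligned offset and its flags into one slot. */
static uint32_t nonpriv_slot(uint32_t offset, uint32_t flags)
{
	assert((offset & ~NONPRIV_ADDRESS_MASK) == 0);
	return offset | flags;
}

/* Range field 0..3 encodes 1, 4, 16 or 64 consecutive dwords. */
static unsigned int nonpriv_slot_dwords(uint32_t slot)
{
	return 1u << (2 * (slot & NONPRIV_RANGE_MASK));
}

/* First register offset covered by the slot. */
static uint32_t nonpriv_slot_first(uint32_t slot)
{
	return slot & NONPRIV_ADDRESS_MASK;
}

/* Last register offset covered, assuming the range starts at the slot offset. */
static uint32_t nonpriv_slot_last(uint32_t slot)
{
	return nonpriv_slot_first(slot) + 4 * (nonpriv_slot_dwords(slot) - 1);
}
```

Under these assumptions, the single A(18) entry with RANGE_4 spans 0x2890 through 0x289c, that is A18 and A19 with their upper halves, which is why A20 at 0x28a0 needs its own two entries. Likewise the GEN8_OASTATUS entry would span 0x2b08 through 0x2b14, matching the "oa status, head, tail, buffer settings" comment.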
Umesh Nerlige Ramappa July 27, 2020, 7:34 p.m. UTC | #2
On Fri, Jul 24, 2020 at 09:55:35AM +0100, Chris Wilson wrote:
>Quoting Umesh Nerlige Ramappa (2020-07-24 01:19:00)
>> From: Piotr Maciejewski <piotr.maciejewski@intel.com>
>>
>> It is useful to have markers in the OA reports to identify triggered
>> reports. Whitelist some OA counters that can be used as markers.
>>
>> A triggered report can be found faster if we can sample the HW tail and
>> head registers when the report was triggered. Whitelist OA buffer
>> specific registers.
>>
>> v2:
>> - Bump up the perf revision (Lionel)
>> - Use indexing for counters (Lionel)
>> - Fix selftest for oa ticking register (Umesh)
>>
>> v3: Pardon whitelisted registers for selftest (Umesh)
>>
>> v4:
>> - Document whitelisted registers (Lionel)
>> - Fix live isolated whitelist for OA regs (Umesh)
>>
>> Signed-off-by: Piotr Maciejewski <piotr.maciejewski@intel.com>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 34 +++++++++++++++++++
>>  .../gpu/drm/i915/gt/selftest_workarounds.c    | 30 +++++++++++++++-
>>  drivers/gpu/drm/i915/i915_perf.c              |  8 ++++-
>>  drivers/gpu/drm/i915/i915_reg.h               | 10 ++++++
>>  4 files changed, 80 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>> index a72ebfd115e5..c950d07beec3 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>> @@ -1392,6 +1392,23 @@ static void gen9_whitelist_build_performance_counters(struct i915_wa_list *w)
>>         /* OA buffer trigger report 2/6 used by performance query */
>>         whitelist_reg(w, OAREPORTTRIG2);
>>         whitelist_reg(w, OAREPORTTRIG6);
>> +
>> +       /* Performance counters A18-20 used by tbs marker query */
>> +       whitelist_reg_ext(w, OA_PERF_COUNTER_A(18),
>> +                         RING_FORCE_TO_NONPRIV_ACCESS_RW |
>> +                         RING_FORCE_TO_NONPRIV_RANGE_4);
>> +
>> +       whitelist_reg(w, OA_PERF_COUNTER_A(20));
>> +       whitelist_reg(w, OA_PERF_COUNTER_A_UPPER(20));
>> +
>> +       /* Read access to gpu ticks */
>> +       whitelist_reg_ext(w, GEN8_GPU_TICKS,
>> +                         RING_FORCE_TO_NONPRIV_ACCESS_RD);
>> +
>> +       /* Read access to: oa status, head, tail, buffer settings */
>> +       whitelist_reg_ext(w, GEN8_OASTATUS,
>> +                         RING_FORCE_TO_NONPRIV_ACCESS_RD |
>> +                         RING_FORCE_TO_NONPRIV_RANGE_4);
>
>Great. This completely fills RING_MAX_NONPRIV_SLOTS, with over half the
>slots going to OA. That does not seem sustainable.
>
>I did not think the extended whitelist settings were available before
>cml.

It looks like we can drop A20, A20_UPPER and GPU ticks to free up 3
slots. I will post that in the next revision of the series; hopefully
that will do for now.

Thanks,
Umesh

>-Chris
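
The slot arithmetic in this exchange can be sketched as below. The table lists only the OA entries visible in the gen9 hunk (the two existing OAREPORTTRIG entries plus the five added here) and marks the three that the reply proposes to drop. `RING_MAX_NONPRIV_SLOTS` being 12 is an assumption about the driver constant, and `struct oa_slot` and `oa_slots_used()` are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value of the driver constant; not shown in this patch. */
#define RING_MAX_NONPRIV_SLOTS 12

struct oa_slot {
	const char *what;     /* register (and flags) occupying the slot */
	bool proposed_drop;   /* true if the reply proposes removing it */
};

/* OA whitelist entries on gen9 after this patch, one slot each. */
static const struct oa_slot gen9_oa_slots[] = {
	{ "OAREPORTTRIG2",                        false },
	{ "OAREPORTTRIG6",                        false },
	{ "OA_PERF_COUNTER_A(18), RW, RANGE_4",   false },
	{ "OA_PERF_COUNTER_A(20)",                true  },
	{ "OA_PERF_COUNTER_A_UPPER(20)",          true  },
	{ "GEN8_GPU_TICKS, RD",                   true  },
	{ "GEN8_OASTATUS, RD, RANGE_4",           false },
};

/* Count the OA slots in use, optionally applying the proposed removals. */
static size_t oa_slots_used(bool apply_proposed_drop)
{
	size_t used = 0;

	for (size_t i = 0; i < sizeof(gen9_oa_slots) / sizeof(gen9_oa_slots[0]); i++) {
		if (apply_proposed_drop && gen9_oa_slots[i].proposed_drop)
			continue;
		used++;
	}

	/* OA alone can never need more slots than the hardware provides. */
	assert(used <= RING_MAX_NONPRIV_SLOTS);
	return used;
}
```

With the patch as posted, OA takes 7 of the assumed 12 slots, which is the "over half" raised in the review. Dropping A20, A20_UPPER and GPU ticks brings that down to 4.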
Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index a72ebfd115e5..c950d07beec3 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1392,6 +1392,23 @@  static void gen9_whitelist_build_performance_counters(struct i915_wa_list *w)
 	/* OA buffer trigger report 2/6 used by performance query */
 	whitelist_reg(w, OAREPORTTRIG2);
 	whitelist_reg(w, OAREPORTTRIG6);
+
+	/* Performance counters A18-20 used by tbs marker query */
+	whitelist_reg_ext(w, OA_PERF_COUNTER_A(18),
+			  RING_FORCE_TO_NONPRIV_ACCESS_RW |
+			  RING_FORCE_TO_NONPRIV_RANGE_4);
+
+	whitelist_reg(w, OA_PERF_COUNTER_A(20));
+	whitelist_reg(w, OA_PERF_COUNTER_A_UPPER(20));
+
+	/* Read access to gpu ticks */
+	whitelist_reg_ext(w, GEN8_GPU_TICKS,
+			  RING_FORCE_TO_NONPRIV_ACCESS_RD);
+
+	/* Read access to: oa status, head, tail, buffer settings */
+	whitelist_reg_ext(w, GEN8_OASTATUS,
+			  RING_FORCE_TO_NONPRIV_ACCESS_RD |
+			  RING_FORCE_TO_NONPRIV_RANGE_4);
 }
 
 static void gen12_whitelist_build_performance_counters(struct i915_wa_list *w)
@@ -1399,6 +1416,23 @@  static void gen12_whitelist_build_performance_counters(struct i915_wa_list *w)
 	/* OA buffer trigger report 2/6 used by performance query */
 	whitelist_reg(w, GEN12_OAG_OAREPORTTRIG2);
 	whitelist_reg(w, GEN12_OAG_OAREPORTTRIG6);
+
+	/* Performance counters A18-20 used by tbs marker query */
+	whitelist_reg_ext(w, GEN12_OAG_PERF_COUNTER_A(18),
+			  RING_FORCE_TO_NONPRIV_ACCESS_RW |
+			  RING_FORCE_TO_NONPRIV_RANGE_4);
+
+	whitelist_reg(w, GEN12_OAG_PERF_COUNTER_A(20));
+	whitelist_reg(w, GEN12_OAG_PERF_COUNTER_A_UPPER(20));
+
+	/* Read access to gpu ticks */
+	whitelist_reg_ext(w, GEN12_OAG_GPU_TICKS,
+			  RING_FORCE_TO_NONPRIV_ACCESS_RD);
+
+	/* Read access to: oa status, head, tail, buffer settings */
+	whitelist_reg_ext(w, GEN12_OAG_OASTATUS,
+			  RING_FORCE_TO_NONPRIV_ACCESS_RD |
+			  RING_FORCE_TO_NONPRIV_RANGE_4);
 }
 
 static void gen9_whitelist_build(struct i915_wa_list *w)
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 3b1d3dbcd477..7c2c2be8d212 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -431,6 +431,19 @@  static bool timestamp(const struct intel_engine_cs *engine, u32 reg)
 	}
 }
 
+static bool oa_gpu_ticks(u32 reg)
+{
+	reg = reg & ~RING_FORCE_TO_NONPRIV_ACCESS_MASK;
+	switch (reg) {
+	case 0x2910:
+	case 0xda90:
+		return true;
+
+	default:
+		return false;
+	}
+}
+
 static bool ro_register(u32 reg)
 {
 	if ((reg & RING_FORCE_TO_NONPRIV_ACCESS_MASK) ==
@@ -511,7 +524,7 @@  static int check_dirty_whitelist(struct intel_context *ce)
 		if (wo_register(engine, reg))
 			continue;
 
-		if (timestamp(engine, reg))
+		if (timestamp(engine, reg) || oa_gpu_ticks(reg))
 			continue; /* timestamps are expected to autoincrement */
 
 		ro_reg = ro_register(reg);
@@ -918,6 +931,9 @@  static bool find_reg(struct drm_i915_private *i915,
 {
 	u32 offset = i915_mmio_reg_offset(reg);
 
+	/* Clear non priv flags */
+	offset &= RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
+
 	while (count--) {
 		if (INTEL_INFO(i915)->gen_mask & tbl->gen_mask &&
 		    i915_mmio_reg_offset(tbl->reg) == offset)
@@ -938,6 +954,12 @@  static bool pardon_reg(struct drm_i915_private *i915, i915_reg_t reg)
 		{ OAREPORTTRIG6, INTEL_GEN_MASK(8, 11) },
 		{ GEN12_OAG_OAREPORTTRIG2, INTEL_GEN_MASK(12, 12) },
 		{ GEN12_OAG_OAREPORTTRIG6, INTEL_GEN_MASK(12, 12) },
+		{ OA_PERF_COUNTER_A(18), INTEL_GEN_MASK(8, 11) },
+		{ OA_PERF_COUNTER_A(20), INTEL_GEN_MASK(8, 11) },
+		{ OA_PERF_COUNTER_A_UPPER(20), INTEL_GEN_MASK(8, 11) },
+		{ GEN12_OAG_PERF_COUNTER_A(18), INTEL_GEN_MASK(12, 12) },
+		{ GEN12_OAG_PERF_COUNTER_A(20), INTEL_GEN_MASK(12, 12) },
+		{ GEN12_OAG_PERF_COUNTER_A_UPPER(20), INTEL_GEN_MASK(12, 12) },
 	};
 
 	return find_reg(i915, reg, pardon, ARRAY_SIZE(pardon));
@@ -964,6 +986,12 @@  static bool writeonly_reg(struct drm_i915_private *i915, i915_reg_t reg)
 		{ OAREPORTTRIG6, INTEL_GEN_MASK(8, 11) },
 		{ GEN12_OAG_OAREPORTTRIG2, INTEL_GEN_MASK(12, 12) },
 		{ GEN12_OAG_OAREPORTTRIG6, INTEL_GEN_MASK(12, 12) },
+		{ OA_PERF_COUNTER_A(18), INTEL_GEN_MASK(8, 11) },
+		{ OA_PERF_COUNTER_A(20), INTEL_GEN_MASK(8, 11) },
+		{ OA_PERF_COUNTER_A_UPPER(20), INTEL_GEN_MASK(8, 11) },
+		{ GEN12_OAG_PERF_COUNTER_A(18), INTEL_GEN_MASK(12, 12) },
+		{ GEN12_OAG_PERF_COUNTER_A(20), INTEL_GEN_MASK(12, 12) },
+		{ GEN12_OAG_PERF_COUNTER_A_UPPER(20), INTEL_GEN_MASK(12, 12) },
 	};
 
 	return find_reg(i915, reg, wo, ARRAY_SIZE(wo));
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 30f6aeb819aa..2f23aad12c60 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4450,8 +4450,14 @@  int i915_perf_ioctl_version(void)
 	 *
 	 * 6: Whitelist OATRIGGER registers to allow user to trigger reports
 	 *    into the OA buffer. This applies only to gen8+.
+	 *
+	 * 7: Whitelist the OA registers below so the user can locate triggered
+	 *    reports in the OA buffer. This applies only to gen8+.
+	 *
+	 *    - OA status, head, tail and buffer registers, read only
+	 *    - OA counters A18, A19 and A20, read/write
 	 */
-	return 6;
+	return 7;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9cc3e312b6b7..c68dc3f39e62 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -675,6 +675,7 @@  static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN7_OASTATUS2_HEAD_MASK           0xffffffc0
 #define  GEN7_OASTATUS2_MEM_SELECT_GGTT     (1 << 0) /* 0: PPGTT, 1: GGTT */
 
+#define GEN8_GPU_TICKS _MMIO(0x2910)
 #define GEN8_OASTATUS _MMIO(0x2b08)
 #define  GEN8_OASTATUS_OVERRUN_STATUS	    (1 << 3)
 #define  GEN8_OASTATUS_COUNTER_OVERFLOW     (1 << 2)
@@ -733,6 +734,7 @@  static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN12_OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS     (1 << 2)
 #define  GEN12_OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS (1 << 1)
 
+#define GEN12_OAG_GPU_TICKS _MMIO(0xda90)
 #define GEN12_OAG_OASTATUS _MMIO(0xdafc)
 #define  GEN12_OAG_OASTATUS_COUNTER_OVERFLOW (1 << 2)
 #define  GEN12_OAG_OASTATUS_BUFFER_OVERFLOW  (1 << 1)
@@ -974,6 +976,14 @@  static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define OAREPORTTRIG8_NOA_SELECT_6_SHIFT    24
 #define OAREPORTTRIG8_NOA_SELECT_7_SHIFT    28
 
+/* Performance counters registers */
+#define OA_PERF_COUNTER_A(idx)       _MMIO(0x2800 + 8 * (idx))
+#define OA_PERF_COUNTER_A_UPPER(idx) _MMIO(0x2800 + 8 * (idx) + 4)
+
+/* Gen12 Performance counters registers */
+#define GEN12_OAG_PERF_COUNTER_A(idx)       _MMIO(0xD980 + 8 * (idx))
+#define GEN12_OAG_PERF_COUNTER_A_UPPER(idx) _MMIO(0xD980 + 8 * (idx) + 4)
+
 /* Same layout as OASTARTTRIGX */
 #define GEN12_OAG_OASTARTTRIG1 _MMIO(0xd900)
 #define GEN12_OAG_OASTARTTRIG2 _MMIO(0xd904)