diff mbox

drm/i915: There is only one fault register from GEN8 onwards

Message ID 20171110190116.36669-1-michel.thierry@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Michel Thierry Nov. 10, 2017, 7:01 p.m. UTC
Until Haswell/Baytrail, the hardware used to have a per engine fault
register (e.g. 0x4094 - render fault register, 0x4194 - media fault
register and so on). But since Broadwell, all these registers were
combined into a singe one and the engine id stored in bits 14:12.

Not only we should not been reading (and writing to) registers that do
not exist, in platforms with VCS2 (SKL), the address that would belong
this engine (0x4494, VCS2_HW = 4) is already assigned to other register.

References: IHD-OS-BDW-Vol 2c-11.15, page 75.
References: IHD-OS-SKL-Vol 2c-05.16, page 350.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c   | 38 ++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gpu_error.c |  8 +++++---
 drivers/gpu/drm/i915/i915_reg.h       |  2 ++
 3 files changed, 40 insertions(+), 8 deletions(-)

Comments

Chris Wilson Nov. 10, 2017, 8:51 p.m. UTC | #1
Quoting Michel Thierry (2017-11-10 19:01:16)
> Until Haswell/Baytrail, the hardware used to have a per engine fault
> register (e.g. 0x4094 - render fault register, 0x4194 - media fault
> register and so on). But since Broadwell, all these registers were
> combined into a singe one and the engine id stored in bits 14:12.
> 
> Not only we should not been reading (and writing to) registers that do
> not exist, in platforms with VCS2 (SKL), the address that would belong
> this engine (0x4494, VCS2_HW = 4) is already assigned to other register.
> 
> References: IHD-OS-BDW-Vol 2c-11.15, page 75.
> References: IHD-OS-SKL-Vol 2c-05.16, page 350.
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c   | 38 ++++++++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gpu_error.c |  8 +++++---
>  drivers/gpu/drm/i915/i915_reg.h       |  2 ++
>  3 files changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 1e40eeb31f9d..66a907330ad2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv)
>         return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active();
>  }
>  
> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
> +void __check_and_clear_faults(struct drm_i915_private *dev_priv)

I am amazed that -Wextra doesn't complain. Try with sparse.

There's an old thread where this was raised and how we are not clearing
the faults early enough.
-Chris
Michel Thierry Nov. 10, 2017, 11:42 p.m. UTC | #2
On 11/10/2017 12:51 PM, Chris Wilson wrote:
> Quoting Michel Thierry (2017-11-10 19:01:16)
>> Until Haswell/Baytrail, the hardware used to have a per engine fault
>> register (e.g. 0x4094 - render fault register, 0x4194 - media fault
>> register and so on). But since Broadwell, all these registers were
>> combined into a singe one and the engine id stored in bits 14:12.
>>
>> Not only we should not been reading (and writing to) registers that do
>> not exist, in platforms with VCS2 (SKL), the address that would belong
>> this engine (0x4494, VCS2_HW = 4) is already assigned to other register.
>>
>> References: IHD-OS-BDW-Vol 2c-11.15, page 75.
>> References: IHD-OS-SKL-Vol 2c-05.16, page 350.
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c   | 38 ++++++++++++++++++++++++++++++-----
>>   drivers/gpu/drm/i915/i915_gpu_error.c |  8 +++++---
>>   drivers/gpu/drm/i915/i915_reg.h       |  2 ++
>>   3 files changed, 40 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 1e40eeb31f9d..66a907330ad2 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv)
>>          return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active();
>>   }
>>   
>> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
>> +void __check_and_clear_faults(struct drm_i915_private *dev_priv)
> 
> I am amazed that -Wextra doesn't complain. Try with sparse.
>

Hmm, no they didn't... gen6_check_and_clear_faults and 
gen8_check_and_clear_faults are less controversial.

> There's an old thread where this was raised and how we are not clearing
> the faults early enough.

Ah, I found that thread [1], and as you said there, right now the 
for_each_engine is a nop (with this patch at least gen8+ would do the 
right thing).

Since gen6/gen7 engines are well known and can't change, do we really 
need for_each_engine?

[1] https://patchwork.freedesktop.org/patch/101331/
Chris Wilson Nov. 10, 2017, 11:50 p.m. UTC | #3
Quoting Michel Thierry (2017-11-10 23:42:31)
> On 11/10/2017 12:51 PM, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-11-10 19:01:16)
> >> Until Haswell/Baytrail, the hardware used to have a per engine fault
> >> register (e.g. 0x4094 - render fault register, 0x4194 - media fault
> >> register and so on). But since Broadwell, all these registers were
> >> combined into a singe one and the engine id stored in bits 14:12.
> >>
> >> Not only we should not been reading (and writing to) registers that do
> >> not exist, in platforms with VCS2 (SKL), the address that would belong
> >> this engine (0x4494, VCS2_HW = 4) is already assigned to other register.
> >>
> >> References: IHD-OS-BDW-Vol 2c-11.15, page 75.
> >> References: IHD-OS-SKL-Vol 2c-05.16, page 350.
> >> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_gtt.c   | 38 ++++++++++++++++++++++++++++++-----
> >>   drivers/gpu/drm/i915/i915_gpu_error.c |  8 +++++---
> >>   drivers/gpu/drm/i915/i915_reg.h       |  2 ++
> >>   3 files changed, 40 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >> index 1e40eeb31f9d..66a907330ad2 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv)
> >>          return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active();
> >>   }
> >>   
> >> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
> >> +void __check_and_clear_faults(struct drm_i915_private *dev_priv)
> > 
> > I am amazed that -Wextra doesn't complain. Try with sparse.
> >
> 
> Hmm, no they didn't... gen6_check_and_clear_faults and 
> gen8_check_and_clear_faults are less controversial.

The missing static and lack of extern declaration. I know sparse catches
it, I guess for K&R the order is just right for it not to complain.

> > There's an old thread where this was raised and how we are not clearing
> > the faults early enough.
> 
> Ah, I found that thread [1], and as you said there, right now the 
> for_each_engine is a nop (with this patch at least gen8+ would do the 
> right thing).
> 
> Since gen6/gen7 engines are well known and can't change, do we really 
> need for_each_engine?

For consistency, yeah it would be nice to keep for_each_engine. We should
be ok to do the clear around i915_driver_init_mmio (it's mmio, it has to
be there ;), maybe worth pulling it into intel_engines_init_mmio() or
i915_gem_init_mmio(). I'm learning towards sanitization from within
intel_engines_init_mmio()
-Chris
Michel Thierry Nov. 11, 2017, 12:15 a.m. UTC | #4
On 11/10/2017 3:50 PM, Chris Wilson wrote:
> Quoting Michel Thierry (2017-11-10 23:42:31)
>> On 11/10/2017 12:51 PM, Chris Wilson wrote:
>>> Quoting Michel Thierry (2017-11-10 19:01:16)
>>>> Until Haswell/Baytrail, the hardware used to have a per engine fault
>>>> register (e.g. 0x4094 - render fault register, 0x4194 - media fault
>>>> register and so on). But since Broadwell, all these registers were
>>>> combined into a singe one and the engine id stored in bits 14:12.
>>>>
>>>> Not only we should not been reading (and writing to) registers that do
>>>> not exist, in platforms with VCS2 (SKL), the address that would belong
>>>> this engine (0x4494, VCS2_HW = 4) is already assigned to other register.
>>>>
>>>> References: IHD-OS-BDW-Vol 2c-11.15, page 75.
>>>> References: IHD-OS-SKL-Vol 2c-05.16, page 350.
>>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem_gtt.c   | 38 ++++++++++++++++++++++++++++++-----
>>>>    drivers/gpu/drm/i915/i915_gpu_error.c |  8 +++++---
>>>>    drivers/gpu/drm/i915/i915_reg.h       |  2 ++
>>>>    3 files changed, 40 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> index 1e40eeb31f9d..66a907330ad2 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv)
>>>>           return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active();
>>>>    }
>>>>    
>>>> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
>>>> +void __check_and_clear_faults(struct drm_i915_private *dev_priv)
>>>
>>> I am amazed that -Wextra doesn't complain. Try with sparse.
>>>
>>
>> Hmm, no they didn't... gen6_check_and_clear_faults and
>> gen8_check_and_clear_faults are less controversial.
> 
> The missing static and lack of extern declaration. I know sparse catches
> it, I guess for K&R the order is just right for it not to complain.
> 
>>> There's an old thread where this was raised and how we are not clearing
>>> the faults early enough.
>>
>> Ah, I found that thread [1], and as you said there, right now the
>> for_each_engine is a nop (with this patch at least gen8+ would do the
>> right thing).
>>
>> Since gen6/gen7 engines are well known and can't change, do we really
>> need for_each_engine?
> 
> For consistency, yeah it would be nice to keep for_each_engine. We should
> be ok to do the clear around i915_driver_init_mmio (it's mmio, it has to
> be there ;), maybe worth pulling it into intel_engines_init_mmio() or
> i915_gem_init_mmio(). I'm learning towards sanitization from within
> intel_engines_init_mmio()

Ok, I'll move the check_and_clear_faults from intel_uncore_init to 
intel_engines_init_mmio (that's the one checking at driver load time).

Thanks,
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1e40eeb31f9d..66a907330ad2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2256,16 +2256,13 @@  static bool needs_idle_maps(struct drm_i915_private *dev_priv)
 	return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active();
 }
 
-void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
+void __check_and_clear_faults(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
-
-	if (INTEL_INFO(dev_priv)->gen < 6)
-		return;
+	u32 fault_reg;
 
 	for_each_engine(engine, dev_priv, id) {
-		u32 fault_reg;
 		fault_reg = I915_READ(RING_FAULT_REG(engine));
 		if (fault_reg & RING_FAULT_VALID) {
 			DRM_DEBUG_DRIVER("Unexpected fault\n"
@@ -2281,6 +2278,37 @@  void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
 				   fault_reg & ~RING_FAULT_VALID);
 		}
 	}
+}
+
+void __gen8_check_and_clear_faults(struct drm_i915_private *dev_priv)
+{
+	u32 fault_reg = I915_READ(GEN8_RING_FAULT_REG);
+
+	if (fault_reg & RING_FAULT_VALID) {
+		DRM_DEBUG_DRIVER("Unexpected fault\n"
+				 "\tAddr: 0x%08lx\n"
+				 "\tEngine ID: %d\n"
+				 "\tSource ID: %d\n"
+				 "\tType: %d\n",
+				 fault_reg & PAGE_MASK,
+				 GEN8_RING_FAULT_ENGINE_ID(fault_reg),
+				 RING_FAULT_SRCID(fault_reg),
+				 RING_FAULT_FAULT_TYPE(fault_reg));
+		I915_WRITE(GEN8_RING_FAULT_REG,
+			   fault_reg & ~RING_FAULT_VALID);
+	}
+}
+
+void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
+{
+	if (INTEL_GEN(dev_priv) < 6)
+		return;
+
+	/* From GEN8 onwards we only have one 'All Engine Fault Register' */
+	if (INTEL_GEN(dev_priv) >= 8)
+		__gen8_check_and_clear_faults(dev_priv);
+	else
+		__check_and_clear_faults(dev_priv);
 
 	/* Engine specific init may not have been done till this point. */
 	if (dev_priv->engine[RCS])
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5c2d83a838d8..7481c8e1b5a8 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1217,11 +1217,13 @@  static void error_record_engine_registers(struct i915_gpu_state *error,
 
 	if (INTEL_GEN(dev_priv) >= 6) {
 		ee->rc_psmi = I915_READ(RING_PSMI_CTL(engine->mmio_base));
-		ee->fault_reg = I915_READ(RING_FAULT_REG(engine));
-		if (INTEL_GEN(dev_priv) >= 8)
+		if (INTEL_GEN(dev_priv) >= 8) {
 			gen8_record_semaphore_state(error, engine, ee);
-		else
+			ee->fault_reg = I915_READ(GEN8_RING_FAULT_REG);
+		} else {
 			gen6_record_semaphore_state(engine, ee);
+			ee->fault_reg = I915_READ(RING_FAULT_REG(engine));
+		}
 	}
 
 	if (INTEL_GEN(dev_priv) >= 4) {
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6ef33422f762..7ef82ba670e9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2329,6 +2329,8 @@  enum i915_power_well_id {
 #define   ARB_MODE_SWIZZLE_BDW	(1<<1)
 #define RENDER_HWS_PGA_GEN7	_MMIO(0x04080)
 #define RING_FAULT_REG(engine)	_MMIO(0x4094 + 0x100*(engine)->hw_id)
+#define GEN8_RING_FAULT_REG	_MMIO(0x4094)
+#define   GEN8_RING_FAULT_ENGINE_ID(x)	(((x) >> 12) & 0x7)
 #define   RING_FAULT_GTTSEL_MASK (1<<11)
 #define   RING_FAULT_SRCID(x)	(((x) >> 3) & 0xff)
 #define   RING_FAULT_FAULT_TYPE(x) (((x) >> 1) & 0x3)