Message ID | 20171110190116.36669-1-michel.thierry@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Michel Thierry (2017-11-10 19:01:16) > Until Haswell/Baytrail, the hardware used to have a per engine fault > register (e.g. 0x4094 - render fault register, 0x4194 - media fault > register and so on). But since Broadwell, all these registers were > combined into a singe one and the engine id stored in bits 14:12. > > Not only we should not been reading (and writing to) registers that do > not exist, in platforms with VCS2 (SKL), the address that would belong > this engine (0x4494, VCS2_HW = 4) is already assigned to other register. > > References: IHD-OS-BDW-Vol 2c-11.15, page 75. > References: IHD-OS-SKL-Vol 2c-05.16, page 350. > Signed-off-by: Michel Thierry <michel.thierry@intel.com> > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++++++++----- > drivers/gpu/drm/i915/i915_gpu_error.c | 8 +++++--- > drivers/gpu/drm/i915/i915_reg.h | 2 ++ > 3 files changed, 40 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 1e40eeb31f9d..66a907330ad2 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv) > return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active(); > } > > -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) > +void __check_and_clear_faults(struct drm_i915_private *dev_priv) I am amazed that -Wextra doesn't complain. Try with sparse. There's an old thread where this was raised and how we are not clearing the faults early enough. -Chris
On 11/10/2017 12:51 PM, Chris Wilson wrote: > Quoting Michel Thierry (2017-11-10 19:01:16) >> Until Haswell/Baytrail, the hardware used to have a per engine fault >> register (e.g. 0x4094 - render fault register, 0x4194 - media fault >> register and so on). But since Broadwell, all these registers were >> combined into a singe one and the engine id stored in bits 14:12. >> >> Not only we should not been reading (and writing to) registers that do >> not exist, in platforms with VCS2 (SKL), the address that would belong >> this engine (0x4494, VCS2_HW = 4) is already assigned to other register. >> >> References: IHD-OS-BDW-Vol 2c-11.15, page 75. >> References: IHD-OS-SKL-Vol 2c-05.16, page 350. >> Signed-off-by: Michel Thierry <michel.thierry@intel.com> >> --- >> drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++++++++----- >> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +++++--- >> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >> 3 files changed, 40 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c >> index 1e40eeb31f9d..66a907330ad2 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c >> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c >> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv) >> return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active(); >> } >> >> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) >> +void __check_and_clear_faults(struct drm_i915_private *dev_priv) > > I am amazed that -Wextra doesn't complain. Try with sparse. > Hmm, no they didn't... gen6_check_and_clear_faults and gen8_check_and_clear_faults are less controversial. > There's an old thread where this was raised and how we are not clearing > the faults early enough. Ah, I found that thread [1], and as you said there, right now the for_each_engine is a nop (with this patch at least gen8+ would do the right thing). Since gen6/gen7 engines are well known and can't change, do we really need for_each_engine? [1] https://patchwork.freedesktop.org/patch/101331/
Quoting Michel Thierry (2017-11-10 23:42:31) > On 11/10/2017 12:51 PM, Chris Wilson wrote: > > Quoting Michel Thierry (2017-11-10 19:01:16) > >> Until Haswell/Baytrail, the hardware used to have a per engine fault > >> register (e.g. 0x4094 - render fault register, 0x4194 - media fault > >> register and so on). But since Broadwell, all these registers were > >> combined into a singe one and the engine id stored in bits 14:12. > >> > >> Not only we should not been reading (and writing to) registers that do > >> not exist, in platforms with VCS2 (SKL), the address that would belong > >> this engine (0x4494, VCS2_HW = 4) is already assigned to other register. > >> > >> References: IHD-OS-BDW-Vol 2c-11.15, page 75. > >> References: IHD-OS-SKL-Vol 2c-05.16, page 350. > >> Signed-off-by: Michel Thierry <michel.thierry@intel.com> > >> --- > >> drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++++++++----- > >> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +++++--- > >> drivers/gpu/drm/i915/i915_reg.h | 2 ++ > >> 3 files changed, 40 insertions(+), 8 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > >> index 1e40eeb31f9d..66a907330ad2 100644 > >> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > >> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > >> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv) > >> return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active(); > >> } > >> > >> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) > >> +void __check_and_clear_faults(struct drm_i915_private *dev_priv) > > > > I am amazed that -Wextra doesn't complain. Try with sparse. > > > > Hmm, no they didn't... gen6_check_and_clear_faults and > gen8_check_and_clear_faults are less controversial. The missing static and lack of extern declaration. I know sparse catches it, I guess for K&R the order is just right for it not to complain. > > There's an old thread where this was raised and how we are not clearing > > the faults early enough. > > Ah, I found that thread [1], and as you said there, right now the > for_each_engine is a nop (with this patch at least gen8+ would do the > right thing). > > Since gen6/gen7 engines are well known and can't change, do we really > need for_each_engine? For consistency, yeah it would be nice to keep for_each_engine. We should be ok to do the clear around i915_driver_init_mmio (it's mmio, it has to be there ;), maybe worth pulling it into intel_engines_init_mmio() or i915_gem_init_mmio(). I'm learning towards sanitization from within intel_engines_init_mmio() -Chris
On 11/10/2017 3:50 PM, Chris Wilson wrote: > Quoting Michel Thierry (2017-11-10 23:42:31) >> On 11/10/2017 12:51 PM, Chris Wilson wrote: >>> Quoting Michel Thierry (2017-11-10 19:01:16) >>>> Until Haswell/Baytrail, the hardware used to have a per engine fault >>>> register (e.g. 0x4094 - render fault register, 0x4194 - media fault >>>> register and so on). But since Broadwell, all these registers were >>>> combined into a singe one and the engine id stored in bits 14:12. >>>> >>>> Not only we should not been reading (and writing to) registers that do >>>> not exist, in platforms with VCS2 (SKL), the address that would belong >>>> this engine (0x4494, VCS2_HW = 4) is already assigned to other register. >>>> >>>> References: IHD-OS-BDW-Vol 2c-11.15, page 75. >>>> References: IHD-OS-SKL-Vol 2c-05.16, page 350. >>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> >>>> --- >>>> drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++++++++----- >>>> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +++++--- >>>> drivers/gpu/drm/i915/i915_reg.h | 2 ++ >>>> 3 files changed, 40 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c >>>> index 1e40eeb31f9d..66a907330ad2 100644 >>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c >>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c >>>> @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv) >>>> return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active(); >>>> } >>>> >>>> -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) >>>> +void __check_and_clear_faults(struct drm_i915_private *dev_priv) >>> >>> I am amazed that -Wextra doesn't complain. Try with sparse. >>> >> >> Hmm, no they didn't... gen6_check_and_clear_faults and >> gen8_check_and_clear_faults are less controversial. > > The missing static and lack of extern declaration. I know sparse catches > it, I guess for K&R the order is just right for it not to complain. > >>> There's an old thread where this was raised and how we are not clearing >>> the faults early enough. >> >> Ah, I found that thread [1], and as you said there, right now the >> for_each_engine is a nop (with this patch at least gen8+ would do the >> right thing). >> >> Since gen6/gen7 engines are well known and can't change, do we really >> need for_each_engine? > > For consistency, yeah it would be nice to keep for_each_engine. We should > be ok to do the clear around i915_driver_init_mmio (it's mmio, it has to > be there ;), maybe worth pulling it into intel_engines_init_mmio() or > i915_gem_init_mmio(). I'm learning towards sanitization from within > intel_engines_init_mmio() Ok, I'll move the check_and_clear_faults from intel_uncore_init to intel_engines_init_mmio (that's the one checking at driver load time). Thanks,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 1e40eeb31f9d..66a907330ad2 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2256,16 +2256,13 @@ static bool needs_idle_maps(struct drm_i915_private *dev_priv) return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active(); } -void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) +void __check_and_clear_faults(struct drm_i915_private *dev_priv) { struct intel_engine_cs *engine; enum intel_engine_id id; - - if (INTEL_INFO(dev_priv)->gen < 6) - return; + u32 fault_reg; for_each_engine(engine, dev_priv, id) { - u32 fault_reg; fault_reg = I915_READ(RING_FAULT_REG(engine)); if (fault_reg & RING_FAULT_VALID) { DRM_DEBUG_DRIVER("Unexpected fault\n" @@ -2281,6 +2278,37 @@ void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) fault_reg & ~RING_FAULT_VALID); } } +} + +void __gen8_check_and_clear_faults(struct drm_i915_private *dev_priv) +{ + u32 fault_reg = I915_READ(GEN8_RING_FAULT_REG); + + if (fault_reg & RING_FAULT_VALID) { + DRM_DEBUG_DRIVER("Unexpected fault\n" + "\tAddr: 0x%08lx\n" + "\tEngine ID: %d\n" + "\tSource ID: %d\n" + "\tType: %d\n", + fault_reg & PAGE_MASK, + GEN8_RING_FAULT_ENGINE_ID(fault_reg), + RING_FAULT_SRCID(fault_reg), + RING_FAULT_FAULT_TYPE(fault_reg)); + I915_WRITE(GEN8_RING_FAULT_REG, + fault_reg & ~RING_FAULT_VALID); + } +} + +void i915_check_and_clear_faults(struct drm_i915_private *dev_priv) +{ + if (INTEL_GEN(dev_priv) < 6) + return; + + /* From GEN8 onwards we only have one 'All Engine Fault Register' */ + if (INTEL_GEN(dev_priv) >= 8) + __gen8_check_and_clear_faults(dev_priv); + else + __check_and_clear_faults(dev_priv); /* Engine specific init may not have been done till this point. */ if (dev_priv->engine[RCS]) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 5c2d83a838d8..7481c8e1b5a8 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1217,11 +1217,13 @@ static void error_record_engine_registers(struct i915_gpu_state *error, if (INTEL_GEN(dev_priv) >= 6) { ee->rc_psmi = I915_READ(RING_PSMI_CTL(engine->mmio_base)); - ee->fault_reg = I915_READ(RING_FAULT_REG(engine)); - if (INTEL_GEN(dev_priv) >= 8) + if (INTEL_GEN(dev_priv) >= 8) { gen8_record_semaphore_state(error, engine, ee); - else + ee->fault_reg = I915_READ(GEN8_RING_FAULT_REG); + } else { gen6_record_semaphore_state(engine, ee); + ee->fault_reg = I915_READ(RING_FAULT_REG(engine)); + } } if (INTEL_GEN(dev_priv) >= 4) { diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 6ef33422f762..7ef82ba670e9 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2329,6 +2329,8 @@ enum i915_power_well_id { #define ARB_MODE_SWIZZLE_BDW (1<<1) #define RENDER_HWS_PGA_GEN7 _MMIO(0x04080) #define RING_FAULT_REG(engine) _MMIO(0x4094 + 0x100*(engine)->hw_id) +#define GEN8_RING_FAULT_REG _MMIO(0x4094) +#define GEN8_RING_FAULT_ENGINE_ID(x) (((x) >> 12) & 0x7) #define RING_FAULT_GTTSEL_MASK (1<<11) #define RING_FAULT_SRCID(x) (((x) >> 3) & 0xff) #define RING_FAULT_FAULT_TYPE(x) (((x) >> 1) & 0x3)
Until Haswell/Baytrail, the hardware used to have a per engine fault register (e.g. 0x4094 - render fault register, 0x4194 - media fault register and so on). But since Broadwell, all these registers were combined into a singe one and the engine id stored in bits 14:12. Not only we should not been reading (and writing to) registers that do not exist, in platforms with VCS2 (SKL), the address that would belong this engine (0x4494, VCS2_HW = 4) is already assigned to other register. References: IHD-OS-BDW-Vol 2c-11.15, page 75. References: IHD-OS-SKL-Vol 2c-05.16, page 350. Signed-off-by: Michel Thierry <michel.thierry@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++++++++----- drivers/gpu/drm/i915/i915_gpu_error.c | 8 +++++--- drivers/gpu/drm/i915/i915_reg.h | 2 ++ 3 files changed, 40 insertions(+), 8 deletions(-)