Message ID | 20170418202335.35232-14-michel.thierry@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 18/04/17 13:23, Michel Thierry wrote: > From: Arun Siluvery <arun.siluvery@linux.intel.com> > > GuC expects a list of registers from the driver which are saved/restored > during engine reset. The type of value to be saved is controlled by > flags. We provide a minimal set of registers that we want GuC to save and > restore. This is not an issue in case of engine reset as driver initializes > most of them following an engine reset, but in case of media reset (aka > watchdog reset) which is completely internal to GuC (including resubmission > of hung workload), it is necessary to provide this list, otherwise GuC won't > be able to schedule further workloads after a reset. This is the minimal > set of registers identified for things to work as expected but if we see > any new issues, this register list can be expanded. > > v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE > and current value from RING_MODE reg instead; no need to preserve > head/tail either, be extra paranoid and save whitelisted registers (Daniele). > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > Signed-off-by: Jeff McGee <jeff.mcgee@intel.com> > Signed-off-by: Michel Thierry <michel.thierry@intel.com> > --- > drivers/gpu/drm/i915/i915_guc_submission.c | 60 +++++++++++++++++++++++++++++- > 1 file changed, 58 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c > index 1ea36a88d2fb..d772718861df 100644 > --- a/drivers/gpu/drm/i915/i915_guc_submission.c > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c > @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct guc_policies *policies) > policies->is_valid = 1; > } > > +/* > + * In this macro it is highly unlikely to exceed max value but even if we did > + * it is not an error so just throw a warning and continue. Only side effect > + * in continuing further means some registers won't be added to save/restore > + * list. > + */ > +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue) \ > + do { \ > + u32 __count = node->number_of_registers; \ > + if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS)) \ > + continue; \ > + node->registers[__count].offset = reg_addr.reg; \ > + node->registers[__count].flags = (_flags); \ > + if (defvalue) \ > + node->registers[__count].value = (defvalue); \ > + node->number_of_registers++; \ > + } while (0) > + > static int guc_ads_create(struct intel_guc *guc) > { > struct drm_i915_private *dev_priv = guc_to_i915(guc); > @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc) > u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE]; > } __packed *blob; > struct intel_engine_cs *engine; > + struct i915_workarounds *workarounds = &dev_priv->workarounds; > enum intel_engine_id id; > u32 base; > > @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc) > > /* MMIO reg state */ > for_each_engine(engine, dev_priv, id) { > + u32 i; > + struct guc_mmio_regset *eng_reg = > + &blob->reg_state.engine_reg[engine->guc_id]; > + > + /* > + * Provide a list of registers to be saved/restored during gpu > + * reset. This is mainly required for Media reset (aka watchdog > + * timeout) which is completely under the control of GuC > + * (resubmission of hung workload is handled inside GuC). > + */ > + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base), > + GUC_REGSET_ENGINERESET | > + GUC_REGSET_SAVE_CURRENT_VALUE, 0); > + > + /* > + * Workaround the guc issue with masked registers, note that > + * at this point guc submission is still disabled and the mode > + * register doesnt have the irq_steering bit set, which we > + * need to fwd irqs to GuC. > + */ > + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine), > + GUC_REGSET_ENGINERESET | > + GUC_REGSET_SAVE_DEFAULT_VALUE, > + I915_READ(RING_MODE_GEN7(engine)) | > + GFX_INTERRUPT_STEERING | (0xFFFF<<16)); > + > + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base), > + GUC_REGSET_ENGINERESET | > + GUC_REGSET_SAVE_CURRENT_VALUE, 0); > + I might just be too paranoid, but I think that we also have to add the registers that we use for WAs via mmio (i.e. not using an LRI in the ringbuffer). I did a quick test for the registers in the gen9_init_workarounds and skl_init_workarounds functions that we write using I915_WRITE and it looks like some of them lose their value after and RCS media reset: REG WA BITS VAL PRE-MR VAL POST-MR 0x20D4 (1<<2) 0x00000004 0x00000000 0xb11c (1<<2) 0x00000005 0x00000001 0x4090 (1<<8) (1<<25) 0x02000100 0x02000100 0xb118 (1<<21) 0x40600000 0x40400000 0x20e0 (1<<14) 0x00004000 0x00000000 0xB004 (1<<7) 0x29124180 0x29124100 For some reason the media reset count in debugfs is still showing zero though, so I might be doing something wrong. Can you double check what happens to those registers? Thanks, Daniele > + DRM_DEBUG_DRIVER("%s register save/restore count: %u\n", > + engine->name, eng_reg->number_of_registers); > + > blob->reg_state.white_list[engine->guc_id].mmio_start = > i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0)); > > @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc) > * inconsistencies with the handling of FORCE_TO_NONPRIV > * registers. > */ > - blob->reg_state.white_list[engine->guc_id].count = 0; > + blob->reg_state.white_list[engine->guc_id].count = > + workarounds->hw_whitelist_count[id]; > > - /* Nothing to be saved or restored for now. */ > + for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) { > + blob->reg_state.white_list[engine->guc_id].offsets[i] = > + I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i)); > + } > } > > /* >
On 18/04/17 17:26, Daniele Ceraolo Spurio wrote: > > > On 18/04/17 13:23, Michel Thierry wrote: >> From: Arun Siluvery <arun.siluvery@linux.intel.com> >> >> GuC expects a list of registers from the driver which are saved/restored >> during engine reset. The type of value to be saved is controlled by >> flags. We provide a minimal set of registers that we want GuC to save and >> restore. This is not an issue in case of engine reset as driver >> initializes >> most of them following an engine reset, but in case of media reset (aka >> watchdog reset) which is completely internal to GuC (including >> resubmission >> of hung workload), it is necessary to provide this list, otherwise GuC >> won't >> be able to schedule further workloads after a reset. This is the minimal >> set of registers identified for things to work as expected but if we see >> any new issues, this register list can be expanded. >> >> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE >> and current value from RING_MODE reg instead; no need to preserve >> head/tail either, be extra paranoid and save whitelisted registers >> (Daniele). >> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> >> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com> >> Signed-off-by: Michel Thierry <michel.thierry@intel.com> >> --- >> drivers/gpu/drm/i915/i915_guc_submission.c | 60 >> +++++++++++++++++++++++++++++- >> 1 file changed, 58 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c >> b/drivers/gpu/drm/i915/i915_guc_submission.c >> index 1ea36a88d2fb..d772718861df 100644 >> --- a/drivers/gpu/drm/i915/i915_guc_submission.c >> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c >> @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct >> guc_policies *policies) >> policies->is_valid = 1; >> } >> >> +/* >> + * In this macro it is highly unlikely to exceed max value but even >> if we did >> + * it is not an error so just throw a warning and continue. Only side >> effect >> + * in continuing further means some registers won't be added to >> save/restore >> + * list. >> + */ >> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue) \ >> + do { \ >> + u32 __count = node->number_of_registers; \ >> + if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS)) \ >> + continue; \ >> + node->registers[__count].offset = reg_addr.reg; \ >> + node->registers[__count].flags = (_flags); \ >> + if (defvalue) \ >> + node->registers[__count].value = (defvalue); \ >> + node->number_of_registers++; \ >> + } while (0) >> + >> static int guc_ads_create(struct intel_guc *guc) >> { >> struct drm_i915_private *dev_priv = guc_to_i915(guc); >> @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc) >> u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE]; >> } __packed *blob; >> struct intel_engine_cs *engine; >> + struct i915_workarounds *workarounds = &dev_priv->workarounds; >> enum intel_engine_id id; >> u32 base; >> >> @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc) >> >> /* MMIO reg state */ >> for_each_engine(engine, dev_priv, id) { >> + u32 i; >> + struct guc_mmio_regset *eng_reg = >> + &blob->reg_state.engine_reg[engine->guc_id]; >> + >> + /* >> + * Provide a list of registers to be saved/restored during gpu >> + * reset. This is mainly required for Media reset (aka watchdog >> + * timeout) which is completely under the control of GuC >> + * (resubmission of hung workload is handled inside GuC). >> + */ >> + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base), >> + GUC_REGSET_ENGINERESET | >> + GUC_REGSET_SAVE_CURRENT_VALUE, 0); >> + >> + /* >> + * Workaround the guc issue with masked registers, note that >> + * at this point guc submission is still disabled and the mode >> + * register doesnt have the irq_steering bit set, which we >> + * need to fwd irqs to GuC. >> + */ >> + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine), >> + GUC_REGSET_ENGINERESET | >> + GUC_REGSET_SAVE_DEFAULT_VALUE, >> + I915_READ(RING_MODE_GEN7(engine)) | >> + GFX_INTERRUPT_STEERING | (0xFFFF<<16)); >> + >> + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base), >> + GUC_REGSET_ENGINERESET | >> + GUC_REGSET_SAVE_CURRENT_VALUE, 0); >> + > > I might just be too paranoid, but I think that we also have to add the > registers that we use for WAs via mmio (i.e. not using an LRI in the > ringbuffer). I did a quick test for the registers in the > gen9_init_workarounds and skl_init_workarounds functions that we write > using I915_WRITE and it looks like some of them lose their value after > and RCS media reset: > > REG WA BITS VAL PRE-MR VAL POST-MR > 0x20D4 (1<<2) 0x00000004 0x00000000 > 0xb11c (1<<2) 0x00000005 0x00000001 > 0x4090 (1<<8) (1<<25) 0x02000100 0x02000100 > 0xb118 (1<<21) 0x40600000 0x40400000 > 0x20e0 (1<<14) 0x00004000 0x00000000 > 0xB004 (1<<7) 0x29124180 0x29124100 > Indeed. Kind of makes sense since i915 won't be aware of the reset at all (compared to the non-guc version). Btw it only seems to happen if the watchdog is triggered in the render engine. > For some reason the media reset count in debugfs is still showing zero > though, so I might be doing something wrong. Can you double check what > happens to those registers? > You need fw version 9.x+ > Thanks, > Daniele > >> + DRM_DEBUG_DRIVER("%s register save/restore count: %u\n", >> + engine->name, eng_reg->number_of_registers); >> + >> blob->reg_state.white_list[engine->guc_id].mmio_start = >> >> i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0)); >> >> @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc) >> * inconsistencies with the handling of FORCE_TO_NONPRIV >> * registers. >> */ >> - blob->reg_state.white_list[engine->guc_id].count = 0; >> + blob->reg_state.white_list[engine->guc_id].count = >> + workarounds->hw_whitelist_count[id]; >> >> - /* Nothing to be saved or restored for now. */ >> + for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) { >> + blob->reg_state.white_list[engine->guc_id].offsets[i] = >> + I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i)); >> + } >> } >> >> /* >>
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c index 1ea36a88d2fb..d772718861df 100644 --- a/drivers/gpu/drm/i915/i915_guc_submission.c +++ b/drivers/gpu/drm/i915/i915_guc_submission.c @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct guc_policies *policies) policies->is_valid = 1; } +/* + * In this macro it is highly unlikely to exceed max value but even if we did + * it is not an error so just throw a warning and continue. Only side effect + * in continuing further means some registers won't be added to save/restore + * list. + */ +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue) \ + do { \ + u32 __count = node->number_of_registers; \ + if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS)) \ + continue; \ + node->registers[__count].offset = reg_addr.reg; \ + node->registers[__count].flags = (_flags); \ + if (defvalue) \ + node->registers[__count].value = (defvalue); \ + node->number_of_registers++; \ + } while (0) + static int guc_ads_create(struct intel_guc *guc) { struct drm_i915_private *dev_priv = guc_to_i915(guc); @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc) u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE]; } __packed *blob; struct intel_engine_cs *engine; + struct i915_workarounds *workarounds = &dev_priv->workarounds; enum intel_engine_id id; u32 base; @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc) /* MMIO reg state */ for_each_engine(engine, dev_priv, id) { + u32 i; + struct guc_mmio_regset *eng_reg = + &blob->reg_state.engine_reg[engine->guc_id]; + + /* + * Provide a list of registers to be saved/restored during gpu + * reset. This is mainly required for Media reset (aka watchdog + * timeout) which is completely under the control of GuC + * (resubmission of hung workload is handled inside GuC). + */ + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base), + GUC_REGSET_ENGINERESET | + GUC_REGSET_SAVE_CURRENT_VALUE, 0); + + /* + * Workaround the guc issue with masked registers, note that + * at this point guc submission is still disabled and the mode + * register doesnt have the irq_steering bit set, which we + * need to fwd irqs to GuC. + */ + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine), + GUC_REGSET_ENGINERESET | + GUC_REGSET_SAVE_DEFAULT_VALUE, + I915_READ(RING_MODE_GEN7(engine)) | + GFX_INTERRUPT_STEERING | (0xFFFF<<16)); + + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base), + GUC_REGSET_ENGINERESET | + GUC_REGSET_SAVE_CURRENT_VALUE, 0); + + DRM_DEBUG_DRIVER("%s register save/restore count: %u\n", + engine->name, eng_reg->number_of_registers); + blob->reg_state.white_list[engine->guc_id].mmio_start = i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0)); @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc) * inconsistencies with the handling of FORCE_TO_NONPRIV * registers. */ - blob->reg_state.white_list[engine->guc_id].count = 0; + blob->reg_state.white_list[engine->guc_id].count = + workarounds->hw_whitelist_count[id]; - /* Nothing to be saved or restored for now. */ + for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) { + blob->reg_state.white_list[engine->guc_id].offsets[i] = + I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i)); + } } /*