Message ID | aqoql4ri3vpe4larpkz4p6hxy76agq6pmn6gunt5xv56hxdbye@72ilwk7rpiu5 (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] i915/gt: Reapply workarounds in case the previous attempt failed. | expand |
On Thu, Dec 05, 2024 at 03:47:35PM +0000, Sebastian Brzezinka wrote: > `wa_verify`sporadically detects lost workaround on application; this > is unusual behavior since wa are applied at `intel_gt_init_hw` and > verified right away by `intel_gt_verify_workarounds`, and `wa_verify` > doesn't fail on initialization as one might suspect would happen. > > One approach that may be somewhat beneficial is to reapply workarounds > in the event of failure, or even get rid of verify on application, > since it's redundant to `intel_gt_verify_workarounds`. > > This patch aims to resolve: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 It should be: Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 > > Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_workarounds.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c > index 570c91878189..4ee623448223 100644 > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > @@ -1761,6 +1761,17 @@ static void wa_list_apply(const struct i915_wa_list *wal) > intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : > intel_uncore_read_fw(uncore, wa->reg); > > + if ((val ^ wa->set) & wa->read) { > + if (wa->is_mcr) > + intel_gt_mcr_multicast_write_fw(gt, wa->mcr_reg, val); > + else > + intel_uncore_write_fw(uncore, wa->reg, val); > + } instead of duplicating the code you should extract that to an aux function to write it... > + > + val = wa->is_mcr ? > + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : > + intel_uncore_read_fw(uncore, wa->reg); and another one to read it... > + > wa_verify(gt, wa, val, wal->name, "application"); However my biggest concern with this patch is the brute force solution and only on CONFIG_DRM_I915_DEBUG_GEM case... and as duplication because I see that the second write attempt is already happening above if (val != old || !wa->clr) So, something is not quite right in here and this deserves another alternative.. > } > } > -- > 2.34.1 >
Hi Rodrigo, On Fri, Dec 06, 2024 at 10:38:24AM -0500, Rodrigo Vivi wrote: > On Thu, Dec 05, 2024 at 03:47:35PM +0000, Sebastian Brzezinka wrote: > > `wa_verify`sporadically detects lost workaround on application; this > > is unusual behavior since wa are applied at `intel_gt_init_hw` and > > verified right away by `intel_gt_verify_workarounds`, and `wa_verify` > > doesn't fail on initialization as one might suspect would happen. > > > > One approach that may be somewhat beneficial is to reapply workarounds > > in the event of failure, or even get rid of verify on application, > > since it's redundant to `intel_gt_verify_workarounds`. > > > > This patch aims to resolve: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 > > It should be: > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 aapart from the formatting issues this was suggested by me. We have observed some sporadic vailures in applying the specific workaround added by Ville (now cc'ed to the thread) in commit 0ddae025ab6c ("drm/i915: Disable compression tricks on JSL"). Because it's sporadic, we could give it one more chance and try to re-apply it. > > > > Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > index 570c91878189..4ee623448223 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > @@ -1761,6 +1761,17 @@ static void wa_list_apply(const struct i915_wa_list *wal) > > intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : > > intel_uncore_read_fw(uncore, wa->reg); > > > > + if ((val ^ wa->set) & wa->read) { > > + if (wa->is_mcr) > > + intel_gt_mcr_multicast_write_fw(gt, wa->mcr_reg, val); > > + else > > + intel_uncore_write_fw(uncore, wa->reg, val); > > + } > > instead of duplicating the code you should extract that to an aux function > to write it... a for loop can decrease the amount of duplicated code. > > + > > + val = wa->is_mcr ? > > + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : > > + intel_uncore_read_fw(uncore, wa->reg); > > and another one to read it... this, indeed it's just reading, but we are trying to re-write. If we wrote the unwanted value, we will keep reading the unwanted value. > > + > > wa_verify(gt, wa, val, wal->name, "application"); > > However my biggest concern with this patch is the brute force solution > and only on CONFIG_DRM_I915_DEBUG_GEM case... this is a good point, indeed, I don't understand why the confirmation should be within the DEBUG section. Andi > and as duplication because I see that the second write attempt is > already happening above if (val != old || !wa->clr) > > So, something is not quite right in here and this deserves another alternative.. > > > > } > > } > > -- > > 2.34.1 > >
On Thu, Dec 12, 2024 at 03:51:12PM +0100, Andi Shyti wrote: > Hi Rodrigo, > > On Fri, Dec 06, 2024 at 10:38:24AM -0500, Rodrigo Vivi wrote: > > On Thu, Dec 05, 2024 at 03:47:35PM +0000, Sebastian Brzezinka wrote: > > > `wa_verify`sporadically detects lost workaround on application; this > > > is unusual behavior since wa are applied at `intel_gt_init_hw` and > > > verified right away by `intel_gt_verify_workarounds`, and `wa_verify` > > > doesn't fail on initialization as one might suspect would happen. > > > > > > One approach that may be somewhat beneficial is to reapply workarounds > > > in the event of failure, or even get rid of verify on application, > > > since it's redundant to `intel_gt_verify_workarounds`. > > > > > > This patch aims to resolve: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 > > > > It should be: > > > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 > > aapart from the formatting issues this was suggested by me. We > have observed some sporadic vailures in applying the specific > workaround added by Ville (now cc'ed to the thread) in commit > 0ddae025ab6c ("drm/i915: Disable compression tricks on JSL"). > > Because it's sporadic, we could give it one more chance and try > to re-apply it. That sounds like it's just papering over the issue without really figuring out what's truly going on. Looking at the current implementation, it looks like at least one possible problem is that it was implemented in rcs_engine_wa_init, but the CACHE_MODE_0 register itself is part of the LRC (according to bspec 18907). So we want to move it to icl_ctx_workarounds_init() instead to make sure it gets recorded in the golden context image. Our initialization and reset handling for workarounds touching registers in the context are different from those that aren't. BTW, I'm a bit surprised to see us needing to implement this workaround in the kernel at all. CACHE_MODE_0 is a register that's under userspace control (according to bspec 14181), so we usually expect the userspace drivers to own implementing any workarounds dealing with the registers they control. Indeed, it looks like Mesa's Iris driver already has an implementation of this workaround in iris_state.c: if (devinfo->disable_ccs_repack) { iris_emit_reg(batch, GENX(CACHE_MODE_0), reg) { reg.DisableRepackingforCompression = true; reg.DisableRepackingforCompressionMask = true; } } and that workaround was added back in mid-2019 so it should be in all recent Mesa releases. Matt > > > > > > > Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> > > > --- > > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 11 +++++++++++ > > > 1 file changed, 11 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > index 570c91878189..4ee623448223 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > @@ -1761,6 +1761,17 @@ static void wa_list_apply(const struct i915_wa_list *wal) > > > intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : > > > intel_uncore_read_fw(uncore, wa->reg); > > > > > > + if ((val ^ wa->set) & wa->read) { > > > + if (wa->is_mcr) > > > + intel_gt_mcr_multicast_write_fw(gt, wa->mcr_reg, val); > > > + else > > > + intel_uncore_write_fw(uncore, wa->reg, val); > > > + } > > > > instead of duplicating the code you should extract that to an aux function > > to write it... > > a for loop can decrease the amount of duplicated code. > > > > + > > > + val = wa->is_mcr ? > > > + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : > > > + intel_uncore_read_fw(uncore, wa->reg); > > > > and another one to read it... > > this, indeed it's just reading, but we are trying to re-write. If > we wrote the unwanted value, we will keep reading the unwanted > value. > > > > + > > > wa_verify(gt, wa, val, wal->name, "application"); > > > > However my biggest concern with this patch is the brute force solution > > and only on CONFIG_DRM_I915_DEBUG_GEM case... > > this is a good point, indeed, I don't understand why the > confirmation should be within the DEBUG section. > > Andi > > > and as duplication because I see that the second write attempt is > > already happening above if (val != old || !wa->clr) > > > > So, something is not quite right in here and this deserves another alternative.. > > > > > > > } > > > } > > > -- > > > 2.34.1 > > >
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 570c91878189..4ee623448223 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -1761,6 +1761,17 @@ static void wa_list_apply(const struct i915_wa_list *wal) intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : intel_uncore_read_fw(uncore, wa->reg); + if ((val ^ wa->set) & wa->read) { + if (wa->is_mcr) + intel_gt_mcr_multicast_write_fw(gt, wa->mcr_reg, val); + else + intel_uncore_write_fw(uncore, wa->reg, val); + } + + val = wa->is_mcr ? + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : + intel_uncore_read_fw(uncore, wa->reg); + wa_verify(gt, wa, val, wal->name, "application"); } }
`wa_verify`sporadically detects lost workaround on application; this is unusual behavior since wa are applied at `intel_gt_init_hw` and verified right away by `intel_gt_verify_workarounds`, and `wa_verify` doesn't fail on initialization as one might suspect would happen. One approach that may be somewhat beneficial is to reapply workarounds in the event of failure, or even get rid of verify on application, since it's redundant to `intel_gt_verify_workarounds`. This patch aims to resolve: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668 Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 11 +++++++++++ 1 file changed, 11 insertions(+)