Message ID | 20241205115736.1420991-1-nitin.r.gote@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1,1/1] drm/i915/gt: Increase a time to retry RING_HEAD reset | expand |
Hi Nitin, On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote: > Issue is seen again where engine resets fails because the engine resumes > from an incorrect RING_HEAD. So, increase a time if at first the > write doesn't succeed and retry again. > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get sticks") Is this a real fix or is it more of a fine tuning? > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806 > Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> ... > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine) > set_pp_dir(engine); > > /* First wake the ring up to an empty/idle ring */ > - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC); > + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC); Where is this 50 coming from? Thanks, Andi > ktime_before(ktime_get(), (kt)); cpu_relax()) { > /* > * In case of resets fails because engine resumes from > -- > 2.25.1
Hi Andi, > -----Original Message----- > From: Andi Shyti <andi.shyti@linux.intel.com> > Sent: Thursday, December 5, 2024 6:35 PM > To: Gote, Nitin R <nitin.r.gote@intel.com> > Cc: intel-gfx@lists.freedesktop.org; Wilson, Chris P <chris.p.wilson@intel.com> > Subject: Re: [PATCH v1 1/1] drm/i915/gt: Increase a time to retry RING_HEAD > reset > > Hi Nitin, > > On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote: > > Issue is seen again where engine resets fails because the engine > > resumes from an incorrect RING_HEAD. So, increase a time if at first > > the write doesn't succeed and retry again. > > > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get > > sticks") > > Is this a real fix or is it more of a fine tuning? Here we can say this for more fine tuning as issue seen again and that's why I have added fixes : 6ef0e3ef2662. > > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806 > > Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> > > ... > > > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine) > > set_pp_dir(engine); > > > > /* First wake the ring up to an empty/idle ring */ > > - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC); > > + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC); > > Where is this 50 coming from? Well, here HEAD is still not 0 even after writing in it. So, it could be the timing issue. I discussed this with Chris and we thought It is better to add 50ms instead of 2ms delay here to let HEAD write complete. I tested this on trybot for Haswell/Ivybridge platform https://patchwork.freedesktop.org/series/141779/ and I see BAT is successful and shards issues are not related. > > Thanks, > Andi > > > ktime_before(ktime_get(), (kt)); cpu_relax()) { > > /* > > * In case of resets fails because engine resumes from > > -- > > 2.25.1
Hi Nitin, On Thu, Dec 05, 2024 at 04:03:38PM +0000, Gote, Nitin R wrote: > > On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote: > > > Issue is seen again where engine resets fails because the engine > > > resumes from an incorrect RING_HEAD. So, increase a time if at first > > > the write doesn't succeed and retry again. > > > > > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get > > > sticks") > > > > Is this a real fix or is it more of a fine tuning? > > Here we can say this for more fine tuning as issue seen again and > that's why I have added fixes : 6ef0e3ef2662. I thinkt he Fixes: tag is not necessary, here. You are not really fixing a bug. > > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806 > > > Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> > > > > ... > > > > > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine) > > > set_pp_dir(engine); > > > > > > /* First wake the ring up to an empty/idle ring */ > > > - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC); > > > + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC); > > > > Where is this 50 coming from? > > Well, here HEAD is still not 0 even after writing in it. > So, it could be the timing issue. I discussed this with Chris and we thought > It is better to add 50ms instead of 2ms delay here to let HEAD write complete. > I tested this on trybot for Haswell/Ivybridge platform https://patchwork.freedesktop.org/series/141779/ and > I see BAT is successful and shards issues are not related. Can you please add a comment explaining why you are using 50ms? You can also mention that you experimented with different values and determined that 50ms works best based on testing. Andi
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 32f3b52a183a..bc9f28524713 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine) set_pp_dir(engine); /* First wake the ring up to an empty/idle ring */ - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC); + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC); ktime_before(ktime_get(), (kt)); cpu_relax()) { /* * In case of resets fails because engine resumes from
Issue is seen again where engine resets fails because the engine resumes from an incorrect RING_HEAD. So, increase a time if at first the write doesn't succeed and retry again. Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get sticks") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806 Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> --- drivers/gpu/drm/i915/gt/intel_ring_submission.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)