Message ID | 20240523235853.171796-1-andi.shyti@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] drm/i915: Increase FLR timeout from 3s to 9s | expand |
On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote: > Following the guidelines it takes 3 seconds to perform an FLR > reset. Let's give it a bit more slack because this time can > change depending on the platform and on the firmware But did we see any issue with that? if that changes per platform and per firmware, shouldn't it all be explicit in the spec as well? > > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> > --- > Hi, > > In this second version I removed patch 2 that was ignoring the > FLR reset timeouts, until we develop a proper patch. > > This first patch is basically the same as v1. Thanks Nirmoy for > your review. > > Andi > > drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 729409a4bada..2eba289d88ad 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -2614,11 +2614,18 @@ void intel_uncore_prune_engine_fw_domains(struct intel_uncore *uncore, > static void driver_initiated_flr(struct intel_uncore *uncore) > { > struct drm_i915_private *i915 = uncore->i915; > - const unsigned int flr_timeout_ms = 3000; /* specs recommend a 3s wait */ > + unsigned int flr_timeout_ms; > int ret; > > drm_dbg(&i915->drm, "Triggering Driver-FLR\n"); > > + /* > + * The specification recommends a 3 seconds FLR reset timeout. To be > + * cautious, we will extend this to 9 seconds, three times the specified > + * timeout. > + */ > + flr_timeout_ms = 9000; > + > /* > * Make sure any pending FLR requests have cleared by waiting for the > * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS > -- > 2.45.1 >
On Fri, May 24, 2024 at 10:07:44AM -0400, Rodrigo Vivi wrote: > On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote: > > Following the guidelines it takes 3 seconds to perform an FLR > > reset. Let's give it a bit more slack because this time can > > change depending on the platform and on the firmware > > But did we see any issue with that? yes, we have some FLR expiration timeouts that apparently are not able to bring up the device and the memory is not accessible anymore. It's worth giving it a bit more time. > if that changes per platform and per firmware, shouldn't it all > be explicit in the spec as well? Is it always documented? We might anyway die after the FLR reset failure, so that I see it quite safe to wait and pray a little more. Andi
On Mon, May 27, 2024 at 11:47:49AM +0100, Andi Shyti wrote: > On Fri, May 24, 2024 at 10:07:44AM -0400, Rodrigo Vivi wrote: > > On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote: > > > Following the guidelines it takes 3 seconds to perform an FLR > > > reset. Let's give it a bit more slack because this time can > > > change depending on the platform and on the firmware > > > > But did we see any issue with that? > > yes, we have some FLR expiration timeouts that apparently are not > able to bring up the device and the memory is not accessible > anymore. It's worth giving it a bit more time. > > > if that changes per platform and per firmware, shouldn't it all > > be explicit in the spec as well? > > Is it always documented? We might anyway die after the FLR reset > failure, so that I see it quite safe to wait and pray a little > more. if needed I can improve the log with the dmesg error print. Andi
On Mon, May 27, 2024 at 04:00:25PM +0100, Andi Shyti wrote: > On Mon, May 27, 2024 at 11:47:49AM +0100, Andi Shyti wrote: > > On Fri, May 24, 2024 at 10:07:44AM -0400, Rodrigo Vivi wrote: > > > On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote: > > > > Following the guidelines it takes 3 seconds to perform an FLR > > > > reset. Let's give it a bit more slack because this time can > > > > change depending on the platform and on the firmware > > > > > > But did we see any issue with that? > > > > yes, we have some FLR expiration timeouts that apparently are not > > able to bring up the device and the memory is not accessible > > anymore. It's worth giving it a bit more time. > > > > > if that changes per platform and per firmware, shouldn't it all > > > be explicit in the spec as well? > > > > Is it always documented? We might anyway die after the FLR reset > > failure, so that I see it quite safe to wait and pray a little > > more. > > if needed I can improve the log with the dmesg error print. very good points indeed. I believe it would be worth some mentions about the faced issues and good idea about dmesg as well. but up to you. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > > Andi
On 5/24/2024 1:58 AM, Andi Shyti wrote: > Following the guidelines it takes 3 seconds to perform an FLR > reset. Let's give it a bit more slack because this time can > change depending on the platform and on the firmware > > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> > --- > Hi, > > In this second version I removed patch 2 that was ignoring the > FLR reset timeouts, until we develop a proper patch. > > This first patch is basically the same as v1. Thanks Nirmoy for > your review. > > Andi > > drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 729409a4bada..2eba289d88ad 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -2614,11 +2614,18 @@ void intel_uncore_prune_engine_fw_domains(struct intel_uncore *uncore, > static void driver_initiated_flr(struct intel_uncore *uncore) > { > struct drm_i915_private *i915 = uncore->i915; > - const unsigned int flr_timeout_ms = 3000; /* specs recommend a 3s wait */ > + unsigned int flr_timeout_ms; > int ret; > > drm_dbg(&i915->drm, "Triggering Driver-FLR\n"); > > + /* > + * The specification recommends a 3 seconds FLR reset timeout. To be > + * cautious, we will extend this to 9 seconds, three times the specified > + * timeout. > + */ > + flr_timeout_ms = 9000; > + > /* > * Make sure any pending FLR requests have cleared by waiting for the > * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 729409a4bada..2eba289d88ad 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -2614,11 +2614,18 @@ void intel_uncore_prune_engine_fw_domains(struct intel_uncore *uncore, static void driver_initiated_flr(struct intel_uncore *uncore) { struct drm_i915_private *i915 = uncore->i915; - const unsigned int flr_timeout_ms = 3000; /* specs recommend a 3s wait */ + unsigned int flr_timeout_ms; int ret; drm_dbg(&i915->drm, "Triggering Driver-FLR\n"); + /* + * The specification recommends a 3 seconds FLR reset timeout. To be + * cautious, we will extend this to 9 seconds, three times the specified + * timeout. + */ + flr_timeout_ms = 9000; + /* * Make sure any pending FLR requests have cleared by waiting for the * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS
Following the guidelines it takes 3 seconds to perform an FLR reset. Let's give it a bit more slack because this time can change depending on the platform and on the firmware Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> --- Hi, In this second version I removed patch 2 that was ignoring the FLR reset timeouts, until we develop a proper patch. This first patch is basically the same as v1. Thanks Nirmoy for your review. Andi drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)