diff mbox series

[v2] drm/i915: Increase FLR timeout from 3s to 9s

Message ID 20240523235853.171796-1-andi.shyti@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [v2] drm/i915: Increase FLR timeout from 3s to 9s | expand

Commit Message

Andi Shyti May 23, 2024, 11:58 p.m. UTC
Following the guidelines it takes 3 seconds to perform an FLR
reset. Let's give it a bit more slack because this time can
change depending on the platform and on the firmware

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
Hi,

In this second version I removed patch 2 that was ignoring the
FLR reset timeouts, until we develop a proper patch.

This first patch is basically the same as v1. Thanks Nirmoy for
your review.

Andi

 drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Rodrigo Vivi May 24, 2024, 2:07 p.m. UTC | #1
On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote:
> Following the guidelines it takes 3 seconds to perform an FLR
> reset. Let's give it a bit more slack because this time can
> change depending on the platform and on the firmware

But did we see any issue with that?

if that changes per platform and per firmware, shouldn't it all
be explicit in the spec as well?

> 
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
> Hi,
> 
> In this second version I removed patch 2 that was ignoring the
> FLR reset timeouts, until we develop a proper patch.
> 
> This first patch is basically the same as v1. Thanks Nirmoy for
> your review.
> 
> Andi
> 
>  drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 729409a4bada..2eba289d88ad 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2614,11 +2614,18 @@ void intel_uncore_prune_engine_fw_domains(struct intel_uncore *uncore,
>  static void driver_initiated_flr(struct intel_uncore *uncore)
>  {
>  	struct drm_i915_private *i915 = uncore->i915;
> -	const unsigned int flr_timeout_ms = 3000; /* specs recommend a 3s wait */
> +	unsigned int flr_timeout_ms;
>  	int ret;
>  
>  	drm_dbg(&i915->drm, "Triggering Driver-FLR\n");
>  
> +	/*
> +	 * The specification recommends a 3 seconds FLR reset timeout. To be
> +	 * cautious, we will extend this to 9 seconds, three times the specified
> +	 * timeout.
> +	 */
> +	flr_timeout_ms = 9000;
> +
>  	/*
>  	 * Make sure any pending FLR requests have cleared by waiting for the
>  	 * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS
> -- 
> 2.45.1
>
Andi Shyti May 27, 2024, 10:47 a.m. UTC | #2
On Fri, May 24, 2024 at 10:07:44AM -0400, Rodrigo Vivi wrote:
> On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote:
> > Following the guidelines it takes 3 seconds to perform an FLR
> > reset. Let's give it a bit more slack because this time can
> > change depending on the platform and on the firmware
> 
> But did we see any issue with that?

yes, we have some FLR expiration timeouts that apparently are not
able to bring up the device and the memory is not accessible
anymore. It's worth giving it a bit more time.

> if that changes per platform and per firmware, shouldn't it all
> be explicit in the spec as well?

Is it always documented? We might anyway die after the FLR reset
failure, so that I see it quite safe to wait and pray a little
more.

Andi
Andi Shyti May 27, 2024, 3 p.m. UTC | #3
On Mon, May 27, 2024 at 11:47:49AM +0100, Andi Shyti wrote:
> On Fri, May 24, 2024 at 10:07:44AM -0400, Rodrigo Vivi wrote:
> > On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote:
> > > Following the guidelines it takes 3 seconds to perform an FLR
> > > reset. Let's give it a bit more slack because this time can
> > > change depending on the platform and on the firmware
> > 
> > But did we see any issue with that?
> 
> yes, we have some FLR expiration timeouts that apparently are not
> able to bring up the device and the memory is not accessible
> anymore. It's worth giving it a bit more time.
> 
> > if that changes per platform and per firmware, shouldn't it all
> > be explicit in the spec as well?
> 
> Is it always documented? We might anyway die after the FLR reset
> failure, so that I see it quite safe to wait and pray a little
> more.

if needed I can improve the log with the dmesg error print.

Andi
Rodrigo Vivi May 28, 2024, 4:36 p.m. UTC | #4
On Mon, May 27, 2024 at 04:00:25PM +0100, Andi Shyti wrote:
> On Mon, May 27, 2024 at 11:47:49AM +0100, Andi Shyti wrote:
> > On Fri, May 24, 2024 at 10:07:44AM -0400, Rodrigo Vivi wrote:
> > > On Fri, May 24, 2024 at 01:58:53AM +0200, Andi Shyti wrote:
> > > > Following the guidelines it takes 3 seconds to perform an FLR
> > > > reset. Let's give it a bit more slack because this time can
> > > > change depending on the platform and on the firmware
> > > 
> > > But did we see any issue with that?
> > 
> > yes, we have some FLR expiration timeouts that apparently are not
> > able to bring up the device and the memory is not accessible
> > anymore. It's worth giving it a bit more time.
> > 
> > > if that changes per platform and per firmware, shouldn't it all
> > > be explicit in the spec as well?
> > 
> > Is it always documented? We might anyway die after the FLR reset
> > failure, so that I see it quite safe to wait and pray a little
> > more.
> 
> if needed I can improve the log with the dmesg error print.

very good points indeed. I believe it would be worth some mentions
about the faced issues and good idea about dmesg as well. but up to you.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> 
> Andi
Nirmoy Das May 29, 2024, 8:42 a.m. UTC | #5
On 5/24/2024 1:58 AM, Andi Shyti wrote:
> Following the guidelines it takes 3 seconds to perform an FLR
> reset. Let's give it a bit more slack because this time can
> change depending on the platform and on the firmware
>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
> Hi,
>
> In this second version I removed patch 2 that was ignoring the
> FLR reset timeouts, until we develop a proper patch.
>
> This first patch is basically the same as v1. Thanks Nirmoy for
> your review.
>
> Andi
>
>   drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 729409a4bada..2eba289d88ad 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2614,11 +2614,18 @@ void intel_uncore_prune_engine_fw_domains(struct intel_uncore *uncore,
>   static void driver_initiated_flr(struct intel_uncore *uncore)
>   {
>   	struct drm_i915_private *i915 = uncore->i915;
> -	const unsigned int flr_timeout_ms = 3000; /* specs recommend a 3s wait */
> +	unsigned int flr_timeout_ms;
>   	int ret;
>   
>   	drm_dbg(&i915->drm, "Triggering Driver-FLR\n");
>   
> +	/*
> +	 * The specification recommends a 3 seconds FLR reset timeout. To be
> +	 * cautious, we will extend this to 9 seconds, three times the specified
> +	 * timeout.
> +	 */
> +	flr_timeout_ms = 9000;
> +
>   	/*
>   	 * Make sure any pending FLR requests have cleared by waiting for the
>   	 * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 729409a4bada..2eba289d88ad 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2614,11 +2614,18 @@  void intel_uncore_prune_engine_fw_domains(struct intel_uncore *uncore,
 static void driver_initiated_flr(struct intel_uncore *uncore)
 {
 	struct drm_i915_private *i915 = uncore->i915;
-	const unsigned int flr_timeout_ms = 3000; /* specs recommend a 3s wait */
+	unsigned int flr_timeout_ms;
 	int ret;
 
 	drm_dbg(&i915->drm, "Triggering Driver-FLR\n");
 
+	/*
+	 * The specification recommends a 3 seconds FLR reset timeout. To be
+	 * cautious, we will extend this to 9 seconds, three times the specified
+	 * timeout.
+	 */
+	flr_timeout_ms = 9000;
+
 	/*
 	 * Make sure any pending FLR requests have cleared by waiting for the
 	 * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS