diff mbox series

drm/i915/gt: Bump the reset-failure timeout to 60s

Message ID 20220916204823.1897089-1-ashutosh.dixit@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/gt: Bump the reset-failure timeout to 60s | expand

Commit Message

Dixit, Ashutosh Sept. 16, 2022, 8:48 p.m. UTC
From: Chris Wilson <chris@chris-wilson.co.uk>

If attempting to perform a GT reset takes long than 5 seconds (including
resetting the display for gen3/4), then we declare all hope lost and
discard all user work and wedge the device to prevent further
misbehaviour. 5 seconds is too short a time for such drastic action, as
we may be stuck on other timeouts and watchdogs. If we allow a little
bit longer before hitting the big red button, we should at the very
least capture other hung task indicators pointing towards the reason why
the reset was hanging; and allow more marginal cases the extra headroom
to complete the reset without further collateral damage.

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/6448
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matt Roper Sept. 19, 2022, 11:53 p.m. UTC | #1
On Fri, Sep 16, 2022 at 01:48:23PM -0700, Ashutosh Dixit wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> If attempting to perform a GT reset takes long than 5 seconds (including
> resetting the display for gen3/4), then we declare all hope lost and
> discard all user work and wedge the device to prevent further
> misbehaviour. 5 seconds is too short a time for such drastic action, as
> we may be stuck on other timeouts and watchdogs. If we allow a little
> bit longer before hitting the big red button, we should at the very
> least capture other hung task indicators pointing towards the reason why
> the reset was hanging; and allow more marginal cases the extra headroom
> to complete the reset without further collateral damage.
> 
> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/6448
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Seems reasonable.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index b36674356986..3159df6cdd49 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1278,7 +1278,7 @@ static void intel_gt_reset_global(struct intel_gt *gt,
>  	kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
>  
>  	/* Use a watchdog to ensure that our reset completes */
> -	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
> +	intel_wedge_on_timeout(&w, gt, 60 * HZ) {
>  		intel_display_prepare_reset(gt->i915);
>  
>  		intel_gt_reset(gt, engine_mask, reason);
> -- 
> 2.34.1
>
Rodrigo Vivi Sept. 27, 2022, 8:44 p.m. UTC | #2
On Fri, Sep 16, 2022 at 01:48:23PM -0700, Ashutosh Dixit wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> If attempting to perform a GT reset takes long than 5 seconds (including
> resetting the display for gen3/4), then we declare all hope lost and
> discard all user work and wedge the device to prevent further
> misbehaviour. 5 seconds is too short a time for such drastic action, as
> we may be stuck on other timeouts and watchdogs. If we allow a little
> bit longer before hitting the big red button, we should at the very
> least capture other hung task indicators pointing towards the reason why
> the reset was hanging; and allow more marginal cases the extra headroom
> to complete the reset without further collateral damage.
> 
> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/6448
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

When handling someone's else patch, please add your signed-off-by here
as well.

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index b36674356986..3159df6cdd49 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1278,7 +1278,7 @@ static void intel_gt_reset_global(struct intel_gt *gt,
>  	kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
>  
>  	/* Use a watchdog to ensure that our reset completes */
> -	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
> +	intel_wedge_on_timeout(&w, gt, 60 * HZ) {
>  		intel_display_prepare_reset(gt->i915);
>  
>  		intel_gt_reset(gt, engine_mask, reason);
> -- 
> 2.34.1
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index b36674356986..3159df6cdd49 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1278,7 +1278,7 @@  static void intel_gt_reset_global(struct intel_gt *gt,
 	kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
 
 	/* Use a watchdog to ensure that our reset completes */
-	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
+	intel_wedge_on_timeout(&w, gt, 60 * HZ) {
 		intel_display_prepare_reset(gt->i915);
 
 		intel_gt_reset(gt, engine_mask, reason);