drm/i915: Skip error capture when wedged on init

Message ID 20211109122037.171128-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915: Skip error capture when wedged on init

Commit Message

Tvrtko Ursulin Nov. 9, 2021, 12:20 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Trying to capture uninitialised engines when we wedged on init ends in
tears. Skip that together with uC capture, since failure to initialise the
latter can actually be one of the reasons for wedging on init.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Matthew Auld Nov. 10, 2021, 10:48 a.m. UTC | #1
On Tue, 9 Nov 2021 at 12:20, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Trying to capture uninitialised engines when we wedged on init ends in
> tears. Skip that together with uC capture, since failure to initialise the
> latter can actually be one of the reasons for wedging on init.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This fixes the issue with missing GuC wedging the GPU and then blowing
up when trying to use the driver?

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2a2d7643b551..aa2b3aad9643 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1866,10 +1866,14 @@ i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
>                 }
>
>                 gt_record_info(error->gt);
> -               gt_record_engines(error->gt, engine_mask, compress);
>
> -               if (INTEL_INFO(i915)->has_gt_uc)
> -                       error->gt->uc = gt_record_uc(error->gt, compress);
> +               if (!intel_gt_has_unrecoverable_error(gt)) {
> +                       gt_record_engines(error->gt, engine_mask, compress);
> +
> +                       if (INTEL_INFO(i915)->has_gt_uc)
> +                               error->gt->uc = gt_record_uc(error->gt,
> +                                                            compress);
> +               }
>
>                 i915_vma_capture_finish(error->gt, compress);
>
> --
> 2.30.2
>
Tvrtko Ursulin Nov. 10, 2021, 11:34 a.m. UTC | #2
On 10/11/2021 10:48, Matthew Auld wrote:
> On Tue, 9 Nov 2021 at 12:20, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Trying to capture uninitialised engines when we wedged on init ends in
>> tears. Skip that together with uC capture, since failure to initialise the
>> latter can actually be one of the reasons for wedging on init.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> This fixes the issue with missing GuC wedging the GPU and then blowing
> up when trying to use the driver?

It probably does not blow up when using the driver, but it definitely does
when accessing the error state. Someone suggested it would be better to call
i915_disable_error_state() from wedge on init/fini instead, and I think it
would indeed, so I plan to send a v2 along those lines.

Regards,

Tvrtko
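
The alternative mentioned in the reply above (disabling the error state at the point of wedging, rather than guarding inside the capture path) could be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: every mock_* name is hypothetical, the structs are not the real driver structures, and the -ENODEV error code is an assumption. The thread does not show what v2 actually looks like.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the driver's single error-state slot. */
struct mock_gpu_error {
	int disabled_err;	/* non-zero once capture has been disabled */
	bool have_dump;		/* a dump has already been stored */
};

/* Hypothetical stand-in for the GT, not the real struct intel_gt. */
struct mock_gt {
	bool wedged_on_init;
	struct mock_gpu_error error;
};

/* Models i915_disable_error_state(): poison the slot unless a dump exists. */
static void mock_disable_error_state(struct mock_gpu_error *e, int err)
{
	assert(err != 0);	/* zero would not mark the slot as disabled */

	if (!e->have_dump && !e->disabled_err)
		e->disabled_err = err;
}

/* Models a capture attempt: returns true only if a new dump was stored. */
static bool mock_capture_error_state(struct mock_gpu_error *e)
{
	if (e->disabled_err || e->have_dump)
		return false;	/* slot is poisoned or occupied, nothing captured */

	e->have_dump = true;
	return true;
}

/* The suggested shape: disable capture when wedging on init. */
static void mock_set_wedged_on_init(struct mock_gt *gt)
{
	gt->wedged_on_init = true;
	mock_disable_error_state(&gt->error, -ENODEV);	/* error code is an assumption */
}
```

With this shape the capture path needs no extra condition: once the GT has been wedged on init, every later capture attempt bails out before touching engine or uC state.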

> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> 
>> ---
>>   drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 2a2d7643b551..aa2b3aad9643 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1866,10 +1866,14 @@ i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
>>                  }
>>
>>                  gt_record_info(error->gt);
>> -               gt_record_engines(error->gt, engine_mask, compress);
>>
>> -               if (INTEL_INFO(i915)->has_gt_uc)
>> -                       error->gt->uc = gt_record_uc(error->gt, compress);
>> +               if (!intel_gt_has_unrecoverable_error(gt)) {
>> +                       gt_record_engines(error->gt, engine_mask, compress);
>> +
>> +                       if (INTEL_INFO(i915)->has_gt_uc)
>> +                               error->gt->uc = gt_record_uc(error->gt,
>> +                                                            compress);
>> +               }
>>
>>                  i915_vma_capture_finish(error->gt, compress);
>>
>> --
>> 2.30.2
>>

Patch

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2a2d7643b551..aa2b3aad9643 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1866,10 +1866,14 @@  i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 		}
 
 		gt_record_info(error->gt);
-		gt_record_engines(error->gt, engine_mask, compress);
 
-		if (INTEL_INFO(i915)->has_gt_uc)
-			error->gt->uc = gt_record_uc(error->gt, compress);
+		if (!intel_gt_has_unrecoverable_error(gt)) {
+			gt_record_engines(error->gt, engine_mask, compress);
+
+			if (INTEL_INFO(i915)->has_gt_uc)
+				error->gt->uc = gt_record_uc(error->gt,
+							     compress);
+		}
 
 		i915_vma_capture_finish(error->gt, compress);
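
The control flow of the hunk above can be modeled in userspace to show which sections of the dump survive when the GT is wedged on init. This is a minimal sketch, not driver code: the mock_* names and struct layouts are hypothetical stand-ins, with the flags modeling intel_gt_has_unrecoverable_error(gt) and INTEL_INFO(i915)->has_gt_uc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's GT state, not the real struct intel_gt. */
struct mock_gt {
	bool unrecoverable;	/* models intel_gt_has_unrecoverable_error(gt) */
	bool engines_ready;	/* false when init failed before engine setup */
	bool has_gt_uc;		/* models INTEL_INFO(i915)->has_gt_uc */
};

/* Records which sections made it into the dump. */
struct mock_coredump {
	bool info;
	bool engines;
	bool uc;
};

/* Stand-in for gt_record_engines(): only valid on initialised engines. */
static void mock_record_engines(const struct mock_gt *gt,
				struct mock_coredump *dump)
{
	assert(gt->engines_ready);	/* the failure mode the patch avoids */
	dump->engines = true;
}

/* Mirrors the control flow of the patched hunk. */
static struct mock_coredump mock_gpu_coredump(const struct mock_gt *gt)
{
	/* General GT info is always recorded, as with gt_record_info(). */
	struct mock_coredump dump = { .info = true };

	if (!gt->unrecoverable) {
		mock_record_engines(gt, &dump);

		if (gt->has_gt_uc)
			dump.uc = true;
	}

	return dump;
}
```

In this model a GT with unrecoverable set and engines_ready clear yields a dump containing only the general info section, while a healthy GT still records engines and uC state as before.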