diff mbox series

drm/i915/guc: Change GEM_WARN_ON to guc_err to prevent taints in CI

Message ID 20240808204943.911727-1-jesus.narvaez@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/guc: Change GEM_WARN_ON to guc_err to prevent taints in CI | expand

Commit Message

Narvaez, Jesus Aug. 8, 2024, 8:49 p.m. UTC
This warning was supposed to catch a harmless issue, but changing to
guc_error should prevent kernel taints in CI runs.

Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Cavitt, Jonathan Aug. 9, 2024, 9:39 p.m. UTC | #1
-----Original Message-----
From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Jesus Narvaez
Sent: Thursday, August 8, 2024 1:50 PM
To: intel-gfx@lists.freedesktop.org
Cc: Narvaez, Jesus <jesus.narvaez@intel.com>
Subject: [PATCH] drm/i915/guc: Change GEM_WARN_ON to guc_err to prevent taints in CI
> 
> This warning was supposed to catch a harmless issue, but changing to
> guc_error should prevent kernel taints in CI runs.

I'll defer to your judgement on this, but IMO if we just want to log when this
happens and guarantee we don't throw kernel taints in CI, we might want to
consider using guc_info instead.  Not blocking on it, though.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt

> 
> Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9400d0eb682b..c3a5d9e1288e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2014,11 +2014,12 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>  
>  	/*
>  	 * Technically possible for either of these values to be non-zero here,
> -	 * but very unlikely + harmless. Regardless let's add a warn so we can
> +	 * but very unlikely + harmless. Regardless let's add an error so we can
>  	 * see in CI if this happens frequently / a precursor to taking down the
>  	 * machine.
>  	 */
> -	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> +	if (atomic_read(&guc->outstanding_submission_g2h))
> +		guc_err(guc, "Unexpected outstanding GuC to Host in reset finish\n");
>  	atomic_set(&guc->outstanding_submission_g2h, 0);
>  
>  	intel_guc_global_policies_update(guc);
> -- 
> 2.34.1
> 
>
Narvaez, Jesus Aug. 15, 2024, 5:09 p.m. UTC | #2
On 8/9/2024 2:39 PM, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Jesus Narvaez
> Sent: Thursday, August 8, 2024 1:50 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Narvaez, Jesus <jesus.narvaez@intel.com>
> Subject: [PATCH] drm/i915/guc: Change GEM_WARN_ON to guc_err to prevent taints in CI
>> This warning was supposed to catch a harmless issue, but changing to
>> guc_error should prevent kernel taints in CI runs.
> I'll defer to your judgement on this, but IMO if we just want to log when this
> happens and guarantee we don't throw kernel taints in CI, we might want to
> consider using guc_info instead.  Not blocking on it, though.
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> -Jonathan Cavitt

Hi Jonathan,

I think using guc_err would be preferred because we still want to catch 
the error when it happens, along with preventing kernel taints in CI.

- Jesus Narvaez

>
>> Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 9400d0eb682b..c3a5d9e1288e 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -2014,11 +2014,12 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>>   
>>   	/*
>>   	 * Technically possible for either of these values to be non-zero here,
>> -	 * but very unlikely + harmless. Regardless let's add a warn so we can
>> +	 * but very unlikely + harmless. Regardless let's add an error so we can
>>   	 * see in CI if this happens frequently / a precursor to taking down the
>>   	 * machine.
>>   	 */
>> -	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
>> +	if (atomic_read(&guc->outstanding_submission_g2h))
>> +		guc_err(guc, "Unexpected outstanding GuC to Host in reset finish\n");
>>   	atomic_set(&guc->outstanding_submission_g2h, 0);
>>   
>>   	intel_guc_global_policies_update(guc);
>> -- 
>> 2.34.1
>>
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9400d0eb682b..c3a5d9e1288e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2014,11 +2014,12 @@  void intel_guc_submission_reset_finish(struct intel_guc *guc)
 
 	/*
 	 * Technically possible for either of these values to be non-zero here,
-	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * but very unlikely + harmless. Regardless let's add an error so we can
 	 * see in CI if this happens frequently / a precursor to taking down the
 	 * machine.
 	 */
-	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	if (atomic_read(&guc->outstanding_submission_g2h))
+		guc_err(guc, "Unexpected outstanding GuC to Host in reset finish\n");
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
 	intel_guc_global_policies_update(guc);