diff mbox series

drm/i915/ttm: Abort suspend on i915_ttm_backup failure

Message ID 20220829120409.24260-1-nirmoy.das@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/ttm: Abort suspend on i915_ttm_backup failure | expand

Commit Message

Nirmoy Das Aug. 29, 2022, 12:04 p.m. UTC
On system suspend when system memory is low then i915_gem_obj_copy_ttm()
could fail trying to backup a lmem obj. GEM_WARN_ON() is not enough,
suspend shouldn't continue if i915_ttm_backup() throws an error.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6529
Suggested-by: Chris P Wilson <chris.p.wilson@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Matthew Auld Aug. 31, 2022, 3:50 p.m. UTC | #1
On 29/08/2022 13:04, Nirmoy Das wrote:
> On system suspend when system memory is low then i915_gem_obj_copy_ttm()
> could fail trying to backup a lmem obj. GEM_WARN_ON() is not enough,
> suspend shouldn't continue if i915_ttm_backup() throws an error.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6529

Does this fix it? Does CI not complain about the drm_err? Also do we 
know what the actual error was?

> Suggested-by: Chris P Wilson <chris.p.wilson@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>

Passing the error along seems reasonable to me,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
> index 9aad84059d56..6f5d5c0909b4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
> @@ -79,7 +79,12 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
>   		goto out_no_populate;
>   
>   	err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, false);
> -	GEM_WARN_ON(err);
> +	if (err) {
> +		drm_err(&i915->drm,
> +			"Unable to copy from device to system memory, err:%d\n",
> +			err);
> +		goto out_no_populate;
> +	}
>   	ttm_bo_wait_ctx(backup_bo, &ctx);
>   
>   	obj->ttm.backup = backup;
Nirmoy Das Aug. 31, 2022, 4:09 p.m. UTC | #2
On 8/31/2022 5:50 PM, Matthew Auld wrote:
> On 29/08/2022 13:04, Nirmoy Das wrote:
>> On system suspend when system memory is low then i915_gem_obj_copy_ttm()
>> could fail trying to backup a lmem obj. GEM_WARN_ON() is not enough,
>> suspend shouldn't continue if i915_ttm_backup() throws an error.
>>
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6529
>
> Does this fix it? Does CI not complain about the drm_err? Also do we 
> know what the actual error was?


The error isn't reoccurring so the best guess is: large framebuffer copy 
took long time and  wait_for_suspend()

timed out. This needs more coverage from IGT and I am looking into 
that.  Let's ignore the "Closes" tag from this

patch till I come up a IGT test for this.


Nirmoy


>
>> Suggested-by: Chris P Wilson <chris.p.wilson@intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>
> Passing the error along seems reasonable to me,
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
>> index 9aad84059d56..6f5d5c0909b4 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
>> @@ -79,7 +79,12 @@ static int i915_ttm_backup(struct 
>> i915_gem_apply_to_region *apply,
>>           goto out_no_populate;
>>         err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, 
>> false);
>> -    GEM_WARN_ON(err);
>> +    if (err) {
>> +        drm_err(&i915->drm,
>> +            "Unable to copy from device to system memory, err:%d\n",
>> +            err);
>> +        goto out_no_populate;
>> +    }
>>       ttm_bo_wait_ctx(backup_bo, &ctx);
>>         obj->ttm.backup = backup;
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 9aad84059d56..6f5d5c0909b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -79,7 +79,12 @@  static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
 		goto out_no_populate;
 
 	err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, false);
-	GEM_WARN_ON(err);
+	if (err) {
+		drm_err(&i915->drm,
+			"Unable to copy from device to system memory, err:%d\n",
+			err);
+		goto out_no_populate;
+	}
 	ttm_bo_wait_ctx(backup_bo, &ctx);
 
 	obj->ttm.backup = backup;