diff mbox series

[RFC,1/5] drm/amdgpu: Fix migration rate limiting accounting

Message ID 20240508180946.96863-2-tursulin@igalia.com (mailing list archive)
State New, archived
Headers show
Series Discussion around eviction improvements | expand

Commit Message

Tvrtko Ursulin May 8, 2024, 6:09 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

The logic assumed any migration attempt worked and therefore would over-
account the amount of data migrated during buffer re-validation. As a
consequence client can be unfairly penalised by incorrectly considering
its migration budget spent.

Fix it by looking at the before and after buffer object backing store and
only account if there was a change.

FIXME:
I think this needs a better solution to account for migrations between
VRAM visible and non-visible portions.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Comments

Friedrich Vock May 8, 2024, 7:08 p.m. UTC | #1
On 08.05.24 20:09, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> The logic assumed any migration attempt worked and therefore would over-
> account the amount of data migrated during buffer re-validation. As a
> consequence client can be unfairly penalised by incorrectly considering
> its migration budget spent.

If the migration failed but data was still moved (which I think could be
the case when we try evicting everything but it still doesn't work?),
shouldn't the eviction movements count towards the ratelimit too?

>
> Fix it by looking at the before and after buffer object backing store and
> only account if there was a change.
>
> FIXME:
> I think this needs a better solution to account for migrations between
> VRAM visible and non-visible portions.

FWIW, I have some WIP patches (not posted on any MLs yet though) that
attempt to solve this issue (+actually enforcing ratelimits) by moving
the ratelimit accounting/enforcement to TTM entirely.

By moving the accounting to TTM we can count moved bytes when we move
them, and don't have to rely on comparing resources to determine whether
moving actually happened. This should address your FIXME as well.

Regards,
Friedrich

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
>   1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index ec888fc6ead8..22708954ae68 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   		.no_wait_gpu = false,
>   		.resv = bo->tbo.base.resv
>   	};
> +	struct ttm_resource *old_res;
>   	uint32_t domain;
>   	int r;
>
>   	if (bo->tbo.pin_count)
>   		return 0;
>
> +	old_res = bo->tbo.resource;
> +
>   	/* Don't move this buffer if we have depleted our allowance
>   	 * to move it. Don't move anything if the threshold is zero.
>   	 */
> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   	amdgpu_bo_placement_from_domain(bo, domain);
>   	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>
> -	p->bytes_moved += ctx.bytes_moved;
> -	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> -		p->bytes_moved_vis += ctx.bytes_moved;
> -
>   	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>   		domain = bo->allowed_domains;
>   		goto retry;
>   	}
>
> +	if (!r) {
> +		struct ttm_resource *new_res = bo->tbo.resource;
> +		bool moved = true;
> +
> +		if (old_res == new_res)
> +			moved = false;
> +		else if (old_res && new_res &&
> +			 old_res->mem_type == new_res->mem_type)
> +			moved = false;
> +
> +		if (moved) {
> +			p->bytes_moved += ctx.bytes_moved;
> +			if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> +			    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> +				p->bytes_moved_vis += ctx.bytes_moved;
> +		}
> +	}
> +
>   	return r;
>   }
>
Tvrtko Ursulin May 9, 2024, 9:19 a.m. UTC | #2
On 08/05/2024 20:08, Friedrich Vock wrote:
> On 08.05.24 20:09, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> The logic assumed any migration attempt worked and therefore would over-
>> account the amount of data migrated during buffer re-validation. As a
>> consequence client can be unfairly penalised by incorrectly considering
>> its migration budget spent.
> 
> If the migration failed but data was still moved (which I think could be
> the case when we try evicting everything but it still doesn't work?),
> shouldn't the eviction movements count towards the ratelimit too?

Possibly, which path would that be?

I mean there are definitely more migration which *should not* be counted 
which I think your mini-series approaches more accurately. What this 
patch achieves, in its current RFC form, is reduces the "false-positive" 
migration budget depletions.

So larger improvements aside, point of the series was to illustrate that 
even the things which were said to be working do not seem to. See cover 
letter to see what I thought does not work either well or at all.
>> Fix it by looking at the before and after buffer object backing store and
>> only account if there was a change.
>>
>> FIXME:
>> I think this needs a better solution to account for migrations between
>> VRAM visible and non-visible portions.
> 
> FWIW, I have some WIP patches (not posted on any MLs yet though) that
> attempt to solve this issue (+actually enforcing ratelimits) by moving
> the ratelimit accounting/enforcement to TTM entirely.
> 
> By moving the accounting to TTM we can count moved bytes when we move
> them, and don't have to rely on comparing resources to determine whether
> moving actually happened. This should address your FIXME as well.

Yep, I've seen them. They are not necessarily conflicting with this 
series, potentialy TTM placement flag aside. *If* something like this 
can be kept small and still manage to fix up a few simple things which 
do not appear to work at all at the moment.

For the larger re-work it is quite, well, large and it is not easy to be 
certain the end result would work as expected. IMO it would be best to 
sketch out a larger series which brings some practical and masurable 
change in behaviour before commiting to merge things piecemeal.

For instance I have a niggling feeling the runtime games driver plays 
with placements and domains are not great and wonder if things could be 
cleaner if simplified by letting TTM manage things more, more 
explicitly, and having the list of placements more static. Thinking 
about it seems a step too far for now though.

Regards,

Tvrtko

> 
> Regards,
> Friedrich
> 
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index ec888fc6ead8..22708954ae68 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, 
>> struct amdgpu_bo *bo)
>>           .no_wait_gpu = false,
>>           .resv = bo->tbo.base.resv
>>       };
>> +    struct ttm_resource *old_res;
>>       uint32_t domain;
>>       int r;
>>
>>       if (bo->tbo.pin_count)
>>           return 0;
>>
>> +    old_res = bo->tbo.resource;
>> +
>>       /* Don't move this buffer if we have depleted our allowance
>>        * to move it. Don't move anything if the threshold is zero.
>>        */
>> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, 
>> struct amdgpu_bo *bo)
>>       amdgpu_bo_placement_from_domain(bo, domain);
>>       r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>>
>> -    p->bytes_moved += ctx.bytes_moved;
>> -    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> -        amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> -        p->bytes_moved_vis += ctx.bytes_moved;
>> -
>>       if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>           domain = bo->allowed_domains;
>>           goto retry;
>>       }
>>
>> +    if (!r) {
>> +        struct ttm_resource *new_res = bo->tbo.resource;
>> +        bool moved = true;
>> +
>> +        if (old_res == new_res)
>> +            moved = false;
>> +        else if (old_res && new_res &&
>> +             old_res->mem_type == new_res->mem_type)
>> +            moved = false;
>> +
>> +        if (moved) {
>> +            p->bytes_moved += ctx.bytes_moved;
>> +            if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> +                amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> +                p->bytes_moved_vis += ctx.bytes_moved;
>> +        }
>> +    }
>> +
>>       return r;
>>   }
>>
Friedrich Vock May 13, 2024, 2:36 p.m. UTC | #3
On 09.05.24 11:19, Tvrtko Ursulin wrote:
>
> On 08/05/2024 20:08, Friedrich Vock wrote:
>> On 08.05.24 20:09, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>
>>> The logic assumed any migration attempt worked and therefore would
>>> over-
>>> account the amount of data migrated during buffer re-validation. As a
>>> consequence client can be unfairly penalised by incorrectly considering
>>> its migration budget spent.
>>
>> If the migration failed but data was still moved (which I think could be
>> the case when we try evicting everything but it still doesn't work?),
>> shouldn't the eviction movements count towards the ratelimit too?
>
> Possibly, which path would that be?
>
Thinking about it more, the only case where allocation still won't
succeed after evicting everything from a place is the edge case when the
buffer is larger than the place's size.

The most likely condition for this to happen (without the submission
failing entirely because the buffer just doesn't fit anywhere) would be
if the app tries creating a 256MB+ visible-VRAM buffer if resizeable BAR
is disabled.
This case could potentially trigger an allocation failure when trying
with preferred_domains, but retrying with allowed_domains, which
includes GTT, could subsequently work.

> I mean there are definitely more migration which *should not* be
> counted which I think your mini-series approaches more accurately.
> What this patch achieves, in its current RFC form, is reduces the
> "false-positive" migration budget depletions.
>
> So larger improvements aside, point of the series was to illustrate
> that even the things which were said to be working do not seem to. See
> cover letter to see what I thought does not work either well or at all.
Fair point. If this patchset does "wrong"/inaccurate accounting in a
different way that improves the experience, then it's still an improvement.
>>> Fix it by looking at the before and after buffer object backing
>>> store and
>>> only account if there was a change.
>>>
>>> FIXME:
>>> I think this needs a better solution to account for migrations between
>>> VRAM visible and non-visible portions.
>>
>> FWIW, I have some WIP patches (not posted on any MLs yet though) that
>> attempt to solve this issue (+actually enforcing ratelimits) by moving
>> the ratelimit accounting/enforcement to TTM entirely.
>>
>> By moving the accounting to TTM we can count moved bytes when we move
>> them, and don't have to rely on comparing resources to determine whether
>> moving actually happened. This should address your FIXME as well.
>
> Yep, I've seen them. They are not necessarily conflicting with this
> series, potentialy TTM placement flag aside. *If* something like this
> can be kept small and still manage to fix up a few simple things which
> do not appear to work at all at the moment.
>
> For the larger re-work it is quite, well, large and it is not easy to
> be certain the end result would work as expected. IMO it would be best
> to sketch out a larger series which brings some practical and
> masurable change in behaviour before commiting to merge things piecemeal.
>
Yeah, fully agree. Getting something working and iterating on that based
on the results you get seems like the best way forward, that's what I'll
be focusing on for now.

Thanks,
Friedrich

> For instance I have a niggling feeling the runtime games driver plays
> with placements and domains are not great and wonder if things could
> be cleaner if simplified by letting TTM manage things more, more
> explicitly, and having the list of placements more static. Thinking
> about it seems a step too far for now though.
>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Friedrich
>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26
>>> +++++++++++++++++++++-----
>>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index ec888fc6ead8..22708954ae68 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param,
>>> struct amdgpu_bo *bo)
>>>           .no_wait_gpu = false,
>>>           .resv = bo->tbo.base.resv
>>>       };
>>> +    struct ttm_resource *old_res;
>>>       uint32_t domain;
>>>       int r;
>>>
>>>       if (bo->tbo.pin_count)
>>>           return 0;
>>>
>>> +    old_res = bo->tbo.resource;
>>> +
>>>       /* Don't move this buffer if we have depleted our allowance
>>>        * to move it. Don't move anything if the threshold is zero.
>>>        */
>>> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param,
>>> struct amdgpu_bo *bo)
>>>       amdgpu_bo_placement_from_domain(bo, domain);
>>>       r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>>>
>>> -    p->bytes_moved += ctx.bytes_moved;
>>> -    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>>> -        amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>>> -        p->bytes_moved_vis += ctx.bytes_moved;
>>> -
>>>       if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>>           domain = bo->allowed_domains;
>>>           goto retry;
>>>       }
>>>
>>> +    if (!r) {
>>> +        struct ttm_resource *new_res = bo->tbo.resource;
>>> +        bool moved = true;
>>> +
>>> +        if (old_res == new_res)
>>> +            moved = false;
>>> +        else if (old_res && new_res &&
>>> +             old_res->mem_type == new_res->mem_type)
>>> +            moved = false;
>>> +
>>> +        if (moved) {
>>> +            p->bytes_moved += ctx.bytes_moved;
>>> +            if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>>> +                amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>>> +                p->bytes_moved_vis += ctx.bytes_moved;
>>> +        }
>>> +    }
>>> +
>>>       return r;
>>>   }
>>>
Christian König May 15, 2024, 7:14 a.m. UTC | #4
Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> The logic assumed any migration attempt worked and therefore would over-
> account the amount of data migrated during buffer re-validation. As a
> consequence client can be unfairly penalised by incorrectly considering
> its migration budget spent.
>
> Fix it by looking at the before and after buffer object backing store and
> only account if there was a change.
>
> FIXME:
> I think this needs a better solution to account for migrations between
> VRAM visible and non-visible portions.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
>   1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index ec888fc6ead8..22708954ae68 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   		.no_wait_gpu = false,
>   		.resv = bo->tbo.base.resv
>   	};
> +	struct ttm_resource *old_res;
>   	uint32_t domain;
>   	int r;
>   
>   	if (bo->tbo.pin_count)
>   		return 0;
>   
> +	old_res = bo->tbo.resource;
> +
>   	/* Don't move this buffer if we have depleted our allowance
>   	 * to move it. Don't move anything if the threshold is zero.
>   	 */
> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   	amdgpu_bo_placement_from_domain(bo, domain);
>   	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>   
> -	p->bytes_moved += ctx.bytes_moved;
> -	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> -		p->bytes_moved_vis += ctx.bytes_moved;
> -
>   	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>   		domain = bo->allowed_domains;
>   		goto retry;
>   	}
>   
> +	if (!r) {
> +		struct ttm_resource *new_res = bo->tbo.resource;
> +		bool moved = true;
> +
> +		if (old_res == new_res)
> +			moved = false;
> +		else if (old_res && new_res &&
> +			 old_res->mem_type == new_res->mem_type)
> +			moved = false;

The old resource might already be destroyed after you return from 
validation. So this here won't work.

Apart from that even when a migration attempt fails the moved bytes 
should be accounted.

When the validation attempt doesn't caused any moves then the bytecount 
here would be zero.

So as far as I can see that is as fair as you can get.

Regards,
Christian.

PS: Looks like our mail servers are once more not very reliable.

If you get mails from me multiple times please just ignore it.

> +
> +		if (moved) {
> +			p->bytes_moved += ctx.bytes_moved;
> +			if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> +			    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> +				p->bytes_moved_vis += ctx.bytes_moved;
> +		}
> +	}
> +
>   	return r;
>   }
>
Tvrtko Ursulin May 15, 2024, 10:51 a.m. UTC | #5
On 15/05/2024 08:14, Christian König wrote:
> Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> The logic assumed any migration attempt worked and therefore would over-
>> account the amount of data migrated during buffer re-validation. As a
>> consequence client can be unfairly penalised by incorrectly considering
>> its migration budget spent.
>>
>> Fix it by looking at the before and after buffer object backing store and
>> only account if there was a change.
>>
>> FIXME:
>> I think this needs a better solution to account for migrations between
>> VRAM visible and non-visible portions.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index ec888fc6ead8..22708954ae68 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, 
>> struct amdgpu_bo *bo)
>>           .no_wait_gpu = false,
>>           .resv = bo->tbo.base.resv
>>       };
>> +    struct ttm_resource *old_res;
>>       uint32_t domain;
>>       int r;
>>       if (bo->tbo.pin_count)
>>           return 0;
>> +    old_res = bo->tbo.resource;
>> +
>>       /* Don't move this buffer if we have depleted our allowance
>>        * to move it. Don't move anything if the threshold is zero.
>>        */
>> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, 
>> struct amdgpu_bo *bo)
>>       amdgpu_bo_placement_from_domain(bo, domain);
>>       r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>> -    p->bytes_moved += ctx.bytes_moved;
>> -    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> -        amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> -        p->bytes_moved_vis += ctx.bytes_moved;
>> -
>>       if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>           domain = bo->allowed_domains;
>>           goto retry;
>>       }
>> +    if (!r) {
>> +        struct ttm_resource *new_res = bo->tbo.resource;
>> +        bool moved = true;
>> +
>> +        if (old_res == new_res)
>> +            moved = false;
>> +        else if (old_res && new_res &&
>> +             old_res->mem_type == new_res->mem_type)
>> +            moved = false;
> 
> The old resource might already be destroyed after you return from 
> validation. So this here won't work.
> 
> Apart from that even when a migration attempt fails the moved bytes 
> should be accounted.
> 
> When the validation attempt doesn't caused any moves then the bytecount 
> here would be zero.
> 
> So as far as I can see that is as fair as you can get.

Right, I think I suffered a bit of tunnel vision here and completely 
ignore the _ctx_.moved_bytes part. Scratch this one too then.

Regards,

Tvrtko

> 
> Regards,
> Christian.
> 
> PS: Looks like our mail servers are once more not very reliable.
> 
> If you get mails from me multiple times please just ignore it.
> 
>> +
>> +        if (moved) {
>> +            p->bytes_moved += ctx.bytes_moved;
>> +            if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> +                amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> +                p->bytes_moved_vis += ctx.bytes_moved;
>> +        }
>> +    }
>> +
>>       return r;
>>   }
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..22708954ae68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,12 +784,15 @@  static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
+	struct ttm_resource *old_res;
 	uint32_t domain;
 	int r;
 
 	if (bo->tbo.pin_count)
 		return 0;
 
+	old_res = bo->tbo.resource;
+
 	/* Don't move this buffer if we have depleted our allowance
 	 * to move it. Don't move anything if the threshold is zero.
 	 */
@@ -817,16 +820,29 @@  static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 	amdgpu_bo_placement_from_domain(bo, domain);
 	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 
-	p->bytes_moved += ctx.bytes_moved;
-	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
-		p->bytes_moved_vis += ctx.bytes_moved;
-
 	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
 		domain = bo->allowed_domains;
 		goto retry;
 	}
 
+	if (!r) {
+		struct ttm_resource *new_res = bo->tbo.resource;
+		bool moved = true;
+
+		if (old_res == new_res)
+			moved = false;
+		else if (old_res && new_res &&
+			 old_res->mem_type == new_res->mem_type)
+			moved = false;
+
+		if (moved) {
+			p->bytes_moved += ctx.bytes_moved;
+			if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+			    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
+				p->bytes_moved_vis += ctx.bytes_moved;
+		}
+	}
+
 	return r;
 }