diff mbox series

[5/6] drm/amdgpu: always enable move threshold for BOs

Message ID 20240604160503.43359-6-christian.koenig@amd.com (mailing list archive)
State New, archived
Headers show
Series [1/6] drm/amdgpu: cleanup MES command submission | expand

Commit Message

Christian König June 4, 2024, 4:05 p.m. UTC
This should prevent buffer moves when the threshold is reached during
CS.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 36 ++++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 +++++++++----
 2 files changed, 29 insertions(+), 29 deletions(-)

Comments

Tvrtko Ursulin June 11, 2024, 4:24 p.m. UTC | #1
Hi Christian,

On 04/06/2024 17:05, Christian König wrote:
> This should prevent buffer moves when the threshold is reached during
> CS.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 36 ++++++++--------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 +++++++++----
>   2 files changed, 29 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index ec888fc6ead8..9a217932a4fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -784,7 +784,6 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   		.no_wait_gpu = false,
>   		.resv = bo->tbo.base.resv
>   	};
> -	uint32_t domain;
>   	int r;
>   
>   	if (bo->tbo.pin_count)
> @@ -796,37 +795,28 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   	if (p->bytes_moved < p->bytes_moved_threshold &&
>   	    (!bo->tbo.base.dma_buf ||
>   	    list_empty(&bo->tbo.base.dma_buf->attachments))) {
> +
> +		/* And don't move a CPU_ACCESS_REQUIRED BO to limited
> +		 * visible VRAM if we've depleted our allowance to do
> +		 * that.
> +		 */
>   		if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
> -			/* And don't move a CPU_ACCESS_REQUIRED BO to limited
> -			 * visible VRAM if we've depleted our allowance to do
> -			 * that.
> -			 */
> -			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
> -				domain = bo->preferred_domains;
> -			else
> -				domain = bo->allowed_domains;
> -		} else {
> -			domain = bo->preferred_domains;
> -		}
> -	} else {
> -		domain = bo->allowed_domains;
> +		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
> +		    p->bytes_moved_vis < p->bytes_moved_vis_threshold)
> +			ctx.move_threshold = p->bytes_moved_vis_threshold -
> +				p->bytes_moved_vis;
> +		else
> +			ctx.move_threshold = p->bytes_moved_vis_threshold -
> +				p->bytes_moved;
>   	}
>   
> -retry:
> -	amdgpu_bo_placement_from_domain(bo, domain);
> +	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
>   	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>   
>   	p->bytes_moved += ctx.bytes_moved;
>   	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>   	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>   		p->bytes_moved_vis += ctx.bytes_moved;
> -
> -	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
> -		domain = bo->allowed_domains;
> -		goto retry;
> -	}
> -
>   	return r;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 8c92065c2d52..cae1a5420c58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -168,13 +168,23 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
>   			abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
>   			AMDGPU_PL_PREEMPT : TTM_PL_TT;
>   		places[c].flags = 0;
> -		/*
> -		 * When GTT is just an alternative to VRAM make sure that we
> -		 * only use it as fallback and still try to fill up VRAM first.
> -		 */
> +
>   		if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
> -		    !(adev->flags & AMD_IS_APU))
> -			places[c].flags |= TTM_PL_FLAG_FALLBACK;
> +		    !(adev->flags & AMD_IS_APU)) {
> +			/*
> +			 * When GTT is just an alternative to VRAM make sure that we
> +			 * only use it as fallback and still try to fill up VRAM first.
> +			*/
> +			if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
> +				places[c].flags |= TTM_PL_FLAG_FALLBACK;
> +
> +			/*
> +			 * Enable GTT when the threshold of moved bytes is
> +			 * reached. This prevents any non essential buffer move
> +			 * when the links are already saturated.
> +			 */
> +			places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
> +		}

For the APU case I *think* this works, but for discrete I am not sure yet.

As a side note and disclaimer, the TTM "resource compatible" logic has a 
half-life of about one week in my brain until I need to almost re-figure 
it all out. I don't know if it just me, but I find it really 
non-intuitive and almost like double, triple, or even quadruple negation 
way of thinking about things.

It is not helping that with this proposal you set threshold on just one 
of the possible object placements which further increases the asymmetry. 
For me intuitive thing would be that thresholds apply to the act of 
changing the current placement directly. Not indirectly via playing with 
one of the placement flags dynamically.

Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will 
be considered compatible while under the migration budget?

(Side note, the fact both flags are set I also find very difficult to 
mentally model.)

Say a buffer was evicted to GTT already. What then brings it back to VRAM?

The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine 
(applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't 
that the opposite of what would be required and causes nothing to be 
migrated back in? What am I missing?

Regards,

Tvrtko
Christian König June 14, 2024, 9:53 a.m. UTC | #2
>>           if (domain & abo->preferred_domains & 
>> AMDGPU_GEM_DOMAIN_VRAM &&
>> -            !(adev->flags & AMD_IS_APU))
>> -            places[c].flags |= TTM_PL_FLAG_FALLBACK;
>> +            !(adev->flags & AMD_IS_APU)) {
>> +            /*
>> +             * When GTT is just an alternative to VRAM make sure 
>> that we
>> +             * only use it as fallback and still try to fill up VRAM 
>> first.
>> +            */
>> +            if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
>> +                places[c].flags |= TTM_PL_FLAG_FALLBACK;
>> +
>> +            /*
>> +             * Enable GTT when the threshold of moved bytes is
>> +             * reached. This prevents any non essential buffer move
>> +             * when the links are already saturated.
>> +             */
>> +            places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
>> +        }
>
> For the APU case I *think* this works, but for discrete I am not sure 
> yet.

Agree, APUs are basically already fine as they are. VRAM is just used so 
that it isn't wasted there.

>
> As a side note and disclaimer, the TTM "resource compatible" logic has 
> a half-life of about one week in my brain until I need to almost 
> re-figure it all out. I don't know if it just me, but I find it really 
> non-intuitive and almost like double, triple, or even quadruple 
> negation way of thinking about things.

Yeah I was also going back and forth between the different approaches 
multiple times and just ended up in this implementation because it 
seemed to do what I wanted to have.

It's certainly not very intuitive what's going on here.

>
> It is not helping that with this proposal you set threshold on just 
> one of the possible object placements which further increases the 
> asymmetry. For me intuitive thing would be that thresholds apply to 
> the act of changing the current placement directly. Not indirectly via 
> playing with one of the placement flags dynamically.

Interesting idea, how would the handling then be? Currently we have only 
the stages - 'don't evict' and 'evict'. Should we make it something more 
like 'don't move', 'move', 'evict' ?

>
>
> Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
> TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will 
> be considered compatible while under the migration budget?
>
> (Side note, the fact both flags are set I also find very difficult to 
> mentally model.)
>
> Say a buffer was evicted to GTT already. What then brings it back to 
> VRAM?
>
> The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine 
> (applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't 
> that the opposite of what would be required and causes nothing to be 
> migrated back in? What am I missing?

The flag says that GTT is fine when ctx->bytes_moved >= 
ctx->move_threshold. The logic is exactly inverted to what you described.

This way a BO will be moved back into VRAM as long as bytes moved 
doesn't exceed the threshold.

Setting both flags has the effect of saying: It's ok that the BO stays 
in GTT when you either above the move threshold or would have to evict 
something.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
Tvrtko Ursulin June 14, 2024, 4:06 p.m. UTC | #3
On 14/06/2024 10:53, Christian König wrote:
> 
>>>           if (domain & abo->preferred_domains & 
>>> AMDGPU_GEM_DOMAIN_VRAM &&
>>> -            !(adev->flags & AMD_IS_APU))
>>> -            places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>> +            !(adev->flags & AMD_IS_APU)) {
>>> +            /*
>>> +             * When GTT is just an alternative to VRAM make sure 
>>> that we
>>> +             * only use it as fallback and still try to fill up VRAM 
>>> first.
>>> +            */
>>> +            if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
>>> +                places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>> +
>>> +            /*
>>> +             * Enable GTT when the threshold of moved bytes is
>>> +             * reached. This prevents any non essential buffer move
>>> +             * when the links are already saturated.
>>> +             */
>>> +            places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
>>> +        }
>>
>> For the APU case I *think* this works, but for discrete I am not sure 
>> yet.
> 
> Agree, APUs are basically already fine as they are. VRAM is just used so 
> that it isn't wasted there.

Well yeah it works, but because re-validation is broken so it cannot hit 
the broken migration budget. ;)

>> As a side note and disclaimer, the TTM "resource compatible" logic has 
>> a half-life of about one week in my brain until I need to almost 
>> re-figure it all out. I don't know if it just me, but I find it really 
>> non-intuitive and almost like double, triple, or even quadruple 
>> negation way of thinking about things.
> 
> Yeah I was also going back and forth between the different approaches 
> multiple times and just ended up in this implementation because it 
> seemed to do what I wanted to have.
> 
> It's certainly not very intuitive what's going on here.
> 
>>
>> It is not helping that with this proposal you set threshold on just 
>> one of the possible object placements which further increases the 
>> asymmetry. For me intuitive thing would be that thresholds apply to 
>> the act of changing the current placement directly. Not indirectly via 
>> playing with one of the placement flags dynamically.
> 
> Interesting idea, how would the handling then be? Currently we have only 
> the stages - 'don't evict' and 'evict'. Should we make it something more 
> like 'don't move', 'move', 'evict' ?

Intuitively I would think "don't move" aligns with the "out of migration 
budget" concept.

Since in this patch you add move_threshold to ttm_operation_context 
could it simply be used as the overall criteria if it is set?

In a way like:

  1. If the current placement is from the list of userspace supplied 
valid ones, and
  2. Migration limit has been set, and
  3. It is spent.

-> Then just don't migrate, return "all is good" from ttm_bo_validate.

Though I am not sure at the moment how that would interact with the 
amdgpu_evict_flags and placements userspace did not specify.

>> Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
>> TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will 
>> be considered compatible while under the migration budget?
>>
>> (Side note, the fact both flags are set I also find very difficult to 
>> mentally model.)
>>
>> Say a buffer was evicted to GTT already. What then brings it back to 
>> VRAM?
>>
>> The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine 
>> (applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't 
>> that the opposite of what would be required and causes nothing to be 
>> migrated back in? What am I missing?
> 
> The flag says that GTT is fine when ctx->bytes_moved >= 
> ctx->move_threshold. The logic is exactly inverted to what you described.
> 
> This way a BO will be moved back into VRAM as long as bytes moved 
> doesn't exceed the threshold.

I'm afraid I need to sketch it out... If buffer is currently in GTT and 
placements are VRAM+GTT.

ttm_bo_validate(evicting=false)

1st iteration:
res=GTT != place=VRAM
    continue

2nd iteration:
res=GTT == place=GTT+FALLBACK+THRESHOLD

ttm_place_applicable(GTT)
   moved < threshold
     return true

Buffer stays in GTT while under migration budget -> wrong, no? Or am I 
still confused?

Regards,

Tvrtko

> Setting both flags has the effect of saying: It's ok that the BO stays 
> in GTT when you either above the move threshold or would have to evict 
> something.
> 
> Regards,
> Christian.
> 
>>
>> Regards,
>>
>> Tvrtko
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..9a217932a4fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,7 +784,6 @@  static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
-	uint32_t domain;
 	int r;
 
 	if (bo->tbo.pin_count)
@@ -796,37 +795,28 @@  static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 	if (p->bytes_moved < p->bytes_moved_threshold &&
 	    (!bo->tbo.base.dma_buf ||
 	    list_empty(&bo->tbo.base.dma_buf->attachments))) {
+
+		/* And don't move a CPU_ACCESS_REQUIRED BO to limited
+		 * visible VRAM if we've depleted our allowance to do
+		 * that.
+		 */
 		if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
-			/* And don't move a CPU_ACCESS_REQUIRED BO to limited
-			 * visible VRAM if we've depleted our allowance to do
-			 * that.
-			 */
-			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
-				domain = bo->preferred_domains;
-			else
-				domain = bo->allowed_domains;
-		} else {
-			domain = bo->preferred_domains;
-		}
-	} else {
-		domain = bo->allowed_domains;
+		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
+		    p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+			ctx.move_threshold = p->bytes_moved_vis_threshold -
+				p->bytes_moved_vis;
+		else
+			ctx.move_threshold = p->bytes_moved_vis_threshold -
+				p->bytes_moved;
 	}
 
-retry:
-	amdgpu_bo_placement_from_domain(bo, domain);
+	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
 	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 
 	p->bytes_moved += ctx.bytes_moved;
 	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
 	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
 		p->bytes_moved_vis += ctx.bytes_moved;
-
-	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
-		domain = bo->allowed_domains;
-		goto retry;
-	}
-
 	return r;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8c92065c2d52..cae1a5420c58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -168,13 +168,23 @@  void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 			abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
 			AMDGPU_PL_PREEMPT : TTM_PL_TT;
 		places[c].flags = 0;
-		/*
-		 * When GTT is just an alternative to VRAM make sure that we
-		 * only use it as fallback and still try to fill up VRAM first.
-		 */
+
 		if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
-		    !(adev->flags & AMD_IS_APU))
-			places[c].flags |= TTM_PL_FLAG_FALLBACK;
+		    !(adev->flags & AMD_IS_APU)) {
+			/*
+			 * When GTT is just an alternative to VRAM make sure that we
+			 * only use it as fallback and still try to fill up VRAM first.
+			*/
+			if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
+				places[c].flags |= TTM_PL_FLAG_FALLBACK;
+
+			/*
+			 * Enable GTT when the threshold of moved bytes is
+			 * reached. This prevents any non essential buffer move
+			 * when the links are already saturated.
+			 */
+			places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
+		}
 		c++;
 	}