[1/9] drm/amdgpu: generally allow over-commit during BO allocation

Message ID: 20221125102137.1801-1-christian.koenig@amd.com
State: New, archived

Commit Message

Christian König Nov. 25, 2022, 10:21 a.m. UTC
We already fall back to a dummy BO with no backing store when we
allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.

Drop all those workarounds and generalize this for GTT as well. This
fixes ENOMEM issues with runaway applications which try to allocate/free
GTT in a loop and are otherwise only limited by the CPU speed.

The CS will wait for the cleanup of freed-up BOs to satisfy the
various domain-specific limits and so effectively throttle those
buggy applications down to a sane allocation behavior again.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 16 +++-------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-----
 2 files changed, 4 insertions(+), 18 deletions(-)
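
As a reading aid, here is a minimal user-space sketch of the behavior the
commit message describes: a BO is created without backing store because the
CPU domain is always allowed, and the domain limit is only enforced later,
when the BO is validated for a command submission. Everything prefixed with
oc_ or OC_ is hypothetical and exists only for this illustration. The bit
values are made up and the byte counters stand in for the real TTM resource
managers, so this is a toy model under those assumptions, not the driver's
implementation.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the AMDGPU_GEM_DOMAIN_* bits (values made up). */
enum {
	OC_DOMAIN_CPU  = 1u << 0,
	OC_DOMAIN_GTT  = 1u << 1,
	OC_DOMAIN_VRAM = 1u << 2,
};

/* Toy model of the per-domain limits: free bytes in each pool. */
struct oc_pool {
	uint64_t vram_free;
	uint64_t gtt_free;
};

struct oc_bo {
	uint32_t preferred; /* what userspace asked for */
	uint32_t allowed;   /* preferred plus the CPU fallback */
	uint32_t current;   /* where the BO lives right now */
	uint64_t size;
};

/* Creation never touches the pools: the BO starts without backing store. */
static void oc_bo_create(struct oc_bo *bo, uint32_t initial_domain,
			 uint64_t size)
{
	bo->preferred = initial_domain;
	bo->allowed = initial_domain | OC_DOMAIN_CPU; /* the one-line change */
	bo->current = OC_DOMAIN_CPU;
	bo->size = size;
}

/*
 * Validation at command submission time. Returns 0 when the BO sits in a
 * preferred domain, -1 when the submission would have to wait for freed BOs.
 */
static int oc_bo_validate(struct oc_bo *bo, struct oc_pool *pool)
{
	uint64_t *free_bytes;
	uint32_t target;

	if (bo->current & bo->preferred)
		return 0; /* already placed where it was requested */

	if (bo->preferred & OC_DOMAIN_VRAM) {
		target = OC_DOMAIN_VRAM;
		free_bytes = &pool->vram_free;
	} else if (bo->preferred & OC_DOMAIN_GTT) {
		target = OC_DOMAIN_GTT;
		free_bytes = &pool->gtt_free;
	} else {
		return 0; /* nothing to reserve */
	}

	if (bo->size > *free_bytes)
		return -1; /* limit reached: this is where throttling happens */

	*free_bytes -= bo->size;
	bo->current = target;
	return 0;
}
```

In this model an allocate/free loop can no longer fail in oc_bo_create();
the pressure moves to oc_bo_validate(), which is the throttling point the
commit message refers to.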

Comments

Alex Deucher Nov. 25, 2022, 6:18 p.m. UTC | #1
On Fri, Nov 25, 2022 at 5:21 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> We already fall back to a dummy BO with no backing store when we
> allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.
>
> Drop all those workarounds and generalize this for GTT as well. This
> fixes ENOMEM issues with runaway applications which try to allocate/free
> GTT in a loop and are otherwise only limited by the CPU speed.
>
> The CS will wait for the cleanup of freed-up BOs to satisfy the
> various domain-specific limits and so effectively throttle those
> buggy applications down to a sane allocation behavior again.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

This looks like a good bug fix and unrelated to the rest of this series.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

Paneer Selvam, Arunpravin Nov. 28, 2022, 6 a.m. UTC | #2
Hi Christian,

Looks good to me.
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
for the series.

Regards,
Arun.

On 11/25/2022 3:51 PM, Christian König wrote:
> We already fall back to a dummy BO with no backing store when we
> allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.
>
> Drop all those workarounds and generalize this for GTT as well. This
> fixes ENOMEM issues with runaway applications which try to allocate/free
> GTT in a loop and are otherwise only limited by the CPU speed.
>
> The CS will wait for the cleanup of freed-up BOs to satisfy the
> various domain-specific limits and so effectively throttle those
> buggy applications down to a sane allocation behavior again.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
Christian König Dec. 5, 2022, 1:41 p.m. UTC | #3
On 25.11.22 at 19:18, Alex Deucher wrote:
> On Fri, Nov 25, 2022 at 5:21 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> We already fall back to a dummy BO with no backing store when we
>> allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.
>>
>> Drop all those workarounds and generalize this for GTT as well. This
>> fixes ENOMEM issues with runaway applications which try to allocate/free
>> GTT in a loop and are otherwise only limited by the CPU speed.
>>
>> The CS will wait for the cleanup of freed-up BOs to satisfy the
>> various domain-specific limits and so effectively throttle those
>> buggy applications down to a sane allocation behavior again.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
> This looks like a good bug fix and unrelated to the rest of this series.
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

Yeah, this was just in the tree because I was trying to address a bug report.

The TTM changes mitigated the bugs, but this patch is the real
underlying fix.

I've cherry-picked it over to amd-staging-drm-next and pushed it.

Thanks,
Christian.

Felix Kuehling Dec. 10, 2022, 6:15 a.m. UTC | #4
On 2022-11-25 05:21, Christian König wrote:
> We already fall back to a dummy BO with no backing store when we
> allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.
>
> Drop all those workarounds and generalize this for GTT as well. This
> fixes ENOMEM issues with runaway applications which try to allocate/free
> GTT in a loop and are otherwise only limited by the CPU speed.
>
> The CS will wait for the cleanup of freed-up BOs to satisfy the
> various domain-specific limits and so effectively throttle those
> buggy applications down to a sane allocation behavior again.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

This patch causes some regressions in KFDTest. KFDMemoryTest.MMBench 
sees a huge VRAM allocation slow-down. And 
KFDMemoryTest.LargestVramBufferTest can only allocate half the available 
memory.

This seems to be caused by initially validating VRAM BOs in the CPU 
domain, which allocates a ttm_tt. A subsequent validation in the VRAM 
domain involves a copy from GTT to VRAM.

After that, freeing of BOs can get delayed by the ghost object of a
previous migration, which delays calling release notifiers and causes
problems for KFD's available memory accounting.

I experimented with a workaround that validates BOs immediately after 
allocation, but that only moves around the delays and doesn't solve the 
problem. During those experiments I may also have stumbled over a bug in 
ttm_buffer_object_transfer: It calls ttm_bo_set_bulk_move before 
initializing and locking fbo->base.base._resv. This results in a flood 
of warnings because ttm_bo_set_bulk_move expects the reservation to be 
locked.

Right now I'd like to remove the bp.domain = initial_domain | 
AMDGPU_GEM_DOMAIN_CPU change in amdgpu_gem_object_create to fix this.

Regards,
   Felix


Christian König Dec. 10, 2022, 2:12 p.m. UTC | #5
On 10.12.22 at 07:15, Felix Kuehling wrote:
> On 2022-11-25 05:21, Christian König wrote:
>> We already fall back to a dummy BO with no backing store when we
>> allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.
>>
>> Drop all those workarounds and generalize this for GTT as well. This
>> fixes ENOMEM issues with runaway applications which try to allocate/free
>> GTT in a loop and are otherwise only limited by the CPU speed.
>>
>> The CS will wait for the cleanup of freed-up BOs to satisfy the
>> various domain-specific limits and so effectively throttle those
>> buggy applications down to a sane allocation behavior again.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>
> This patch causes some regressions in KFDTest. KFDMemoryTest.MMBench 
> sees a huge VRAM allocation slow-down. And 
> KFDMemoryTest.LargestVramBufferTest can only allocate half the 
> available memory.

Mhm, I wasn't expecting that we use this for the KFD as well.

>
> This seems to be caused by initially validating VRAM BOs in the CPU 
> domain, which allocates a ttm_tt. A subsequent validation in the VRAM 
> domain involves a copy from GTT to VRAM.

The idea was to initially create the BOs without any backing store.

>
> After that, freeing of BOs can get delayed by the ghost object of a
> previous migration, which delays calling release notifiers and causes
> problems for KFD's available memory accounting.
>
> I experimented with a workaround that validates BOs immediately after 
> allocation, but that only moves around the delays and doesn't solve 
> the problem. During those experiments I may also have stumbled over a 
> bug in ttm_buffer_object_transfer: It calls ttm_bo_set_bulk_move 
> before initializing and locking fbo->base.base._resv. This results in 
> a flood of warnings because ttm_bo_set_bulk_move expects the 
> reservation to be locked.
>
> Right now I'd like to remove the bp.domain = initial_domain | 
> AMDGPU_GEM_DOMAIN_CPU change in amdgpu_gem_object_create to fix this.

Yeah, let's revert and investigate this first.

Thanks,
Christian.

Felix Kuehling Dec. 11, 2022, 1:13 a.m. UTC | #6
On 2022-12-10 at 09:12, Christian König wrote:
> On 10.12.22 at 07:15, Felix Kuehling wrote:
>> On 2022-11-25 05:21, Christian König wrote:
>>> We already fall back to a dummy BO with no backing store when we
>>> allocate GDS, GWS and OA resources, and to GTT when we allocate VRAM.
>>>
>>> Drop all those workarounds and generalize this for GTT as well. This
>>> fixes ENOMEM issues with runaway applications which try to allocate/free
>>> GTT in a loop and are otherwise only limited by the CPU speed.
>>>
>>> The CS will wait for the cleanup of freed-up BOs to satisfy the
>>> various domain-specific limits and so effectively throttle those
>>> buggy applications down to a sane allocation behavior again.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>
>> This patch causes some regressions in KFDTest. KFDMemoryTest.MMBench 
>> sees a huge VRAM allocation slow-down. And 
>> KFDMemoryTest.LargestVramBufferTest can only allocate half the 
>> available memory.
>
> Mhm, I wasn't expecting that we use this for the KFD as well.

Yeah, we use amdgpu_gem_object_create. I guess we could duplicate its 
functionality or add a "no_overcommit" or "greedy" parameter for our needs.


>
>>
>> This seems to be caused by initially validating VRAM BOs in the CPU 
>> domain, which allocates a ttm_tt. A subsequent validation in the VRAM 
>> domain involves a copy from GTT to VRAM.
>
> The idea was to initially create the BOs without any backing store.

I thought about it a bit more. I believe the BO creation without backing 
store is working as expected. But amdgpu_bo_move can't move the 
uninitialized BO directly from system to VRAM. It returns -EMULTIHOP. So 
the BO gets moved to GTT first (allocating system memory) before it can 
be migrated to VRAM. That adds a bunch of overhead with unnecessary 
system memory allocation and forces all VRAM to be zero-initialized on 
the CPU and copied through PCIe. I think your idea would work with 
almost no overhead if amdgpu_bo_move could directly move a BO without 
backing store to VRAM with ttm_bo_move_null.

Regards,
   Felix
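
The move paths discussed in this message can be written down as a small
decision function. This is only a toy model of the description above, not
the actual amdgpu_bo_move() code: all mh_/MH_ names are hypothetical, and the
"suggested" variant merely encodes the idea that a BO without backing store
has nothing to copy, so its placement could be switched directly.

```c
#include <stdbool.h>

/* Hypothetical placements and move kinds, for illustration only. */
enum mh_mem { MH_MEM_SYSTEM, MH_MEM_GTT, MH_MEM_VRAM };

enum mh_move {
	MH_MOVE_NULL,     /* just switch the placement, nothing to copy */
	MH_MOVE_MULTIHOP, /* bounce through GTT first (the -EMULTIHOP case) */
	MH_MOVE_COPY,     /* ordinary copy between two placements */
};

struct mh_bo {
	enum mh_mem where;
	bool has_backing_store;
};

/* Behavior as described in the thread: system to VRAM always goes via GTT. */
static enum mh_move mh_move_as_described(const struct mh_bo *bo,
					 enum mh_mem dst)
{
	if (bo->where == MH_MEM_SYSTEM && dst == MH_MEM_VRAM)
		return MH_MOVE_MULTIHOP;
	return MH_MOVE_COPY;
}

/* Suggested variant: a BO that owns no pages yet needs no copy at all. */
static enum mh_move mh_move_suggested(const struct mh_bo *bo,
				      enum mh_mem dst)
{
	if (bo->where == MH_MEM_SYSTEM && !bo->has_backing_store)
		return MH_MOVE_NULL;
	return mh_move_as_described(bo, dst);
}
```

In the described behavior a freshly created VRAM BO pays for a system memory
allocation and a copy over PCIe; in the suggested variant the same BO takes
the null-move path and that overhead disappears.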


Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a0780a4e3e61..62e98f1ad770 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -113,7 +113,7 @@  int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 	bp.resv = resv;
 	bp.preferred_domain = initial_domain;
 	bp.flags = flags;
-	bp.domain = initial_domain;
+	bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
 	bp.bo_ptr_size = sizeof(struct amdgpu_bo);
 
 	r = amdgpu_bo_create_user(adev, &bp, &ubo);
@@ -332,20 +332,10 @@  int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 	}
 
 	initial_domain = (u32)(0xffffffff & args->in.domains);
-retry:
 	r = amdgpu_gem_object_create(adev, size, args->in.alignment,
-				     initial_domain,
-				     flags, ttm_bo_type_device, resv, &gobj);
+				     initial_domain, flags, ttm_bo_type_device,
+				     resv, &gobj);
 	if (r && r != -ERESTARTSYS) {
-		if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
-			flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
-			goto retry;
-		}
-
-		if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-			initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-			goto retry;
-		}
 		DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
 				size, initial_domain, args->in.alignment, r);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 974e85d8b6cc..919bbea2e3ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -581,11 +581,7 @@  int amdgpu_bo_create(struct amdgpu_device *adev,
 		bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
 
 	bo->tbo.bdev = &adev->mman.bdev;
-	if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
-			  AMDGPU_GEM_DOMAIN_GDS))
-		amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
-	else
-		amdgpu_bo_placement_from_domain(bo, bp->domain);
+	amdgpu_bo_placement_from_domain(bo, bp->domain);
 	if (bp->type == ttm_bo_type_kernel)
 		bo->tbo.priority = 1;
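
For reference, the retry chain that the amdgpu_gem_create_ioctl() hunk above
removes can be modeled outside the kernel roughly as follows. This is a
sketch under assumptions: the rt_/RT_ names, the bit values and the demo
allocator are hypothetical, and only the control flow (drop the CPU access
flag first, then widen a VRAM-only request to VRAM plus GTT) is taken from
the deleted lines.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values standing in for the real flag and domain bits. */
#define RT_FLAG_CPU_ACCESS_REQUIRED (1u << 0)
#define RT_DOMAIN_GTT               (1u << 1)
#define RT_DOMAIN_VRAM              (1u << 2)
#define RT_ERESTARTSYS              512 /* kernel-internal errno, not in <errno.h> */

/* Hypothetical allocator callback: 0 on success, negative errno on failure. */
typedef int (*rt_create_fn)(uint32_t domain, uint32_t flags);

/* Model of the old fallback: relax the request step by step and retry. */
static int rt_create_with_fallback(rt_create_fn create, uint32_t domain,
				   uint32_t flags)
{
	for (;;) {
		int r = create(domain, flags);

		if (r == 0 || r == -RT_ERESTARTSYS)
			return r;
		if (flags & RT_FLAG_CPU_ACCESS_REQUIRED) {
			/* first fallback: give up on CPU-visible placement */
			flags &= ~RT_FLAG_CPU_ACCESS_REQUIRED;
			continue;
		}
		if (domain == RT_DOMAIN_VRAM) {
			/* second fallback: let a VRAM-only BO live in GTT too */
			domain |= RT_DOMAIN_GTT;
			continue;
		}
		return r; /* nothing left to relax */
	}
}

/* Demo allocator: succeeds only once GTT is allowed and the flag is gone. */
static int rt_demo_attempts;

static int rt_demo_create(uint32_t domain, uint32_t flags)
{
	rt_demo_attempts++;
	if ((domain & RT_DOMAIN_GTT) && !(flags & RT_FLAG_CPU_ACCESS_REQUIRED))
		return 0;
	return -ENOMEM;
}
```

With the demo allocator, a VRAM-only request carrying the CPU access flag
succeeds on the third attempt. After the patch the ioctl makes a single
attempt and relies on the CPU domain being part of bp.domain instead.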