Message ID | 20220221101239.2863-1-qiang.yu@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/amdgpu: check vm ready by evicting |
On 21.02.22 at 11:12, Qiang Yu wrote:
> Workstation application ANSA/META get this error dmesg:
> [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
>
> This is caused by:
> 1. create a 256MB buffer in invisible VRAM
> 2. CPU map the buffer and access it causes vm_fault and try to move
>    it to visible VRAM
> 3. force visible VRAM space and traverse all VRAM bos to check if
>    evicting this bo is valuable
> 4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
>    will set amdgpu_vm->evicting, but latter due to not in visible
>    VRAM, won't really evict it so not add it to amdgpu_vm->evicted
> 5. before next CS to clear the amdgpu_vm->evicting, user VM ops
>    ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
>    but fail in amdgpu_vm_bo_update_mapping() (check
>    amdgpu_vm->evicting) and get this error log
>
> This error won't affect functionality as next CS will finish the
> waiting VM ops. But we'd better clear the error log by check the
> evicting flag which really stop VM ops latter.
>
> Signed-off-by: Qiang Yu <qiang.yu@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Good work.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37acd8911168..2cd9f1a2e5fa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -770,11 +770,16 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   * Check if all VM PDs/PTs are ready for updates
>   *
>   * Returns:
> - * True if eviction list is empty.
> + * True if VM is not evicting.
>   */
>  bool amdgpu_vm_ready(struct amdgpu_vm *vm)
>  {
> -	return list_empty(&vm->evicted);
> +	bool ret;
> +
> +	amdgpu_vm_eviction_lock(vm);
> +	ret = !vm->evicting;
> +	amdgpu_vm_eviction_unlock(vm);
> +	return ret;
>  }
>
>  /**
Dear Qiang Yu,

On 21.02.22 at 11:12, Qiang Yu wrote:

Thank you for your patch. Reading the commit message summary, I have no idea what “check vm ready by evicting” means. Can you please rephrase it?

> Workstation application ANSA/META get this error dmesg:

What version, and how can this be reproduced exactly? Just by starting the application?

> [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
>
> This is caused by:
> 1. create a 256MB buffer in invisible VRAM
> 2. CPU map the buffer and access it causes vm_fault and try to move
>    it to visible VRAM
> 3. force visible VRAM space and traverse all VRAM bos to check if
>    evicting this bo is valuable
> 4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
>    will set amdgpu_vm->evicting, but latter due to not in visible
>    VRAM, won't really evict it so not add it to amdgpu_vm->evicted
> 5. before next CS to clear the amdgpu_vm->evicting, user VM ops
>    ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
>    but fail in amdgpu_vm_bo_update_mapping() (check
>    amdgpu_vm->evicting) and get this error log
>
> This error won't affect functionality as next CS will finish the
> waiting VM ops. But we'd better clear the error log by check the

s/check/checking/

> evicting flag which really stop VM ops latter.

stop*s*? Can you please elaborate? The discussion between Christian and you was quite long, so adding a summary of why this approach works and what regressions are possible might be warranted.

Kind regards,

Paul

> Signed-off-by: Qiang Yu <qiang.yu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37acd8911168..2cd9f1a2e5fa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -770,11 +770,16 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   * Check if all VM PDs/PTs are ready for updates
>   *
>   * Returns:
> - * True if eviction list is empty.
> + * True if VM is not evicting.
>   */
>  bool amdgpu_vm_ready(struct amdgpu_vm *vm)
>  {
> -	return list_empty(&vm->evicted);
> +	bool ret;
> +
> +	amdgpu_vm_eviction_lock(vm);
> +	ret = !vm->evicting;
> +	amdgpu_vm_eviction_unlock(vm);
> +	return ret;
>  }
>
>  /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 37acd8911168..2cd9f1a2e5fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -770,11 +770,16 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  * Check if all VM PDs/PTs are ready for updates
  *
  * Returns:
- * True if eviction list is empty.
+ * True if VM is not evicting.
  */
 bool amdgpu_vm_ready(struct amdgpu_vm *vm)
 {
-	return list_empty(&vm->evicted);
+	bool ret;
+
+	amdgpu_vm_eviction_lock(vm);
+	ret = !vm->evicting;
+	amdgpu_vm_eviction_unlock(vm);
+	return ret;
 }
 
 /**
Workstation application ANSA/META gets this error in dmesg:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)

This is caused by:
1. create a 256MB buffer in invisible VRAM
2. CPU map the buffer and access it, which causes a vm_fault and tries
   to move it to visible VRAM
3. force visible VRAM space and traverse all VRAM bos to check if
   evicting this bo is valuable
4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
   will set amdgpu_vm->evicting, but later, because the bo is not in
   visible VRAM, it won't really be evicted and so is not added to
   amdgpu_vm->evicted
5. before the next CS clears amdgpu_vm->evicting, a user VM ops
   ioctl will pass amdgpu_vm_ready() (which checks amdgpu_vm->evicted)
   but fail in amdgpu_vm_bo_update_mapping() (which checks
   amdgpu_vm->evicting) and get this error log

This error won't affect functionality, as the next CS will finish the
waiting VM ops. But we'd better clear the error log by checking the
evicting flag, which is what really stops VM ops later.

Signed-off-by: Qiang Yu <qiang.yu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
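
The mismatch between the two checks described in the commit message can be illustrated with a small userspace model. This is only a sketch, not the kernel code: struct vm_model and every model_* function are hypothetical stand-ins, the evicted list is reduced to a counter, locking is left out, and the assumption that the VM ops ioctl silently skips the update when the VM is not ready is a simplification made for illustration.

```c
/* Userspace model of the two readiness checks; NOT the amdgpu code. */
#include <assert.h>   /* only needed by callers that assert on the model */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_vm, reduced to the two pieces
 * of state the commit message talks about. */
struct vm_model {
	bool evicting;      /* set by the evictable check, cleared by the next CS */
	int  evicted_count; /* stands in for the vm->evicted list length */
};

/* Step 4: the evictable check always sets the flag, but the BO only lands
 * on the evicted list if it is really evicted. */
static void model_check_evictable(struct vm_model *vm, bool really_evicted)
{
	vm->evicting = true;
	if (really_evicted)
		vm->evicted_count++;
}

/* Readiness test before the patch: "eviction list is empty". */
static bool model_ready_old(const struct vm_model *vm)
{
	return vm->evicted_count == 0;
}

/* Readiness test after the patch: "VM is not evicting". */
static bool model_ready_new(const struct vm_model *vm)
{
	return !vm->evicting;
}

/* The mapping update refuses to run while the flag is set. On Linux,
 * EBUSY is 16, which matches the (-16) in the log line. */
static int model_update_mapping(const struct vm_model *vm)
{
	return vm->evicting ? -EBUSY : 0;
}

/* Step 5: returns true if the VM ops path would log the error.
 * Assumption: a not-ready VM makes the path skip the update quietly. */
static bool model_va_ioctl_logs_error(const struct vm_model *vm,
				      bool (*ready)(const struct vm_model *))
{
	if (!ready(vm))
		return false;
	return model_update_mapping(vm) != 0;
}

/* The next CS clears the flag (and, in this model, the list). */
static void model_next_cs(struct vm_model *vm)
{
	vm->evicting = false;
	vm->evicted_count = 0;
}
```

In this model, calling model_check_evictable(&vm, false) on a fresh VM leaves model_ready_old() true while model_update_mapping() returns -EBUSY, so the old check lets the ioctl run into the error. model_ready_new() returns false for the same state, so the update is skipped until model_next_cs() clears the flag.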