drm/ttm: Put BO in its memory manager's lru list

Message ID 20211109111954.41968-1-xinhui.pan@amd.com (mailing list archive)
State New, archived
Series drm/ttm: Put BO in its memory manager's lru list

Commit Message

Pan, Xinhui Nov. 9, 2021, 11:19 a.m. UTC
After we move a BO to a new memory region, we should put it on
the new memory manager's LRU list regardless of whether we unlock the resv or not.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Christian König Nov. 9, 2021, 12:20 p.m. UTC | #1
Am 09.11.21 um 12:19 schrieb xinhui pan:
> After we move a BO to a new memory region, we should put it on
> the new memory manager's LRU list regardless of whether we unlock the resv or not.
>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>

Interesting find, did you trigger that somehow or did you just stumble 
over it by reading the code?

Patch is Reviewed-by: Christian König <christian.koenig@amd.com>, I will 
pick that up for drm-misc-next.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index f1367107925b..e307004f0b28 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -701,6 +701,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>   	ret = ttm_bo_evict(bo, ctx);
>   	if (locked)
>   		ttm_bo_unreserve(bo);
> +	else
> +		ttm_bo_move_to_lru_tail_unlocked(bo);
>   
>   	ttm_bo_put(bo);
>   	return ret;
Pan, Xinhui Nov. 9, 2021, 12:28 p.m. UTC | #2
[AMD Official Use Only]

I hit a Vulkan CTS test hang on navi23.

dmesg shows GMC page faults at addresses 0x0, 0x1000, 0x2000, ...
Some debug logs also show amdgpu copying a BO from the system domain to the system domain, which is really weird.
Christian König Nov. 9, 2021, 12:35 p.m. UTC | #3
Mhm, I'm not sure what the rationale behind that is.

Not moving the BO would make things less efficient, but should never 
cause a crash.

Maybe we should add a CC: stable tag and push it to -fixes instead?

Christian.

Am 09.11.21 um 13:28 schrieb Pan, Xinhui:
> [AMD Official Use Only]
>
> I hit a Vulkan CTS test hang on navi23.
>
> dmesg shows GMC page faults at addresses 0x0, 0x1000, 0x2000, ...
> Some debug logs also show amdgpu copying a BO from the system domain to the system domain, which is really weird.
> ________________________________________
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: November 9, 2021 20:20
> To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
>
> Am 09.11.21 um 12:19 schrieb xinhui pan:
>> After we move a BO to a new memory region, we should put it on
>> the new memory manager's LRU list regardless of whether we unlock the resv or not.
>>
>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> Interesting find, did you trigger that somehow or did you just stumble
> over it by reading the code?
>
> Patch is Reviewed-by: Christian König <christian.koenig@amd.com>, I will
> pick that up for drm-misc-next.
>
> Thanks,
> Christian.
>
>> ---
>>    drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
>>    1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index f1367107925b..e307004f0b28 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -701,6 +701,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>        ret = ttm_bo_evict(bo, ctx);
>>        if (locked)
>>                ttm_bo_unreserve(bo);
>> +     else
>> +             ttm_bo_move_to_lru_tail_unlocked(bo);
>>
>>        ttm_bo_put(bo);
>>        return ret;
Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f1367107925b..e307004f0b28 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -701,6 +701,8 @@  int ttm_mem_evict_first(struct ttm_device *bdev,
 	ret = ttm_bo_evict(bo, ctx);
 	if (locked)
 		ttm_bo_unreserve(bo);
+	else
+		ttm_bo_move_to_lru_tail_unlocked(bo);
 
 	ttm_bo_put(bo);
 	return ret;
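
Illustration

The following is a minimal user-space sketch of the bookkeeping problem the patch addresses; it is not TTM code. All names (toy_bo, toy_evict, toy_move_to_lru_tail, toy_lru_consistent, and the with_fix flag) are hypothetical. It assumes, as the commit message implies, that the unreserve path already requeues the BO on the LRU of its current memory manager, so only the path that skips the unreserve can leave the BO on the old manager's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for memory domains; not the real TTM types. */
enum toy_domain { TOY_VRAM, TOY_SYSTEM };

struct toy_bo {
	enum toy_domain placement;  /* where the backing memory lives now */
	enum toy_domain lru_owner;  /* which manager's LRU list tracks the BO */
};

/* Requeue the BO on the LRU list of the manager it currently lives in. */
void toy_move_to_lru_tail(struct toy_bo *bo)
{
	bo->lru_owner = bo->placement;
}

/*
 * Model of the tail of an eviction.
 * locked:   this call took the reservation itself and must drop it.
 * with_fix: apply the extra requeue that the patch adds in the else branch.
 */
void toy_evict(struct toy_bo *bo, bool locked, bool with_fix)
{
	assert(bo != NULL);

	bo->placement = TOY_SYSTEM;        /* the move itself */

	if (locked)
		toy_move_to_lru_tail(bo);  /* assumed side effect of unreserve */
	else if (with_fix)
		toy_move_to_lru_tail(bo);  /* the two added lines */
}

/* True when the BO is tracked by the manager that actually holds it. */
bool toy_lru_consistent(const struct toy_bo *bo)
{
	return bo->placement == bo->lru_owner;
}
```

In this model, calling toy_evict() with locked == false and with_fix == false leaves a BO that lives in the system domain but is still tracked by the VRAM list, so a later eviction pass over VRAM could select it again. That would be consistent with the system-to-system copy reported in comment #2, although the thread does not confirm the exact mechanism.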