diff mbox series

[v2] drm/ttm: update bulk move object of ghost BO

Message ID 20220906084619.2545456-1-zhenguo.yin@amd.com (mailing list archive)
State New, archived

Commit Message

ZhenGuo Yin Sept. 6, 2022, 8:46 a.m. UTC
[Why]
A ghost BO is released while its bulk move object is still set, which
triggers this warning:
WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
Call Trace:
  amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
  amdttm_bo_put+0x28/0x30 [amdttm]
  amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
  amdgpu_bo_move+0x1a8/0x770 [amdgpu]
  ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
  amdttm_bo_validate+0xbf/0x100 [amdttm]

[How]
The resource of a ghost BO should be moved to the LRU directly instead of
through bulk move. The bulk move object of the ghost BO should be set to
NULL before ttm_bo_move_to_lru_tail_unlocked() is called.

v2: set bulk move to NULL manually if no resource associated with ghost BO

Fixes: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Christian König Sept. 6, 2022, 8:53 a.m. UTC | #1
On 06.09.22 at 10:46, ZhenGuo Yin wrote:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> Call Trace:
>    amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>    amdttm_bo_put+0x28/0x30 [amdttm]
>    amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>    amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>    ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>    amdttm_bo_validate+0xbf/0x100 [amdttm]
>
> [How]
> The resource of ghost BO should be moved to LRU directly, instead of
> using bulk move. The bulk move object of ghost BO should set to NULL
> before function ttm_bo_move_to_lru_tail_unlocked.
>
> v2: set bulk move to NULL manually if no resource associated with ghost BO
>
> Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Going to push that to drm-misc-fixes in a minute.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 1cbfb00c1d65..57a27847206f 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>   	if (fbo->base.resource) {
>   		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>   		bo->resource = NULL;
> +		ttm_bo_set_bulk_move(&fbo->base, NULL);
> +	} else {
> +		fbo->base.bulk_move = NULL;
>   	}
>   
>   	dma_resv_init(&fbo->base.base._resv);
Matthew Auld Sept. 7, 2022, 9:21 a.m. UTC | #2
On Tue, 6 Sept 2022 at 09:54, Christian König <christian.koenig@amd.com> wrote:
>
> On 06.09.22 at 10:46, ZhenGuo Yin wrote:
> > [Why]
> > Ghost BO is released with non-empty bulk move object. There is a
> > warning trace:
> > WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> > Call Trace:
> >    amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
> >    amdttm_bo_put+0x28/0x30 [amdttm]
> >    amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
> >    amdgpu_bo_move+0x1a8/0x770 [amdgpu]
> >    ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
> >    amdttm_bo_validate+0xbf/0x100 [amdttm]
> >
> > [How]
> > The resource of ghost BO should be moved to LRU directly, instead of
> > using bulk move. The bulk move object of ghost BO should set to NULL
> > before function ttm_bo_move_to_lru_tail_unlocked.
> >
> > v2: set bulk move to NULL manually if no resource associated with ghost BO
> >
> > Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
> > Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to push that to drm-misc-fixes in a minute.
>
> Thanks,
> Christian.
>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > index 1cbfb00c1d65..57a27847206f 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > @@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
> >       if (fbo->base.resource) {
> >               ttm_resource_set_bo(fbo->base.resource, &fbo->base);
> >               bo->resource = NULL;
> > +             ttm_bo_set_bulk_move(&fbo->base, NULL);

This appears to blow up quite badly in i915. See here for an example trace:
https://gitlab.freedesktop.org/drm/intel/-/issues/6744

Do you know if amdgpu is also hitting this, or is this somehow i915 specific?

> > +     } else {
> > +             fbo->base.bulk_move = NULL;
> >       }
> >
> >       dma_resv_init(&fbo->base.base._resv);
>
Christian König Sept. 7, 2022, 9:33 a.m. UTC | #3
On 07.09.22 at 11:21, Matthew Auld wrote:
> On Tue, 6 Sept 2022 at 09:54, Christian König <christian.koenig@amd.com> wrote:
>> On 06.09.22 at 10:46, ZhenGuo Yin wrote:
>>> [Why]
>>> Ghost BO is released with non-empty bulk move object. There is a
>>> warning trace:
>>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
>>> Call Trace:
>>>     amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>>     amdttm_bo_put+0x28/0x30 [amdttm]
>>>     amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>>     amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>>     ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>>     amdttm_bo_validate+0xbf/0x100 [amdttm]
>>>
>>> [How]
>>> The resource of ghost BO should be moved to LRU directly, instead of
>>> using bulk move. The bulk move object of ghost BO should set to NULL
>>> before function ttm_bo_move_to_lru_tail_unlocked.
>>>
>>> v2: set bulk move to NULL manually if no resource associated with ghost BO
>>>
>>> Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Going to push that to drm-misc-fixes in a minute.
>>
>> Thanks,
>> Christian.
>>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index 1cbfb00c1d65..57a27847206f 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>>>        if (fbo->base.resource) {
>>>                ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>>>                bo->resource = NULL;
>>> +             ttm_bo_set_bulk_move(&fbo->base, NULL);
> This appears to blow up quite badly in i915. See here for an example trace:
> https://gitlab.freedesktop.org/drm/intel/-/issues/6744
>
> Do you know if amdgpu is also hitting this, or is this somehow i915 specific?

At least a quick test on amdgpu worked fine, but that was without 
lockdep enabled.

I think I see the problem. The move of the resource and the removal of
the bulk_move must come after the dma_resv_trylock(); otherwise the
dma_resv object isn't locked when they happen.

Going to provide a patch.

Christian.

>
>>> +     } else {
>>> +             fbo->base.bulk_move = NULL;
>>>        }
>>>
>>>        dma_resv_init(&fbo->base.base._resv);
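
The ordering constraint described in #3 can be illustrated with a small
userspace model. This is only a sketch: every mock_* type and function and
both ghost_setup_* helpers are hypothetical stand-ins, not TTM or dma-buf
APIs. It also assumes the bulk move setter checks that the reservation lock
is held, which is what the diagnosis above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
struct mock_resv {
	bool locked;
};

struct mock_bulk_move {
	int users;
};

struct mock_bo {
	struct mock_resv resv;
	struct mock_bulk_move *bulk_move;
};

/* Models a trylock: succeeds only if the lock is not already held. */
static bool mock_resv_trylock(struct mock_resv *resv)
{
	if (resv->locked)
		return false;
	resv->locked = true;
	return true;
}

/*
 * Models a setter that requires the reservation lock (assumption).
 * Returns false where a real kernel would complain about the missing lock.
 */
static bool mock_set_bulk_move(struct mock_bo *bo, struct mock_bulk_move *bulk)
{
	if (!bo->resv.locked)
		return false;
	if (bo->bulk_move)
		bo->bulk_move->users--;
	bo->bulk_move = bulk;
	if (bulk)
		bulk->users++;
	return true;
}

/* Ordering of the posted patch: detach first, lock afterwards. */
static bool ghost_setup_detach_first(struct mock_bo *ghost)
{
	bool ok = mock_set_bulk_move(ghost, NULL);

	ghost->resv.locked = false;	/* stands in for initialising the resv */
	mock_resv_trylock(&ghost->resv);
	return ok;
}

/* Ordering suggested in the reply: lock first, then detach. */
static bool ghost_setup_lock_first(struct mock_bo *ghost)
{
	ghost->resv.locked = false;	/* stands in for initialising the resv */
	if (!mock_resv_trylock(&ghost->resv))
		return false;
	return mock_set_bulk_move(ghost, NULL);
}
```

In this model, ghost_setup_detach_first() reports a failure for a ghost that
still carries a bulk move object, while ghost_setup_lock_first() succeeds and
leaves bulk_move set to NULL.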

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1cbfb00c1d65..57a27847206f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -239,6 +239,9 @@  static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	if (fbo->base.resource) {
 		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
 		bo->resource = NULL;
+		ttm_bo_set_bulk_move(&fbo->base, NULL);
+	} else {
+		fbo->base.bulk_move = NULL;
 	}
 
 	dma_resv_init(&fbo->base.base._resv);