Message ID | 20220906084619.2545456-1-zhenguo.yin@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] drm/ttm: update bulk move object of ghost BO |
Am 06.09.22 um 10:46 schrieb ZhenGuo Yin:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> Call Trace:
> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
> amdttm_bo_put+0x28/0x30 [amdttm]
> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
> amdttm_bo_validate+0xbf/0x100 [amdttm]
>
> [How]
> The resource of ghost BO should be moved to LRU directly, instead of
> using bulk move. The bulk move object of ghost BO should set to NULL
> before function ttm_bo_move_to_lru_tail_unlocked.
>
> v2: set bulk move to NULL manually if no resource associated with ghost BO
>
> Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Going to push that to drm-misc-fixes in a minute.

Thanks,
Christian.

> ---
>  drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 1cbfb00c1d65..57a27847206f 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>  	if (fbo->base.resource) {
>  		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>  		bo->resource = NULL;
> +		ttm_bo_set_bulk_move(&fbo->base, NULL);
> +	} else {
> +		fbo->base.bulk_move = NULL;
>  	}
>
>  	dma_resv_init(&fbo->base.base._resv);
On Tue, 6 Sept 2022 at 09:54, Christian König <christian.koenig@amd.com> wrote:
>
> Am 06.09.22 um 10:46 schrieb ZhenGuo Yin:
> > [Why]
> > Ghost BO is released with non-empty bulk move object. There is a
> > warning trace:
> > WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
> > Call Trace:
> > amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
> > amdttm_bo_put+0x28/0x30 [amdttm]
> > amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
> > amdgpu_bo_move+0x1a8/0x770 [amdgpu]
> > ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
> > amdttm_bo_validate+0xbf/0x100 [amdttm]
> >
> > [How]
> > The resource of ghost BO should be moved to LRU directly, instead of
> > using bulk move. The bulk move object of ghost BO should set to NULL
> > before function ttm_bo_move_to_lru_tail_unlocked.
> >
> > v2: set bulk move to NULL manually if no resource associated with ghost BO
> >
> > Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
> > Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to push that to drm-misc-fixes in a minute.
>
> Thanks,
> Christian.
>
> > ---
> >  drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > index 1cbfb00c1d65..57a27847206f 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > @@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
> >  	if (fbo->base.resource) {
> >  		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
> >  		bo->resource = NULL;
> > +		ttm_bo_set_bulk_move(&fbo->base, NULL);

This appears to blow up quite badly in i915. See here for an example trace:
https://gitlab.freedesktop.org/drm/intel/-/issues/6744

Do you know if amdgpu is also hitting this, or is this somehow i915 specific?

> > +	} else {
> > +		fbo->base.bulk_move = NULL;
> >  	}
> >
> >  	dma_resv_init(&fbo->base.base._resv);
Am 07.09.22 um 11:21 schrieb Matthew Auld:
> On Tue, 6 Sept 2022 at 09:54, Christian König <christian.koenig@amd.com> wrote:
>> Am 06.09.22 um 10:46 schrieb ZhenGuo Yin:
>>> [Why]
>>> Ghost BO is released with non-empty bulk move object. There is a
>>> warning trace:
>>> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
>>> Call Trace:
>>> amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
>>> amdttm_bo_put+0x28/0x30 [amdttm]
>>> amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
>>> amdgpu_bo_move+0x1a8/0x770 [amdgpu]
>>> ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
>>> amdttm_bo_validate+0xbf/0x100 [amdttm]
>>>
>>> [How]
>>> The resource of ghost BO should be moved to LRU directly, instead of
>>> using bulk move. The bulk move object of ghost BO should set to NULL
>>> before function ttm_bo_move_to_lru_tail_unlocked.
>>>
>>> v2: set bulk move to NULL manually if no resource associated with ghost BO
>>>
>>> Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Going to push that to drm-misc-fixes in a minute.
>>
>> Thanks,
>> Christian.
>>
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index 1cbfb00c1d65..57a27847206f 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>>>  	if (fbo->base.resource) {
>>>  		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
>>>  		bo->resource = NULL;
>>> +		ttm_bo_set_bulk_move(&fbo->base, NULL);
> This appears to blow up quite badly in i915. See here for an example trace:
> https://gitlab.freedesktop.org/drm/intel/-/issues/6744
>
> Do you know if amdgpu is also hitting this, or is this somehow i915 specific?

At least a quick test on amdgpu worked fine, but that was without lockdep
enabled.

I think I see the problem. The move of the resource and removal of the
bulk_move must come after the dma_resv_trylock() or otherwise the dma_resv
object isn't locked.

Going to provide a patch.

Christian.

>
>>> +	} else {
>>> +		fbo->base.bulk_move = NULL;
>>>  	}
>>>
>>>  	dma_resv_init(&fbo->base.base._resv);
[Why]
Ghost BO is released with non-empty bulk move object. There is a
warning trace:
WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
Call Trace:
amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
amdttm_bo_put+0x28/0x30 [amdttm]
amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
amdgpu_bo_move+0x1a8/0x770 [amdgpu]
ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
amdttm_bo_validate+0xbf/0x100 [amdttm]

[How]
The resource of ghost BO should be moved to LRU directly, instead of
using bulk move. The bulk move object of ghost BO should set to NULL
before function ttm_bo_move_to_lru_tail_unlocked.

v2: set bulk move to NULL manually if no resource associated with ghost BO

Fixed: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1cbfb00c1d65..57a27847206f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	if (fbo->base.resource) {
 		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
 		bo->resource = NULL;
+		ttm_bo_set_bulk_move(&fbo->base, NULL);
+	} else {
+		fbo->base.bulk_move = NULL;
 	}
 
 	dma_resv_init(&fbo->base.base._resv);