Message ID | 20220603104604.456991-1-christian.koenig@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/ttm: fix missing NULL check in ttm_device_swapout | expand |
On 2022-06-03 06:46, Christian König wrote: > Resources about to be destructed are not tied to BOs any more. I've been seeing a backtrace in that area with a patch series I'm working on, but didn't have enough time to track it down yet. I'll try if this patch fixes it. Regards, Felix > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/ttm/ttm_device.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c > index a0562ab386f5..e7147e304637 100644 > --- a/drivers/gpu/drm/ttm/ttm_device.c > +++ b/drivers/gpu/drm/ttm/ttm_device.c > @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, > > ttm_resource_manager_for_each_res(man, &cursor, res) { > struct ttm_buffer_object *bo = res->bo; > - uint32_t num_pages = PFN_UP(bo->base.size); > + uint32_t num_pages; > > + if (!bo) > + continue; > + > + num_pages = PFN_UP(bo->base.size); > ret = ttm_bo_swapout(bo, ctx, gfp_flags); > /* ttm_bo_swapout has dropped the lru_lock */ > if (!ret)
[+amd-gfx] On 2022-06-03 15:37, Felix Kuehling wrote: > > On 2022-06-03 06:46, Christian König wrote: >> Resources about to be destructed are not tied to BOs any more. > I've been seeing a backtrace in that area with a patch series I'm > working on, but didn't have enough time to track it down yet. I'll try > if this patch fixes it. The patch doesn't apply on amd-staging-drm-next. I made the following change instead, which fixes my problem (and I do see the pr_err being triggered): --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -157,6 +157,10 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, list_for_each_entry(bo, &man->lru[j], lru) { uint32_t num_pages = PFN_UP(bo->base.size); + if (!bo->resource) { + pr_err("### bo->resource is NULL\n"); + continue; + } ret = ttm_bo_swapout(bo, ctx, gfp_flags); /* ttm_bo_swapout has dropped the lru_lock */ if (!ret) > > Regards, > Felix > > >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/ttm/ttm_device.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/ttm/ttm_device.c >> b/drivers/gpu/drm/ttm/ttm_device.c >> index a0562ab386f5..e7147e304637 100644 >> --- a/drivers/gpu/drm/ttm/ttm_device.c >> +++ b/drivers/gpu/drm/ttm/ttm_device.c >> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, >> struct ttm_operation_ctx *ctx, >> ttm_resource_manager_for_each_res(man, &cursor, res) { >> struct ttm_buffer_object *bo = res->bo; >> - uint32_t num_pages = PFN_UP(bo->base.size); >> + uint32_t num_pages; >> + if (!bo) >> + continue; >> + >> + num_pages = PFN_UP(bo->base.size); >> ret = ttm_bo_swapout(bo, ctx, gfp_flags); >> /* ttm_bo_swapout has dropped the lru_lock */ >> if (!ret)
Am 04.06.22 um 00:44 schrieb Felix Kuehling: > [+amd-gfx] > > > On 2022-06-03 15:37, Felix Kuehling wrote: >> >> On 2022-06-03 06:46, Christian König wrote: >>> Resources about to be destructed are not tied to BOs any more. >> I've been seeing a backtrace in that area with a patch series I'm >> working on, but didn't have enough time to track it down yet. I'll >> try if this patch fixes it. > > The patch doesn't apply on amd-staging-drm-next. I made the following > change instead, which fixes my problem (and I do see the pr_err being > triggered): > > --- a/drivers/gpu/drm/ttm/ttm_device.c > +++ b/drivers/gpu/drm/ttm/ttm_device.c > @@ -157,6 +157,10 @@ int ttm_device_swapout(struct ttm_device *bdev, > struct ttm_operation_ctx *ctx, > list_for_each_entry(bo, &man->lru[j], lru) { > uint32_t num_pages = > PFN_UP(bo->base.size); > > + if (!bo->resource) { > + pr_err("### bo->resource is > NULL\n"); > + continue; > + } Yeah, that should be functional identical. Can I get an rb for that? Going to provide backports to older kernels as well then. Regards, Christian. > ret = ttm_bo_swapout(bo, ctx, gfp_flags); > /* ttm_bo_swapout has dropped the > lru_lock */ > if (!ret) > >> >> Regards, >> Felix >> >> >>> >>> Signed-off-by: Christian König <christian.koenig@amd.com> >>> --- >>> drivers/gpu/drm/ttm/ttm_device.c | 6 +++++- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c >>> b/drivers/gpu/drm/ttm/ttm_device.c >>> index a0562ab386f5..e7147e304637 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_device.c >>> +++ b/drivers/gpu/drm/ttm/ttm_device.c >>> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, >>> struct ttm_operation_ctx *ctx, >>> ttm_resource_manager_for_each_res(man, &cursor, res) { >>> struct ttm_buffer_object *bo = res->bo; >>> - uint32_t num_pages = PFN_UP(bo->base.size); >>> + uint32_t num_pages; >>> + if (!bo) >>> + continue; >>> + >>> + num_pages = PFN_UP(bo->base.size); >>> ret = ttm_bo_swapout(bo, ctx, gfp_flags); >>> /* ttm_bo_swapout has dropped the lru_lock */ >>> if (!ret)
Am 2022-06-03 um 06:46 schrieb Christian König: > Resources about to be destructed are not tied to BOs any more. > > Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> > --- > drivers/gpu/drm/ttm/ttm_device.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c > index a0562ab386f5..e7147e304637 100644 > --- a/drivers/gpu/drm/ttm/ttm_device.c > +++ b/drivers/gpu/drm/ttm/ttm_device.c > @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, > > ttm_resource_manager_for_each_res(man, &cursor, res) { > struct ttm_buffer_object *bo = res->bo; > - uint32_t num_pages = PFN_UP(bo->base.size); > + uint32_t num_pages; > > + if (!bo) > + continue; > + > + num_pages = PFN_UP(bo->base.size); > ret = ttm_bo_swapout(bo, ctx, gfp_flags); > /* ttm_bo_swapout has dropped the lru_lock */ > if (!ret)
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index a0562ab386f5..e7147e304637 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx, ttm_resource_manager_for_each_res(man, &cursor, res) { struct ttm_buffer_object *bo = res->bo; - uint32_t num_pages = PFN_UP(bo->base.size); + uint32_t num_pages; + if (!bo) + continue; + + num_pages = PFN_UP(bo->base.size); ret = ttm_bo_swapout(bo, ctx, gfp_flags); /* ttm_bo_swapout has dropped the lru_lock */ if (!ret)
Resources about to be destructed are not tied to BOs any more. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/ttm/ttm_device.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)