Message ID | 20220411085603.58156-1-matthew.auld@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup | expand |
Hi, Matthew On 4/11/22 10:56, Matthew Auld wrote: > If we hit the sync case, like when skipping clearing for kernel internal > objects, or when falling back to cpu clearing, like in i915, we end up > trying to add a NULL fence, but with some recent changes in this area > this now just results in NULL deref in dma_resv_add_fence: > > <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008 > <1>[ 5.466384] #PF: supervisor read access in kernel mode > <1>[ 5.466385] #PF: error_code(0x0000) - not-present page > <6>[ 5.466386] PGD 0 P4D 0 > <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI > <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1 > <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 > <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82 > <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 > <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff > <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087 > <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001 > <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000 > <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600 > <4>[ 5.466401] FS: 00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000 > <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0 > <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > <4>[ 5.466404] Call Trace: > <4>[ 5.466405] <TASK> > <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] > <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] > <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] > <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 > <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm] > <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] > <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] > <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] > <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] > <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] > <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] > <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] > <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] > <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] > <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 > <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915] > <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] > > In the ttm_bo_move_sync_cleanup() case it seems we only really care > about calling ttm_bo_wait_free_node(), so let's instead just call that > directly. > > Signed-off-by: Matthew Auld <matthew.auld@intel.com> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Cc: Christian König <christian.koenig@amd.com> > Cc: Lucas De Marchi <lucas.demarchi@intel.com> > Cc: Nirmoy Das <nirmoy.das@linux.intel.com> > --- > drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ > include/drm/ttm/ttm_bo_driver.h | 11 +++-------- > 2 files changed, 18 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c > index bc5190340b9c..1cbfb00c1d65 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c > @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > } > EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); > > +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > + struct ttm_resource *new_mem) > +{ > + struct ttm_device *bdev = bo->bdev; > + struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); > + int ret; > + > + ret = ttm_bo_wait_free_node(bo, man->use_tt); > + if (WARN_ON(ret)) > + return; > + > + ttm_bo_assign_mem(bo, new_mem); > +} > +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); I don't think this will help in the case where we call ttm_bo_move_accel_cleanup() with a NULL fence.... Perhaps we need to fix it there. /Thomas > + > /** > * ttm_bo_pipeline_gutting - purge the contents of a bo > * @bo: The buffer object > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 059a595e14e5..897b88f0bd59 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > struct ttm_resource *new_mem); > > /** > - * ttm_bo_move_accel_cleanup. > + * ttm_bo_move_sync_cleanup. > * > * @bo: A pointer to a struct ttm_buffer_object. > * @new_mem: struct ttm_resource indicating where to move. > @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed > * by the caller to be idle. Typically used after memcpy buffer moves. > */ > -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > - struct ttm_resource *new_mem) > -{ > - int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem); > - > - WARN_ON(ret); > -} > +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > + struct ttm_resource *new_mem); > > /** > * ttm_bo_pipeline_gutting.
On 11/04/2022 10:56, Thomas Hellström wrote: > Hi, Matthew > > On 4/11/22 10:56, Matthew Auld wrote: >> If we hit the sync case, like when skipping clearing for kernel internal >> objects, or when falling back to cpu clearing, like in i915, we end up >> trying to add a NULL fence, but with some recent changes in this area >> this now just results in NULL deref in dma_resv_add_fence: >> >> <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: >> 0000000000000008 >> <1>[ 5.466384] #PF: supervisor read access in kernel mode >> <1>[ 5.466385] #PF: error_code(0x0000) - not-present page >> <6>[ 5.466386] PGD 0 P4D 0 >> <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI >> <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted >> 5.18.0-rc2-CI-CI_DRM_11481+ #1 >> <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 >> <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 >> 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 >> 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 >> a5 0a 82 >> <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 >> <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: >> 00000000ffffffff >> <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: >> ffffffff8233f087 >> <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: >> 0000000000000001 >> <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: >> 0000000000000000 >> <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: >> ffff88810a88d600 >> <4>[ 5.466401] FS: 00007f5fa1193540(0000) >> GS:ffff88845d880000(0000) knlGS:0000000000000000 >> <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: >> 00000000003706e0 >> <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >> 0000000000000400 >> <4>[ 5.466404] Call Trace: >> <4>[ 5.466405] <TASK> >> <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] >> <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] >> <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] >> <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 >> <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm] >> <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] >> <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] >> <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] >> <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] >> <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] >> <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] >> <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] >> <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] >> <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] >> <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 >> <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 >> [i915] >> <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] >> >> In the ttm_bo_move_sync_cleanup() case it seems we only really care >> about calling ttm_bo_wait_free_node(), so let's instead just call that >> directly. >> >> Signed-off-by: Matthew Auld <matthew.auld@intel.com> >> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> >> Cc: Christian König <christian.koenig@amd.com> >> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> >> --- >> drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ >> include/drm/ttm/ttm_bo_driver.h | 11 +++-------- >> 2 files changed, 18 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c >> b/drivers/gpu/drm/ttm/ttm_bo_util.c >> index bc5190340b9c..1cbfb00c1d65 100644 >> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c >> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c >> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct >> ttm_buffer_object *bo, >> } >> EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); >> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, >> + struct ttm_resource *new_mem) >> +{ >> + struct ttm_device *bdev = bo->bdev; >> + struct ttm_resource_manager *man = ttm_manager_type(bdev, >> new_mem->mem_type); >> + int ret; >> + >> + ret = ttm_bo_wait_free_node(bo, man->use_tt); >> + if (WARN_ON(ret)) >> + return; >> + >> + ttm_bo_assign_mem(bo, new_mem); >> +} >> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); > > I don't think this will help in the case where we call > ttm_bo_move_accel_cleanup() with a NULL fence.... Hmm, do you know if that case actually exists? I thought the only current user passing fence == NULL was ttm_bo_move_sync_cleanup(). > > Perhaps we need to fix it there. > > /Thomas > > > >> + >> /** >> * ttm_bo_pipeline_gutting - purge the contents of a bo >> * @bo: The buffer object >> diff --git a/include/drm/ttm/ttm_bo_driver.h >> b/include/drm/ttm/ttm_bo_driver.h >> index 059a595e14e5..897b88f0bd59 100644 >> --- a/include/drm/ttm/ttm_bo_driver.h >> +++ b/include/drm/ttm/ttm_bo_driver.h >> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct >> ttm_buffer_object *bo, >> struct ttm_resource *new_mem); >> /** >> - * ttm_bo_move_accel_cleanup. >> + * ttm_bo_move_sync_cleanup. >> * >> * @bo: A pointer to a struct ttm_buffer_object. >> * @new_mem: struct ttm_resource indicating where to move. >> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct >> ttm_buffer_object *bo, >> * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed >> * by the caller to be idle. Typically used after memcpy buffer moves. >> */ >> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object >> *bo, >> - struct ttm_resource *new_mem) >> -{ >> - int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem); >> - >> - WARN_ON(ret); >> -} >> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, >> + struct ttm_resource *new_mem); >> /** >> * ttm_bo_pipeline_gutting.
On 4/11/22 12:06, Matthew Auld wrote: > On 11/04/2022 10:56, Thomas Hellström wrote: >> Hi, Matthew >> >> On 4/11/22 10:56, Matthew Auld wrote: >>> If we hit the sync case, like when skipping clearing for kernel >>> internal >>> objects, or when falling back to cpu clearing, like in i915, we end up >>> trying to add a NULL fence, but with some recent changes in this area >>> this now just results in NULL deref in dma_resv_add_fence: >>> >>> <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: >>> 0000000000000008 >>> <1>[ 5.466384] #PF: supervisor read access in kernel mode >>> <1>[ 5.466385] #PF: error_code(0x0000) - not-present page >>> <6>[ 5.466386] PGD 0 P4D 0 >>> <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI >>> <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted >>> 5.18.0-rc2-CI-CI_DRM_11481+ #1 >>> <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 >>> <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 >>> 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 >>> 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d >>> 60 a5 0a 82 >>> <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 >>> <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: >>> 00000000ffffffff >>> <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: >>> ffffffff8233f087 >>> <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: >>> 0000000000000001 >>> <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: >>> 0000000000000000 >>> <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: >>> ffff88810a88d600 >>> <4>[ 5.466401] FS: 00007f5fa1193540(0000) >>> GS:ffff88845d880000(0000) knlGS:0000000000000000 >>> <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: >>> 00000000003706e0 >>> <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>> 0000000000000000 >>> <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >>> 0000000000000400 >>> <4>[ 5.466404] Call Trace: >>> <4>[ 5.466405] <TASK> >>> <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] >>> <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] >>> <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] >>> <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 >>> <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 >>> [ttm] >>> <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] >>> <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] >>> <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] >>> <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] >>> <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] >>> <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] >>> <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] >>> <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] >>> <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] >>> <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 >>> <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 >>> [i915] >>> <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] >>> >>> In the ttm_bo_move_sync_cleanup() case it seems we only really care >>> about calling ttm_bo_wait_free_node(), so let's instead just call that >>> directly. >>> >>> Signed-off-by: Matthew Auld <matthew.auld@intel.com> >>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> >>> Cc: Christian König <christian.koenig@amd.com> >>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> >>> --- >>> drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ >>> include/drm/ttm/ttm_bo_driver.h | 11 +++-------- >>> 2 files changed, 18 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c >>> b/drivers/gpu/drm/ttm/ttm_bo_util.c >>> index bc5190340b9c..1cbfb00c1d65 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c >>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c >>> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct >>> ttm_buffer_object *bo, >>> } >>> EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); >>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, >>> + struct ttm_resource *new_mem) >>> +{ >>> + struct ttm_device *bdev = bo->bdev; >>> + struct ttm_resource_manager *man = ttm_manager_type(bdev, >>> new_mem->mem_type); >>> + int ret; >>> + >>> + ret = ttm_bo_wait_free_node(bo, man->use_tt); >>> + if (WARN_ON(ret)) >>> + return; >>> + >>> + ttm_bo_assign_mem(bo, new_mem); >>> +} >>> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); >> >> I don't think this will help in the case where we call >> ttm_bo_move_accel_cleanup() with a NULL fence.... > > Hmm, do you know if that case actually exists? I thought the only > current user passing fence == NULL was ttm_bo_move_sync_cleanup(). Yes, indeed, you're right. Not sure if any other driver hits that, though. /Thomas
Am 11.04.22 um 10:56 schrieb Matthew Auld: > If we hit the sync case, like when skipping clearing for kernel internal > objects, or when falling back to cpu clearing, like in i915, we end up > trying to add a NULL fence, but with some recent changes in this area > this now just results in NULL deref in dma_resv_add_fence: > > <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008 > <1>[ 5.466384] #PF: supervisor read access in kernel mode > <1>[ 5.466385] #PF: error_code(0x0000) - not-present page > <6>[ 5.466386] PGD 0 P4D 0 > <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI > <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1 > <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 > <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82 > <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 > <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff > <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087 > <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001 > <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000 > <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600 > <4>[ 5.466401] FS: 00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000 > <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0 > <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > <4>[ 5.466404] Call Trace: > <4>[ 5.466405] <TASK> > <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] > <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] > <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] > <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 > <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm] > <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] > <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] > <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] > <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] > <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] > <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] > <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] > <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] > <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] > <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 > <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915] > <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] > > In the ttm_bo_move_sync_cleanup() case it seems we only really care > about calling ttm_bo_wait_free_node(), so let's instead just call that > directly. > > Signed-off-by: Matthew Auld <matthew.auld@intel.com> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Cc: Christian König <christian.koenig@amd.com> > Cc: Lucas De Marchi <lucas.demarchi@intel.com> > Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Ideally we wouldn't export that to drivers, but that's a different problem. Reviewed-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ > include/drm/ttm/ttm_bo_driver.h | 11 +++-------- > 2 files changed, 18 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c > index bc5190340b9c..1cbfb00c1d65 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c > @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > } > EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); > > +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > + struct ttm_resource *new_mem) > +{ > + struct ttm_device *bdev = bo->bdev; > + struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); > + int ret; > + > + ret = ttm_bo_wait_free_node(bo, man->use_tt); > + if (WARN_ON(ret)) > + return; > + > + ttm_bo_assign_mem(bo, new_mem); > +} > +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); > + > /** > * ttm_bo_pipeline_gutting - purge the contents of a bo > * @bo: The buffer object > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 059a595e14e5..897b88f0bd59 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > struct ttm_resource *new_mem); > > /** > - * ttm_bo_move_accel_cleanup. > + * ttm_bo_move_sync_cleanup. > * > * @bo: A pointer to a struct ttm_buffer_object. > * @new_mem: struct ttm_resource indicating where to move. > @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed > * by the caller to be idle. Typically used after memcpy buffer moves. > */ > -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > - struct ttm_resource *new_mem) > -{ > - int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem); > - > - WARN_ON(ret); > -} > +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > + struct ttm_resource *new_mem); > > /** > * ttm_bo_pipeline_gutting.
Am 11.04.22 um 12:45 schrieb Thomas Hellström: > > On 4/11/22 12:06, Matthew Auld wrote: >> On 11/04/2022 10:56, Thomas Hellström wrote: >>> Hi, Matthew >>> >>> On 4/11/22 10:56, Matthew Auld wrote: >>>> If we hit the sync case, like when skipping clearing for kernel >>>> internal >>>> objects, or when falling back to cpu clearing, like in i915, we end up >>>> trying to add a NULL fence, but with some recent changes in this area >>>> this now just results in NULL deref in dma_resv_add_fence: >>>> >>>> <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: >>>> 0000000000000008 >>>> <1>[ 5.466384] #PF: supervisor read access in kernel mode >>>> <1>[ 5.466385] #PF: error_code(0x0000) - not-present page >>>> <6>[ 5.466386] PGD 0 P4D 0 >>>> <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI >>>> <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted >>>> 5.18.0-rc2-CI-CI_DRM_11481+ #1 >>>> <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 >>>> <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 >>>> 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 >>>> 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 >>>> 48 3d 60 a5 0a 82 >>>> <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 >>>> <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: >>>> 00000000ffffffff >>>> <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: >>>> ffffffff8233f087 >>>> <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: >>>> 0000000000000001 >>>> <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: >>>> 0000000000000000 >>>> <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: >>>> ffff88810a88d600 >>>> <4>[ 5.466401] FS: 00007f5fa1193540(0000) >>>> GS:ffff88845d880000(0000) knlGS:0000000000000000 >>>> <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: >>>> 00000000003706e0 >>>> <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>> 0000000000000000 >>>> <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >>>> 0000000000000400 >>>> <4>[ 5.466404] Call Trace: >>>> <4>[ 5.466405] <TASK> >>>> <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] >>>> <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] >>>> <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] >>>> <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 >>>> <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 >>>> [ttm] >>>> <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] >>>> <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] >>>> <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] >>>> <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] >>>> <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] >>>> <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] >>>> <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] >>>> <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] >>>> <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] >>>> <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 >>>> <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 >>>> [i915] >>>> <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] >>>> >>>> In the ttm_bo_move_sync_cleanup() case it seems we only really care >>>> about calling ttm_bo_wait_free_node(), so let's instead just call that >>>> directly. >>>> >>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com> >>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> >>>> Cc: Christian König <christian.koenig@amd.com> >>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> >>>> --- >>>> drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ >>>> include/drm/ttm/ttm_bo_driver.h | 11 +++-------- >>>> 2 files changed, 18 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c >>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c >>>> index bc5190340b9c..1cbfb00c1d65 100644 >>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c >>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c >>>> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct >>>> ttm_buffer_object *bo, >>>> } >>>> EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); >>>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, >>>> + struct ttm_resource *new_mem) >>>> +{ >>>> + struct ttm_device *bdev = bo->bdev; >>>> + struct ttm_resource_manager *man = ttm_manager_type(bdev, >>>> new_mem->mem_type); >>>> + int ret; >>>> + >>>> + ret = ttm_bo_wait_free_node(bo, man->use_tt); >>>> + if (WARN_ON(ret)) >>>> + return; >>>> + >>>> + ttm_bo_assign_mem(bo, new_mem); >>>> +} >>>> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); >>> >>> I don't think this will help in the case where we call >>> ttm_bo_move_accel_cleanup() with a NULL fence.... >> >> Hmm, do you know if that case actually exists? I thought the only >> current user passing fence == NULL was ttm_bo_move_sync_cleanup(). > > > Yes, indeed, you're right. Not sure if any other driver hits that, > though. At least when I came up with that ~halve a year ago I double checked all drivers to not call ttm_bo_move_accel_cleanup() without a fence. But I also somehow missed ttm_bo_move_sync_cleanup(), so take that with a grain of salt. Christian. > > /Thomas > >
Am 11.04.22 um 10:56 schrieb Matthew Auld: > If we hit the sync case, like when skipping clearing for kernel internal > objects, or when falling back to cpu clearing, like in i915, we end up > trying to add a NULL fence, but with some recent changes in this area > this now just results in NULL deref in dma_resv_add_fence: > > <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008 > <1>[ 5.466384] #PF: supervisor read access in kernel mode > <1>[ 5.466385] #PF: error_code(0x0000) - not-present page > <6>[ 5.466386] PGD 0 P4D 0 > <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI > <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1 > <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 > <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82 > <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 > <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff > <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087 > <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001 > <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000 > <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600 > <4>[ 5.466401] FS: 00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000 > <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0 > <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > <4>[ 5.466404] Call Trace: > <4>[ 5.466405] <TASK> > <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] > <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] > <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] > <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 > <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm] > <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] > <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] > <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] > <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] > <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] > <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] > <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] > <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] > <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] > <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 > <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915] > <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] > > In the ttm_bo_move_sync_cleanup() case it seems we only really care > about calling ttm_bo_wait_free_node(), so let's instead just call that > directly. > > Signed-off-by: Matthew Auld <matthew.auld@intel.com> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Cc: Christian König <christian.koenig@amd.com> > Cc: Lucas De Marchi <lucas.demarchi@intel.com> > Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> with bochs on ppc64le. > --- > drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ > include/drm/ttm/ttm_bo_driver.h | 11 +++-------- > 2 files changed, 18 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c > index bc5190340b9c..1cbfb00c1d65 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c > @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > } > EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); > > +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > + struct ttm_resource *new_mem) > +{ > + struct ttm_device *bdev = bo->bdev; > + struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); > + int ret; > + > + ret = ttm_bo_wait_free_node(bo, man->use_tt); > + if (WARN_ON(ret)) > + return; > + > + ttm_bo_assign_mem(bo, new_mem); > +} > +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); > + > /** > * ttm_bo_pipeline_gutting - purge the contents of a bo > * @bo: The buffer object > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 059a595e14e5..897b88f0bd59 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > struct ttm_resource *new_mem); > > /** > - * ttm_bo_move_accel_cleanup. > + * ttm_bo_move_sync_cleanup. > * > * @bo: A pointer to a struct ttm_buffer_object. > * @new_mem: struct ttm_resource indicating where to move. > @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, > * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed > * by the caller to be idle. Typically used after memcpy buffer moves. > */ > -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > - struct ttm_resource *new_mem) > -{ > - int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem); > - > - WARN_ON(ret); > -} > +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, > + struct ttm_resource *new_mem); > > /** > * ttm_bo_pipeline_gutting.
On 11/04/2022 13:39, Christian König wrote: > Am 11.04.22 um 10:56 schrieb Matthew Auld: >> If we hit the sync case, like when skipping clearing for kernel internal >> objects, or when falling back to cpu clearing, like in i915, we end up >> trying to add a NULL fence, but with some recent changes in this area >> this now just results in NULL deref in dma_resv_add_fence: >> >> <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: >> 0000000000000008 >> <1>[ 5.466384] #PF: supervisor read access in kernel mode >> <1>[ 5.466385] #PF: error_code(0x0000) - not-present page >> <6>[ 5.466386] PGD 0 P4D 0 >> <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI >> <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted >> 5.18.0-rc2-CI-CI_DRM_11481+ #1 >> <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 >> <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 >> 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 >> 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 >> a5 0a 82 >> <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 >> <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: >> 00000000ffffffff >> <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: >> ffffffff8233f087 >> <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: >> 0000000000000001 >> <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: >> 0000000000000000 >> <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: >> ffff88810a88d600 >> <4>[ 5.466401] FS: 00007f5fa1193540(0000) >> GS:ffff88845d880000(0000) knlGS:0000000000000000 >> <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: >> 00000000003706e0 >> <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >> 0000000000000400 >> <4>[ 5.466404] Call Trace: >> <4>[ 5.466405] <TASK> >> <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] >> <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] >> <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] >> <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 >> <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm] >> <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] >> <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] >> <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] >> <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] >> <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] >> <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] >> <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] >> <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] >> <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] >> <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 >> <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 >> [i915] >> <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] >> >> In the ttm_bo_move_sync_cleanup() case it seems we only really care >> about calling ttm_bo_wait_free_node(), so let's instead just call that >> directly. >> >> Signed-off-by: Matthew Auld <matthew.auld@intel.com> >> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> >> Cc: Christian König <christian.koenig@amd.com> >> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> > > Ideally we wouldn't export that to drivers, but that's a different problem. > > Reviewed-by: Christian König <christian.koenig@amd.com> Thanks. Would you be able to merge this? > >> --- >> drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ >> include/drm/ttm/ttm_bo_driver.h | 11 +++-------- >> 2 files changed, 18 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c >> b/drivers/gpu/drm/ttm/ttm_bo_util.c >> index bc5190340b9c..1cbfb00c1d65 100644 >> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c >> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c >> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct >> ttm_buffer_object *bo, >> } >> EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); >> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, >> + struct ttm_resource *new_mem) >> +{ >> + struct ttm_device *bdev = bo->bdev; >> + struct ttm_resource_manager *man = ttm_manager_type(bdev, >> new_mem->mem_type); >> + int ret; >> + >> + ret = ttm_bo_wait_free_node(bo, man->use_tt); >> + if (WARN_ON(ret)) >> + return; >> + >> + ttm_bo_assign_mem(bo, new_mem); >> +} >> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); >> + >> /** >> * ttm_bo_pipeline_gutting - purge the contents of a bo >> * @bo: The buffer object >> diff --git a/include/drm/ttm/ttm_bo_driver.h >> b/include/drm/ttm/ttm_bo_driver.h >> index 059a595e14e5..897b88f0bd59 100644 >> --- a/include/drm/ttm/ttm_bo_driver.h >> +++ b/include/drm/ttm/ttm_bo_driver.h >> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct >> ttm_buffer_object *bo, >> struct ttm_resource *new_mem); >> /** >> - * ttm_bo_move_accel_cleanup. >> + * ttm_bo_move_sync_cleanup. >> * >> * @bo: A pointer to a struct ttm_buffer_object. >> * @new_mem: struct ttm_resource indicating where to move. >> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct >> ttm_buffer_object *bo, >> * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed >> * by the caller to be idle. Typically used after memcpy buffer moves. >> */ >> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object >> *bo, >> - struct ttm_resource *new_mem) >> -{ >> - int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem); >> - >> - WARN_ON(ret); >> -} >> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, >> + struct ttm_resource *new_mem); >> /** >> * ttm_bo_pipeline_gutting. >
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index bc5190340b9c..1cbfb00c1d65 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, } EXPORT_SYMBOL(ttm_bo_move_accel_cleanup); +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, + struct ttm_resource *new_mem) +{ + struct ttm_device *bdev = bo->bdev; + struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); + int ret; + + ret = ttm_bo_wait_free_node(bo, man->use_tt); + if (WARN_ON(ret)) + return; + + ttm_bo_assign_mem(bo, new_mem); +} +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup); + /** * ttm_bo_pipeline_gutting - purge the contents of a bo * @bo: The buffer object diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 059a595e14e5..897b88f0bd59 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, struct ttm_resource *new_mem); /** - * ttm_bo_move_accel_cleanup. + * ttm_bo_move_sync_cleanup. * * @bo: A pointer to a struct ttm_buffer_object. * @new_mem: struct ttm_resource indicating where to move. @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed * by the caller to be idle. Typically used after memcpy buffer moves. */ -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, - struct ttm_resource *new_mem) -{ - int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem); - - WARN_ON(ret); -} +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo, + struct ttm_resource *new_mem); /** * ttm_bo_pipeline_gutting.
If we hit the sync case, like when skipping clearing for kernel internal objects, or when falling back to cpu clearing, like in i915, we end up trying to add a NULL fence, but with some recent changes in this area this now just results in NULL deref in dma_resv_add_fence: <1>[ 5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008 <1>[ 5.466384] #PF: supervisor read access in kernel mode <1>[ 5.466385] #PF: error_code(0x0000) - not-present page <6>[ 5.466386] PGD 0 P4D 0 <4>[ 5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI <4>[ 5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1 <4>[ 5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260 <4>[ 5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82 <4>[ 5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202 <4>[ 5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff <4>[ 5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087 <4>[ 5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001 <4>[ 5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000 <4>[ 5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600 <4>[ 5.466401] FS: 00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000 <4>[ 5.466402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0 <4>[ 5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[ 5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 <4>[ 5.466404] Call Trace: <4>[ 5.466405] <TASK> <4>[ 5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm] <4>[ 5.466411] ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915] <4>[ 5.466529] i915_ttm_move+0xfd/0x430 [i915] <4>[ 5.466833] ? dma_resv_reserve_fences+0x4e/0x320 <4>[ 5.466836] ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm] <4>[ 5.466841] ttm_bo_handle_move_mem+0xa1/0x140 [ttm] <4>[ 5.466845] ttm_bo_validate+0xee/0x160 [ttm] <4>[ 5.466849] __i915_ttm_get_pages+0x4f/0x210 [i915] <4>[ 5.466976] i915_ttm_get_pages+0xad/0x140 [i915] <4>[ 5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915] <4>[ 5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915] <4>[ 5.467323] i915_vma_get_pages+0x114/0x1d0 [i915] <4>[ 5.467446] i915_vma_pin_ww+0xd3/0xa90 [i915] <4>[ 5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915] <4>[ 5.467700] ? __mutex_unlock_slowpath+0x3e/0x2b0 <4>[ 5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915] <4>[ 5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915] In the ttm_bo_move_sync_cleanup() case it seems we only really care about calling ttm_bo_wait_free_node(), so let's instead just call that directly. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> --- drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++ include/drm/ttm/ttm_bo_driver.h | 11 +++-------- 2 files changed, 18 insertions(+), 8 deletions(-)