
gpu: ttm: fix GPF in ttm_bo_release

Message ID 20210707185108.3798-1-paskripkin@gmail.com (mailing list archive)
State New, archived
Series gpu: ttm: fix GPF in ttm_bo_release

Commit Message

Pavel Skripkin July 7, 2021, 6:51 p.m. UTC
My local syzbot instance hit a general protection fault (GPF) in
ttm_bo_release(). Unfortunately, syzbot didn't produce a reproducer
for this, but I found a possible scenario:

drm_gem_vram_create()            <-- drm_gem_vram_object kzalloced
                                     (bo embedded in this object)
  ttm_bo_init()
    ttm_bo_init_reserved()
      ttm_resource_alloc()
        man->func->alloc()       <-- allocation failure
      ttm_bo_put()
        ttm_bo_release()
          ttm_mem_io_free()      <-- bo->resource == NULL passed
                                     as second argument
            *GPF*

So, I've added a check in ttm_bo_release() to avoid passing
NULL as the second argument to ttm_mem_io_free().
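
The failure path and the caller-side guard can be sketched in plain
userspace C. This is only an illustration: struct mock_bo, struct
mock_resource and the mock_* functions are hypothetical stand-ins and
not the kernel's TTM definitions, and the call counter exists purely so
the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM types; not the kernel definitions. */
struct mock_resource {
	int io_mapped;			/* pretend I/O mapping state */
};

struct mock_bo {
	struct mock_resource *resource;	/* stays NULL if allocation failed */
	int io_free_calls;		/* demonstration-only call counter */
};

/* Like the callee in the trace: dereferences res without checking it. */
static void mock_io_free(struct mock_bo *bo, struct mock_resource *res)
{
	res->io_mapped = 0;
	bo->io_free_calls++;
}

/* Release path with the caller-side guard that the patch adds. */
static void mock_bo_release(struct mock_bo *bo)
{
	if (bo->resource)
		mock_io_free(bo, bo->resource);
}

/*
 * Simulates init failing before a resource was attached, followed by
 * the final put. Returns how often mock_io_free() ended up being called.
 */
static int mock_init_failure_path(void)
{
	struct mock_bo bo = { .resource = NULL, .io_free_calls = 0 };

	mock_bo_release(&bo);	/* without the guard this passes NULL down */
	return bo.io_free_calls;
}
```

With the guard in place, mock_init_failure_path() never reaches the
callee, while a bo that does own a resource is still released as before.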

Fail log:

KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 PID: 10419 Comm: syz-executor.3 Not tainted 5.13.0-rc7-next-20210625 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
RIP: 0010:ttm_mem_io_free+0x28/0x170 drivers/gpu/drm/ttm/ttm_bo_util.c:66
Code: b1 90 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 e8 cd 19 24 fd 4c 8d 6b 20 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 2a 01 00 00 4c 8b 63 20 31 ff 4c 89 e6 e8 00 1f
RSP: 0018:ffffc900141df968 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90010da0000
RDX: 0000000000000004 RSI: ffffffff84513ea3 RDI: ffff888041fbc010
RBP: ffff888041fbc010 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000020 R14: ffff88806b258800 R15: ffff88806b258a38
FS:  00007fa6e9845640(0000) GS:ffff88807ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fad61265e18 CR3: 000000005ad79000 CR4: 0000000000350ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ttm_bo_release+0xd94/0x10a0 drivers/gpu/drm/ttm/ttm_bo.c:422
 kref_put include/linux/kref.h:65 [inline]
 ttm_bo_put drivers/gpu/drm/ttm/ttm_bo.c:470 [inline]
 ttm_bo_init_reserved+0x7cb/0x960 drivers/gpu/drm/ttm/ttm_bo.c:1050
 ttm_bo_init+0x105/0x270 drivers/gpu/drm/ttm/ttm_bo.c:1074
 drm_gem_vram_create+0x332/0x4c0 drivers/gpu/drm/drm_gem_vram_helper.c:228
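
The numbers in the log above can be cross-checked by hand. RBX (the
second argument) is 0, R13 holds RBX + 0x20, RDX holds R13 >> 3 and RAX
holds dffffc0000000000; the faulting instruction is a byte compare at
RDX + RAX. The sketch below redoes that arithmetic. It assumes the
generic KASAN layout on x86_64 (shadow offset 0xdffffc0000000000, one
shadow byte per 8 bytes) and 48-bit virtual addresses; the macro and
function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 generic-KASAN constants (match RAX in the log). */
#define SHADOW_OFFSET		0xdffffc0000000000ULL
#define SHADOW_SCALE_SHIFT	3

/* Address of the shadow byte that describes addr. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* First byte of the 8-byte granule covered by one shadow byte. */
static uint64_t range_start(uint64_t shadow)
{
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}

/* Last byte of that granule. */
static uint64_t range_end(uint64_t shadow)
{
	return range_start(shadow) + (1ULL << SHADOW_SCALE_SHIFT) - 1;
}

/* Canonical with 48-bit addressing: bits 63..47 all equal. */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

For the access at NULL + 0x20 the shadow address comes out as
0xdffffc0000000004, which is not canonical, so the CPU raises a general
protection fault rather than a page fault. Mapping that shadow address
back gives the range 0x20-0x27 that KASAN prints.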

Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Pavel Skripkin July 8, 2021, 10:09 a.m. UTC | #1
On Thu, 8 Jul 2021 11:37:01 +0300
Pavel Skripkin <paskripkin@gmail.com> wrote:

> On Thu, 8 Jul 2021 08:49:48 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > On 07.07.21 at 20:51, Pavel Skripkin wrote:
> > > My local syzbot instance hit a general protection fault (GPF) in
> > > ttm_bo_release(). Unfortunately, syzbot didn't produce a reproducer
> > > for this, but I found a possible scenario:
> > >
> > > drm_gem_vram_create()            <-- drm_gem_vram_object kzalloced
> > >                                      (bo embedded in this object)
> > >   ttm_bo_init()
> > >     ttm_bo_init_reserved()
> > >       ttm_resource_alloc()
> > >         man->func->alloc()       <-- allocation failure
> > >       ttm_bo_put()
> > >         ttm_bo_release()
> > >           ttm_mem_io_free()      <-- bo->resource == NULL passed
> > >                                      as second argument
> > >             *GPF*
> > >
> > > So, I've added a check in ttm_bo_release() to avoid passing
> > > NULL as the second argument to ttm_mem_io_free().
> 
> Hi, Christian!
> 
> Thank you for the quick feedback :)
> 
> > 
> > There is another occurrence of this a bit further down, before we
> > call ttm_bo_move_to_lru_tail(). Apart from that, good catch.
> > 
> 
> Did you mean that ttm_bo_move_to_lru_tail() should have a NULL check
> too? I checked its implementation, and I think a NULL check is
> necessary there, since the mem pointer is dereferenced without any
> checking.
> 
> > But I'm wondering if we should make the functions NULL-safe instead
> > of the external check.
> > 
> 
> I tried to find more possible GPF scenarios in ttm_bo_release(),
> but I didn't find any. But yes, moving the NULL check inside
> ttm_mem_io_free() is a more general approach, and it will protect
> this function from GPFs in the future.
> 
> 
> 
> With regards,
> Pavel Skripkin
> 

I misclicked and sent this email to Christian privately :(

Added all thread participants back, sorry.



With regards,
Pavel Skripkin
Christian König July 8, 2021, 10:56 a.m. UTC | #2
On 08.07.21 at 12:09, Pavel Skripkin wrote:
> On Thu, 8 Jul 2021 11:37:01 +0300
> Pavel Skripkin <paskripkin@gmail.com> wrote:
>
>> On Thu, 8 Jul 2021 08:49:48 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> On 07.07.21 at 20:51, Pavel Skripkin wrote:
>>>> My local syzbot instance hit a general protection fault (GPF) in
>>>> ttm_bo_release(). Unfortunately, syzbot didn't produce a reproducer
>>>> for this, but I found a possible scenario:
>>>>
>>>> drm_gem_vram_create()            <-- drm_gem_vram_object kzalloced
>>>>                                      (bo embedded in this object)
>>>>   ttm_bo_init()
>>>>     ttm_bo_init_reserved()
>>>>       ttm_resource_alloc()
>>>>         man->func->alloc()       <-- allocation failure
>>>>       ttm_bo_put()
>>>>         ttm_bo_release()
>>>>           ttm_mem_io_free()      <-- bo->resource == NULL passed
>>>>                                      as second argument
>>>>             *GPF*
>>>>
>>>> So, I've added a check in ttm_bo_release() to avoid passing
>>>> NULL as the second argument to ttm_mem_io_free().
>> Hi, Christian!
>>
>> Thank you for the quick feedback :)
>>
>>> There is another occurrence of this a bit further down, before we
>>> call ttm_bo_move_to_lru_tail(). Apart from that, good catch.
>>>
>> Did you mean that ttm_bo_move_to_lru_tail() should have a NULL check
>> too?

Yes, exactly that.

>> I checked its implementation, and I think a NULL check is necessary
>> there, since the mem pointer is dereferenced without any checking.
>>
>>> But I'm wondering if we should make the functions NULL-safe instead
>>> of the external check.
>>>
>> I tried to find more possible GPF scenarios in ttm_bo_release(),
>> but I didn't find any. But yes, moving the NULL check inside
>> ttm_mem_io_free() is a more general approach, and it will protect
>> this function from GPFs in the future.
>>
>>
>>
>> With regards,
>> Pavel Skripkin
>>
> I misclicked and sent this email to Christian privately :(
>
> Added all thread participants back, sorry.

No problem.

Do you want to update your patch or should I take care of this?

Thanks,
Christian.

>
>
>
> With regards,
> Pavel Skripkin
Pavel Skripkin July 8, 2021, 11:04 a.m. UTC | #3
On Thu, 8 Jul 2021 12:56:19 +0200
Christian König <christian.koenig@amd.com> wrote:

> On 08.07.21 at 12:09, Pavel Skripkin wrote:
> > On Thu, 8 Jul 2021 11:37:01 +0300
> > Pavel Skripkin <paskripkin@gmail.com> wrote:
> >
> >> On Thu, 8 Jul 2021 08:49:48 +0200
> >> Christian König <christian.koenig@amd.com> wrote:
> >>
> >>> On 07.07.21 at 20:51, Pavel Skripkin wrote:
> >>>> My local syzbot instance hit a general protection fault (GPF) in
> >>>> ttm_bo_release(). Unfortunately, syzbot didn't produce a
> >>>> reproducer for this, but I found a possible scenario:
> >>>>
> >>>> drm_gem_vram_create()            <-- drm_gem_vram_object kzalloced
> >>>>                                      (bo embedded in this object)
> >>>>   ttm_bo_init()
> >>>>     ttm_bo_init_reserved()
> >>>>       ttm_resource_alloc()
> >>>>         man->func->alloc()       <-- allocation failure
> >>>>       ttm_bo_put()
> >>>>         ttm_bo_release()
> >>>>           ttm_mem_io_free()      <-- bo->resource == NULL passed
> >>>>                                      as second argument
> >>>>             *GPF*
> >>>>
> >>>> So, I've added a check in ttm_bo_release() to avoid passing
> >>>> NULL as the second argument to ttm_mem_io_free().
> >> Hi, Christian!
> >>
> >> Thank you for the quick feedback :)
> >>
> >>> There is another occurrence of this a bit further down, before we
> >>> call ttm_bo_move_to_lru_tail(). Apart from that, good catch.
> >>>
> >> Did you mean that ttm_bo_move_to_lru_tail() should have a NULL check
> >> too?
> 
> Yes, exactly that.
> 
> >> I checked its implementation, and I think a NULL check is necessary
> >> there, since the mem pointer is dereferenced without any checking.
> >>
> >>> But I'm wondering if we should make the functions NULL-safe
> >>> instead of the external check.
> >>>
> >> I tried to find more possible GPF scenarios in ttm_bo_release(),
> >> but I didn't find any. But yes, moving the NULL check inside
> >> ttm_mem_io_free() is a more general approach, and it will protect
> >> this function from GPFs in the future.
> >>
> >>
> >>
> >> With regards,
> >> Pavel Skripkin
> >>
> > I misclicked and sent this email to Christian privately :(
> >
> > Added all thread participants back, sorry.
> 
> No problem.
> 
> Do you want to update your patch or should I take care of this?
> 

Yes, I will send v2 soon. Thank you!




With regards,
Pavel Skripkin

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 1b950b45cf4b..15eb97459eab 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -419,7 +419,8 @@  static void ttm_bo_release(struct kref *kref)
 			bo->bdev->funcs->release_notify(bo);
 
 		drm_vma_offset_remove(bdev->vma_manager, &bo->base.vma_node);
-		ttm_mem_io_free(bdev, bo->resource);
+		if (bo->resource)
+			ttm_mem_io_free(bdev, bo->resource);
 	}
 
 	if (!dma_resv_test_signaled(bo->base.resv, true) ||
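
The alternative raised in the comments, making the callee tolerate NULL
instead of guarding each call site, might look roughly like the sketch
below. It is a userspace illustration only: struct demo_resource and
demo_io_free() are hypothetical names, and the int return value exists
just to make the early return observable (the kernel function in the
diff returns nothing).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a resource; not the kernel definition. */
struct demo_resource {
	int mapped;		/* pretend I/O mapping state */
};

/*
 * NULL-safe callee: returns 0 when there is nothing to free and 1 after
 * doing the work, so callers no longer need their own check.
 */
static int demo_io_free(struct demo_resource *res)
{
	if (!res)
		return 0;

	res->mapped = 0;
	return 1;
}
```

The trade-off is the one discussed in the thread: a check inside the
function covers every present and future caller, whereas an external
check has to be repeated at each call site that may see a NULL resource.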