Message ID | 20210106092900.26595-1-aik@ozlabs.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,kernel] block: initialize block_device::bd_bdi for bdev_cache | expand |
On Wed 06-01-21 20:29:00, Alexey Kardashevskiy wrote: > This is a workaround to fix a null derefence crash: > > [c00000000b01f840] c00000000b01f880 (unreliable) > [c00000000b01f880] c000000000769a3c bdev_evict_inode+0x21c/0x370 > [c00000000b01f8c0] c00000000070bacc evict+0x11c/0x230 > [c00000000b01f900] c00000000070c138 iput+0x2a8/0x4a0 > [c00000000b01f970] c0000000006ff030 dentry_unlink_inode+0x220/0x250 > [c00000000b01f9b0] c0000000007001c0 __dentry_kill+0x190/0x320 > [c00000000b01fa00] c000000000701fb8 dput+0x5e8/0x860 > [c00000000b01fa80] c000000000705848 shrink_dcache_for_umount+0x58/0x100 > [c00000000b01fb00] c0000000006cf864 generic_shutdown_super+0x54/0x200 > [c00000000b01fb80] c0000000006cfd48 kill_anon_super+0x38/0x60 > [c00000000b01fbc0] c0000000006d12cc deactivate_locked_super+0xbc/0x110 > [c00000000b01fbf0] c0000000006d13bc deactivate_super+0x9c/0xc0 > [c00000000b01fc20] c00000000071a340 cleanup_mnt+0x1b0/0x250 > [c00000000b01fc80] c000000000278fa8 task_work_run+0xf8/0x180 > [c00000000b01fcd0] c00000000002b4ac do_notify_resume+0x4dc/0x5d0 > [c00000000b01fda0] c00000000004ba0c syscall_exit_prepare+0x28c/0x370 > [c00000000b01fe10] c00000000000e06c system_call_common+0xfc/0x27c > --- Exception: c00 (System Call) at 0000000010034890 > > Is this fixed properly already somewhere? Thanks, > > Fixes: e6cb53827ed6 ("block: initialize struct block_device in bdev_alloc") I don't think it's fixed anywhere and I've seen the syzbot report and I was wondering how this can happen when bdev_alloc() initializes bdev->bd_bdi and it also wasn't clear to me whether bd_bdi is really the only field that is problematic - if we can get to bdev_evict_inode() without going through bdev_alloc(), we are probably missing initialization of other fields in that place as well... But now I've realized that probably the inode is a root inode for bdev superblock which is allocated by VFS through new_inode() and thus doesn't undergo the initialization in bdev_alloc(). And AFAICT the root inode on bdev superblock can get only to bdev_evict_inode() and bdev_free_inode(). Looking at bdev_evict_inode() the only thing that's used there from struct block_device is really bd_bdi. bdev_free_inode() will also access bdev->bd_stats and bdev->bd_meta_info. So we need to at least initialize these to NULL as well. IMO the most logical place for all these initializations is in bdev_alloc_inode()... Honza > --- > fs/block_dev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 3e5b02f6606c..86fdc28d565e 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -792,8 +792,10 @@ static void bdev_free_inode(struct inode *inode) > static void init_once(void *data) > { > struct bdev_inode *ei = data; > + struct block_device *bdev = &ei->bdev; > > inode_init_once(&ei->vfs_inode); > + bdev->bd_bdi = &noop_backing_dev_info; > } > > static void bdev_evict_inode(struct inode *inode) > -- > 2.17.1 >
On 06/01/2021 21:41, Jan Kara wrote: > On Wed 06-01-21 20:29:00, Alexey Kardashevskiy wrote: >> This is a workaround to fix a null derefence crash: >> >> [c00000000b01f840] c00000000b01f880 (unreliable) >> [c00000000b01f880] c000000000769a3c bdev_evict_inode+0x21c/0x370 >> [c00000000b01f8c0] c00000000070bacc evict+0x11c/0x230 >> [c00000000b01f900] c00000000070c138 iput+0x2a8/0x4a0 >> [c00000000b01f970] c0000000006ff030 dentry_unlink_inode+0x220/0x250 >> [c00000000b01f9b0] c0000000007001c0 __dentry_kill+0x190/0x320 >> [c00000000b01fa00] c000000000701fb8 dput+0x5e8/0x860 >> [c00000000b01fa80] c000000000705848 shrink_dcache_for_umount+0x58/0x100 >> [c00000000b01fb00] c0000000006cf864 generic_shutdown_super+0x54/0x200 >> [c00000000b01fb80] c0000000006cfd48 kill_anon_super+0x38/0x60 >> [c00000000b01fbc0] c0000000006d12cc deactivate_locked_super+0xbc/0x110 >> [c00000000b01fbf0] c0000000006d13bc deactivate_super+0x9c/0xc0 >> [c00000000b01fc20] c00000000071a340 cleanup_mnt+0x1b0/0x250 >> [c00000000b01fc80] c000000000278fa8 task_work_run+0xf8/0x180 >> [c00000000b01fcd0] c00000000002b4ac do_notify_resume+0x4dc/0x5d0 >> [c00000000b01fda0] c00000000004ba0c syscall_exit_prepare+0x28c/0x370 >> [c00000000b01fe10] c00000000000e06c system_call_common+0xfc/0x27c >> --- Exception: c00 (System Call) at 0000000010034890 >> >> Is this fixed properly already somewhere? Thanks, >> >> Fixes: e6cb53827ed6 ("block: initialize struct block_device in bdev_alloc") > > I don't think it's fixed anywhere and I've seen the syzbot report and I was > wondering how this can happen when bdev_alloc() initializes bdev->bd_bdi > and it also wasn't clear to me whether bd_bdi is really the only field that > is problematic - if we can get to bdev_evict_inode() without going through > bdev_alloc(), we are probably missing initialization of other fields in > that place as well... > > But now I've realized that probably the inode is a root inode for bdev > superblock which is allocated by VFS through new_inode() and thus doesn't > undergo the initialization in bdev_alloc(). yup, this is the case. > And AFAICT the root inode on > bdev superblock can get only to bdev_evict_inode() and bdev_free_inode(). > Looking at bdev_evict_inode() the only thing that's used there from struct > block_device is really bd_bdi. bdev_free_inode() will also access > bdev->bd_stats and bdev->bd_meta_info. So we need to at least initialize > these to NULL as well. These are all NULL. > IMO the most logical place for all these > initializations is in bdev_alloc_inode()... This works. We can also check for NULL where it crashes. But I do not know the code to make an informed decision... > > Honza > >> --- >> fs/block_dev.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 3e5b02f6606c..86fdc28d565e 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -792,8 +792,10 @@ static void bdev_free_inode(struct inode *inode) >> static void init_once(void *data) >> { >> struct bdev_inode *ei = data; >> + struct block_device *bdev = &ei->bdev; >> >> inode_init_once(&ei->vfs_inode); >> + bdev->bd_bdi = &noop_backing_dev_info; >> } >> >> static void bdev_evict_inode(struct inode *inode) >> -- >> 2.17.1 >>
On Thu, Jan 07, 2021 at 10:58:39AM +1100, Alexey Kardashevskiy wrote: >> And AFAICT the root inode on >> bdev superblock can get only to bdev_evict_inode() and bdev_free_inode(). >> Looking at bdev_evict_inode() the only thing that's used there from struct >> block_device is really bd_bdi. bdev_free_inode() will also access >> bdev->bd_stats and bdev->bd_meta_info. So we need to at least initialize >> these to NULL as well. > > These are all NULL. > >> IMO the most logical place for all these >> initializations is in bdev_alloc_inode()... > > > This works. We can also check for NULL where it crashes. But I do not know > the code to make an informed decision... The root inode is the special case, so I think moving the the initializers for everything touched in ->evict_inode and ->free_inode to bdev_alloc_inode makes most sense. Alexey, do you want to respin or should I send a patch?
On 07/01/2021 18:48, Christoph Hellwig wrote: > On Thu, Jan 07, 2021 at 10:58:39AM +1100, Alexey Kardashevskiy wrote: >>> And AFAICT the root inode on >>> bdev superblock can get only to bdev_evict_inode() and bdev_free_inode(). >>> Looking at bdev_evict_inode() the only thing that's used there from struct >>> block_device is really bd_bdi. bdev_free_inode() will also access >>> bdev->bd_stats and bdev->bd_meta_info. So we need to at least initialize >>> these to NULL as well. >> >> These are all NULL. >> >>> IMO the most logical place for all these >>> initializations is in bdev_alloc_inode()... >> >> >> This works. We can also check for NULL where it crashes. But I do not know >> the code to make an informed decision... > > The root inode is the special case, so I think moving the the initializers > for everything touched in ->evict_inode and ->free_inode to > bdev_alloc_inode makes most sense. > > Alexey, do you want to respin or should I send a patch? I really prefer you doing this as you will most likely end up fixing the commit log anyway :) Thanks,
diff --git a/fs/block_dev.c b/fs/block_dev.c index 3e5b02f6606c..86fdc28d565e 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -792,8 +792,10 @@ static void bdev_free_inode(struct inode *inode) static void init_once(void *data) { struct bdev_inode *ei = data; + struct block_device *bdev = &ei->bdev; inode_init_once(&ei->vfs_inode); + bdev->bd_bdi = &noop_backing_dev_info; } static void bdev_evict_inode(struct inode *inode)