Message ID | 20190924205044.31830-1-josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] btrfs: check page->mapping when loading free space cache | expand |
On Thu, Sep 26, 2019 at 11:53 AM Josef Bacik <josef@toxicpanda.com> wrote: > > While testing 5.2 we ran into the following panic > > [52238.017028] BUG: kernel NULL pointer dereference, address: 0000000000000001 > [52238.105608] RIP: 0010:drop_buffers+0x3d/0x150 > [52238.304051] Call Trace: > [52238.308958] try_to_free_buffers+0x15b/0x1b0 > [52238.317503] shrink_page_list+0x1164/0x1780 > [52238.325877] shrink_inactive_list+0x18f/0x3b0 > [52238.334596] shrink_node_memcg+0x23e/0x7d0 > [52238.342790] ? do_shrink_slab+0x4f/0x290 > [52238.350648] shrink_node+0xce/0x4a0 > [52238.357628] balance_pgdat+0x2c7/0x510 > [52238.365135] kswapd+0x216/0x3e0 > [52238.371425] ? wait_woken+0x80/0x80 > [52238.378412] ? balance_pgdat+0x510/0x510 > [52238.386265] kthread+0x111/0x130 > [52238.392727] ? kthread_create_on_node+0x60/0x60 > [52238.401782] ret_from_fork+0x1f/0x30 > > The page we were trying to drop had a page->private, but had no > page->mapping and so called drop_buffers, assuming that we had a > buffer_head on the page, and then panic'ed trying to deref 1, which is > our page->private for data pages. > > This is happening because we're truncating the free space cache while > we're trying to load the free space cache. This isn't supposed to > happen, and I'll fix that in a followup patch. However we still > shouldn't allow those sort of mistakes to result in messing with pages > that do not belong to us. So add the page->mapping check to verify that > we still own this page after dropping and re-acquiring the page lock. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good, thanks. > --- > fs/btrfs/free-space-cache.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index d54dcd0ab230..d86ada9c3c54 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -385,6 +385,12 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, struct inode *inode > if (uptodate && !PageUptodate(page)) { > btrfs_readpage(NULL, page); > lock_page(page); > + if (page->mapping != inode->i_mapping) { > + btrfs_err(BTRFS_I(inode)->root->fs_info, > + "free space cache page truncated"); > + io_ctl_drop_pages(io_ctl); > + return -EIO; > + } > if (!PageUptodate(page)) { > btrfs_err(BTRFS_I(inode)->root->fs_info, > "error reading free space cache"); > -- > 2.21.0 >
On 24.09.19 г. 23:50 ч., Josef Bacik wrote: > While testing 5.2 we ran into the following panic > > [52238.017028] BUG: kernel NULL pointer dereference, address: 0000000000000001 > [52238.105608] RIP: 0010:drop_buffers+0x3d/0x150 > [52238.304051] Call Trace: > [52238.308958] try_to_free_buffers+0x15b/0x1b0 > [52238.317503] shrink_page_list+0x1164/0x1780 > [52238.325877] shrink_inactive_list+0x18f/0x3b0 > [52238.334596] shrink_node_memcg+0x23e/0x7d0 > [52238.342790] ? do_shrink_slab+0x4f/0x290 > [52238.350648] shrink_node+0xce/0x4a0 > [52238.357628] balance_pgdat+0x2c7/0x510 > [52238.365135] kswapd+0x216/0x3e0 > [52238.371425] ? wait_woken+0x80/0x80 > [52238.378412] ? balance_pgdat+0x510/0x510 > [52238.386265] kthread+0x111/0x130 > [52238.392727] ? kthread_create_on_node+0x60/0x60 > [52238.401782] ret_from_fork+0x1f/0x30 > > The page we were trying to drop had a page->private, but had no > page->mapping and so called drop_buffers, assuming that we had a > buffer_head on the page, and then panic'ed trying to deref 1, which is > our page->private for data pages. > > This is happening because we're truncating the free space cache while > we're trying to load the free space cache. This isn't supposed to > happen, and I'll fix that in a followup patch. However we still > shouldn't allow those sort of mistakes to result in messing with pages > that do not belong to us. So add the page->mapping check to verify that > we still own this page after dropping and re-acquiring the page lock. Where is this page being unlocked? Is it: btrfs_readpage extent_read_full_page __extent_read_full_page __do_readpage if (!nr) unlock_page <-- nr can be 0 only if submit_extent_page returns an error, correct? That's somewhat subtle and buried inside a long call chaing and really non obvious. So please mention when is the page lock being dropped. Code-wise LGTM: Reviewed-by: Nikolay Borisov <nborisov@suse.com> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/free-space-cache.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index d54dcd0ab230..d86ada9c3c54 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -385,6 +385,12 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, struct inode *inode > if (uptodate && !PageUptodate(page)) { > btrfs_readpage(NULL, page); > lock_page(page); > + if (page->mapping != inode->i_mapping) { > + btrfs_err(BTRFS_I(inode)->root->fs_info, > + "free space cache page truncated"); > + io_ctl_drop_pages(io_ctl); > + return -EIO; > + } > if (!PageUptodate(page)) { > btrfs_err(BTRFS_I(inode)->root->fs_info, > "error reading free space cache"); >
On Tue, Sep 24, 2019 at 04:50:43PM -0400, Josef Bacik wrote: > While testing 5.2 we ran into the following panic > > [52238.017028] BUG: kernel NULL pointer dereference, address: 0000000000000001 > [52238.105608] RIP: 0010:drop_buffers+0x3d/0x150 > [52238.304051] Call Trace: > [52238.308958] try_to_free_buffers+0x15b/0x1b0 > [52238.317503] shrink_page_list+0x1164/0x1780 > [52238.325877] shrink_inactive_list+0x18f/0x3b0 > [52238.334596] shrink_node_memcg+0x23e/0x7d0 > [52238.342790] ? do_shrink_slab+0x4f/0x290 > [52238.350648] shrink_node+0xce/0x4a0 > [52238.357628] balance_pgdat+0x2c7/0x510 > [52238.365135] kswapd+0x216/0x3e0 > [52238.371425] ? wait_woken+0x80/0x80 > [52238.378412] ? balance_pgdat+0x510/0x510 > [52238.386265] kthread+0x111/0x130 > [52238.392727] ? kthread_create_on_node+0x60/0x60 > [52238.401782] ret_from_fork+0x1f/0x30 > > The page we were trying to drop had a page->private, but had no > page->mapping and so called drop_buffers, assuming that we had a > buffer_head on the page, and then panic'ed trying to deref 1, which is > our page->private for data pages. > > This is happening because we're truncating the free space cache while > we're trying to load the free space cache. This isn't supposed to > happen, and I'll fix that in a followup patch. However we still > shouldn't allow those sort of mistakes to result in messing with pages > that do not belong to us. So add the page->mapping check to verify that > we still own this page after dropping and re-acquiring the page lock. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Patches 1 and 2 moved to misc-next. Thanks.
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index d54dcd0ab230..d86ada9c3c54 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -385,6 +385,12 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, struct inode *inode if (uptodate && !PageUptodate(page)) { btrfs_readpage(NULL, page); lock_page(page); + if (page->mapping != inode->i_mapping) { + btrfs_err(BTRFS_I(inode)->root->fs_info, + "free space cache page truncated"); + io_ctl_drop_pages(io_ctl); + return -EIO; + } if (!PageUptodate(page)) { btrfs_err(BTRFS_I(inode)->root->fs_info, "error reading free space cache");
While testing 5.2 we ran into the following panic [52238.017028] BUG: kernel NULL pointer dereference, address: 0000000000000001 [52238.105608] RIP: 0010:drop_buffers+0x3d/0x150 [52238.304051] Call Trace: [52238.308958] try_to_free_buffers+0x15b/0x1b0 [52238.317503] shrink_page_list+0x1164/0x1780 [52238.325877] shrink_inactive_list+0x18f/0x3b0 [52238.334596] shrink_node_memcg+0x23e/0x7d0 [52238.342790] ? do_shrink_slab+0x4f/0x290 [52238.350648] shrink_node+0xce/0x4a0 [52238.357628] balance_pgdat+0x2c7/0x510 [52238.365135] kswapd+0x216/0x3e0 [52238.371425] ? wait_woken+0x80/0x80 [52238.378412] ? balance_pgdat+0x510/0x510 [52238.386265] kthread+0x111/0x130 [52238.392727] ? kthread_create_on_node+0x60/0x60 [52238.401782] ret_from_fork+0x1f/0x30 The page we were trying to drop had a page->private, but had no page->mapping and so called drop_buffers, assuming that we had a buffer_head on the page, and then panic'ed trying to deref 1, which is our page->private for data pages. This is happening because we're truncating the free space cache while we're trying to load the free space cache. This isn't supposed to happen, and I'll fix that in a followup patch. However we still shouldn't allow those sort of mistakes to result in messing with pages that do not belong to us. So add the page->mapping check to verify that we still own this page after dropping and re-acquiring the page lock. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/free-space-cache.c | 6 ++++++ 1 file changed, 6 insertions(+)