diff mbox

Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

Message ID 1444063465-5641-1-git-send-email-chandan@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chandan Rajendra Oct. 5, 2015, 4:44 p.m. UTC
The following call trace is seen when btrfs/031 test is executed in a loop,

[  120.577208] WARNING: CPU: 3 PID: 6202 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3e6/0x729()
[  120.581521] BTRFS: Transaction aborted (error -2)
[  120.585410] Modules linked in:
[  120.587460] CPU: 3 PID: 6202 Comm: btrfs Not tainted 4.2.0-rc5+ #27
[  120.591232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  120.596134]  ffffffff81c009d0 ffff880c846ff918 ffffffff81988f34 0000000000000001
[  120.600754]  ffff880c846ff968 ffff880c846ff958 ffffffff81050315 ffff880c846ff938
[  120.605166]  ffff880c85ae6000 ffff8809b944e000 ffff880a96c5a448 0000000000000000
[  120.609589] Call Trace:
[  120.610936]  [<ffffffff81988f34>] dump_stack+0x45/0x57
[  120.613868]  [<ffffffff81050315>] warn_slowpath_common+0x85/0xc0
[  120.616812]  [<ffffffff81050391>] warn_slowpath_fmt+0x41/0x50
[  120.619681]  [<ffffffff81985a84>] create_subvol+0x3e6/0x729
[  120.622390]  [<ffffffff8112d977>] ? zone_statistics+0x77/0x90
[  120.625111]  [<ffffffff8136577e>] btrfs_mksubvol.isra.30+0x37e/0x530
[  120.628091]  [<ffffffff8111a248>] ? __alloc_pages_nodemask+0x1b8/0x950
[  120.631091]  [<ffffffff81180f6a>] ? __mnt_want_write_file+0x1a/0x30
[  120.635156]  [<ffffffff81365a31>] btrfs_ioctl_snap_create_transid+0x101/0x180
[  120.639936]  [<ffffffff81365b02>] btrfs_ioctl_snap_create+0x52/0x70
[  120.643942]  [<ffffffff813684ae>] btrfs_ioctl+0x46e/0x2450
[  120.647204]  [<ffffffff8111f926>] ? lru_cache_add_active_or_unevictable+0x26/0x80
[  120.651647]  [<ffffffff81174dd1>] do_vfs_ioctl+0x2c1/0x4a0
[  120.655111]  [<ffffffff813be198>] ? selinux_file_ioctl+0x48/0xd0
[  120.658500]  [<ffffffff813b875e>] ? security_file_ioctl+0x3e/0x60
[  120.661680]  [<ffffffff81175024>] SyS_ioctl+0x74/0x80
[  120.664464]  [<ffffffff819932d7>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  120.667824] ---[ end trace 13f2ab1e5917a256 ]---
[  120.670159] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
[  120.673872] BTRFS info (device loop0): forced readonly
[  120.701265] BTRFS info (device loop0): disk space caching is enabled
[  120.704934] BTRFS error (device loop0): Remounting read-write after error is not allowed
[  120.904385] BTRFS error (device loop0): cleaner transaction attach returned -30

This occurs because,

Mount filesystem
Create subvol with ID 257
Unmount filesystem
Mount filesystem
Delete subvol with ID 257
  btrfs_drop_snapshot()
    Add root corresponding to subvol 257 into
    btrfs_transaction->dropped_roots list
Create new subvol (i.e. create_subvol())
  257 is returned as the next free objectid
  btrfs_read_fs_root_no_name()
    Finds the btrfs_root instance corresponding to the old subvol with ID 257
    in btrfs_fs_info->fs_roots_radix.
    Returns error since btrfs_root_item->refs has the value of 0.

To fix the issue the commit initializes tree root's and subvolume root's
highest_objectid when loading the roots from disk.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/btrfs/disk-io.c   | 35 +++++++++++++++++++++++++++++++++++
 fs/btrfs/inode-map.c |  9 +--------
 fs/btrfs/inode-map.h |  1 +
 fs/btrfs/ioctl.c     |  4 ++++
 4 files changed, 41 insertions(+), 8 deletions(-)

Comments

David Sterba Oct. 7, 2015, 9:25 a.m. UTC | #1
On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
> +	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> +		mutex_unlock(&root->objectid_mutex);
> +		ret = -ENOSPC;

ENOSPC ... I don't think it's right as this could be with a normal
enospc during subvolume creation. The problem is that theh inode number
space is exhausted, the closest error code I see is EOVERFLOW. As this
is an ioctl we can afford to define the meaning of this return value as
such (unlike for eg. creat()/open()).

> +		goto free_root_dev;
> +	}
> +
> +	mutex_unlock(&root->objectid_mutex);
> +
>  	return 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chandan Rajendra Oct. 7, 2015, 2:10 p.m. UTC | #2
On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote:
> On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
> > +	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> > +		mutex_unlock(&root->objectid_mutex);
> > +		ret = -ENOSPC;
> 
> ENOSPC ... I don't think it's right as this could be with a normal
> enospc during subvolume creation. The problem is that theh inode number
> space is exhausted, the closest error code I see is EOVERFLOW. As this
> is an ioctl we can afford to define the meaning of this return value as
> such (unlike for eg. creat()/open()).
> 
> > +		goto free_root_dev;
> > +	}
> > +
> > +	mutex_unlock(&root->objectid_mutex);
> > +
> > 
> >  	return 0;

David, Are you suggesting that we return -EOVERFLOW from within
btrfs_init_fs_root() and continue returning -ENOSPC in case of error
(i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from
open_ctree()?

If yes, btrfs_init_fs_root() gets invoked from open_ctree() via
btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when
servicing the mount() syscall.
james harvey Jan. 3, 2016, 5:02 a.m. UTC | #3
Bump.

Pretty sure I just ran into this, outside of a testing scenario.  See
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/51796

Looks like the patch was never committed.

On Wed, Oct 7, 2015 at 10:10 AM, Chandan Rajendra
<chandan@linux.vnet.ibm.com> wrote:
> On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote:
>> On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
>> > +   if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
>> > +           mutex_unlock(&root->objectid_mutex);
>> > +           ret = -ENOSPC;
>>
>> ENOSPC ... I don't think it's right as this could be with a normal
>> enospc during subvolume creation. The problem is that theh inode number
>> space is exhausted, the closest error code I see is EOVERFLOW. As this
>> is an ioctl we can afford to define the meaning of this return value as
>> such (unlike for eg. creat()/open()).
>>
>> > +           goto free_root_dev;
>> > +   }
>> > +
>> > +   mutex_unlock(&root->objectid_mutex);
>> > +
>> >
>> >     return 0;
>
> David, Are you suggesting that we return -EOVERFLOW from within
> btrfs_init_fs_root() and continue returning -ENOSPC in case of error
> (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from
> open_ctree()?
>
> If yes, btrfs_init_fs_root() gets invoked from open_ctree() via
> btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when
> servicing the mount() syscall.
>
> --
> chandan
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chandan Rajendra Jan. 5, 2016, 3:22 a.m. UTC | #4
On Sunday 03 Jan 2016 00:02:18 james harvey wrote:
> Bump.
> 
> Pretty sure I just ran into this, outside of a testing scenario.  See
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/51796
> 
> Looks like the patch was never committed.
>

Thanks for pointing this out. I will rebase & test the patch on top of current
integration branch and post it again.
David Sterba Jan. 5, 2016, 12:12 p.m. UTC | #5
On Wed, Oct 07, 2015 at 07:40:46PM +0530, Chandan Rajendra wrote:
> On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote:
> > On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
> > > +	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> > > +		mutex_unlock(&root->objectid_mutex);
> > > +		ret = -ENOSPC;
> > 
> > ENOSPC ... I don't think it's right as this could be with a normal
> > enospc during subvolume creation. The problem is that theh inode number
> > space is exhausted, the closest error code I see is EOVERFLOW. As this
> > is an ioctl we can afford to define the meaning of this return value as
> > such (unlike for eg. creat()/open()).
> > 
> > > +		goto free_root_dev;
> > > +	}
> > > +
> > > +	mutex_unlock(&root->objectid_mutex);
> > > +
> > > 
> > >  	return 0;
> 
> David, Are you suggesting that we return -EOVERFLOW from within
> btrfs_init_fs_root() and continue returning -ENOSPC in case of error
> (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from
> open_ctree()?
> 
> If yes, btrfs_init_fs_root() gets invoked from open_ctree() via
> btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when
> servicing the mount() syscall.

Sorry for not answering that. As you're going to resend it, please
use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow
error in the mount path.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chandan Rajendra Jan. 6, 2016, 8:37 a.m. UTC | #6
On Tuesday 05 Jan 2016 13:12:34 David Sterba wrote:
> Sorry for not answering that. As you're going to resend it, please
> use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow
> error in the mount path.
Right. Now I understand that.

David, Replacing the following code snippet instances (in both open_ctree()
and btrfs_init_fs_root()) ...

        if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
                mutex_unlock(&root->objectid_mutex);
                ret = -EOVERFLOW;
                goto free_root_dev;
        }

with ....

     ASSERT(tree_root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID);

is probably a better option?

The validation of root->highest_objectid must have been done by
btrfs_find_free_objectid() when creating the subvolume. If the parent
subvolume already has an objectid with BTRFS_LAST_FREE_OBJECTID as the value,
btrfs_find_free_objectid() would return with an error and hence we should
never have subvolumes containing other subvolumes with objectid greater than
BTRFS_LAST_FREE_OBJECTID.
diff mbox

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 807f685..3b1abe7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1573,8 +1573,27 @@  int btrfs_init_fs_root(struct btrfs_root *root)
 	ret = get_anon_bdev(&root->anon_dev);
 	if (ret)
 		goto free_writers;
+
+	mutex_lock(&root->objectid_mutex);
+	ret = btrfs_find_highest_objectid(root,
+					&root->highest_objectid);
+	if (ret) {
+		mutex_unlock(&root->objectid_mutex);
+		goto free_root_dev;
+	}
+
+	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
+		mutex_unlock(&root->objectid_mutex);
+		ret = -ENOSPC;
+		goto free_root_dev;
+	}
+
+	mutex_unlock(&root->objectid_mutex);
+
 	return 0;
 
+free_root_dev:
+	free_anon_bdev(root->anon_dev);
 free_writers:
 	btrfs_free_subvolume_writers(root->subv_writers);
 fail:
@@ -2892,6 +2911,22 @@  retry_root_backup:
 	tree_root->commit_root = btrfs_root_node(tree_root);
 	btrfs_set_root_refs(&tree_root->root_item, 1);
 
+	mutex_lock(&tree_root->objectid_mutex);
+	ret = btrfs_find_highest_objectid(tree_root,
+					&tree_root->highest_objectid);
+	if (ret) {
+		mutex_unlock(&tree_root->objectid_mutex);
+		goto recovery_tree_root;
+	}
+
+	if (unlikely(tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
+		mutex_unlock(&tree_root->objectid_mutex);
+		ret = -ENOSPC;
+		goto recovery_tree_root;
+	}
+
+	mutex_unlock(&tree_root->objectid_mutex);
+
 	ret = btrfs_read_roots(fs_info, tree_root);
 	if (ret)
 		goto recovery_tree_root;
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index d4a582a..9f06e8b 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -515,7 +515,7 @@  out:
 	return ret;
 }
 
-static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
+int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
 {
 	struct btrfs_path *path;
 	int ret;
@@ -555,13 +555,6 @@  int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
 	int ret;
 	mutex_lock(&root->objectid_mutex);
 
-	if (unlikely(root->highest_objectid < BTRFS_FIRST_FREE_OBJECTID)) {
-		ret = btrfs_find_highest_objectid(root,
-						  &root->highest_objectid);
-		if (ret)
-			goto out;
-	}
-
 	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
 		ret = -ENOSPC;
 		goto out;
diff --git a/fs/btrfs/inode-map.h b/fs/btrfs/inode-map.h
index ddb347b..c8e864b 100644
--- a/fs/btrfs/inode-map.h
+++ b/fs/btrfs/inode-map.h
@@ -9,5 +9,6 @@  int btrfs_save_ino_cache(struct btrfs_root *root,
 			 struct btrfs_trans_handle *trans);
 
 int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid);
+int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid);
 
 #endif
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0adf542..ad7422b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -568,6 +568,10 @@  static noinline int create_subvol(struct inode *dir,
 		goto fail;
 	}
 
+	mutex_lock(&new_root->objectid_mutex);
+	new_root->highest_objectid = new_dirid;
+	mutex_unlock(&new_root->objectid_mutex);
+
 	/*
 	 * insert the directory item
 	 */