[v2,3/6] btrfs-progs: check/common: Make repair_imode_common() to handle inodes in subvolume trees
diff mbox series

Message ID 20190905075800.1633-4-wqu@suse.com
State New
Headers show
Series
  • btrfs-progs: check: Repair invalid inode mode in
Related show

Commit Message

Qu Wenruo Sept. 5, 2019, 7:57 a.m. UTC
Before this patch, repair_imode_common() can only handle two types of
inodes:
- Free space cache inodes
- ROOT DIR inodes

For inodes in subvolume trees, the core complexity is how to determine the
correct imode, thus it was not implemented.

However there are more reports of incorrect imode in subvolume trees, we
need to support such fix.

So this patch adds a new function, detect_imode(), to detect imode for
inodes in subvolume trees.

That function will determine imode by:
- Search for INODE_REF
  If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
  As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
  the BTRFS_FT_* to imode, then call it a day.
  This should be the most accurate way.

- Search for DIR_INDEX/DIR_ITEM
  If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
  just after the INODE_ITEM.
  If any can be found, it's definitely a directory.

- Search for EXTENT_DATA
  If EXTENT_DATA can be found, it's either REG or LNK.
  For this case, we default to REG, as user can inspect the file to
  determine if it's a file or just a path.

- Use rdev to detect BLK/CHR
  If all above fails, but INODE_ITEM has non-zero rdev, then it's either
  a BLK or CHR file. Then we default to BLK.

- Fail out if none of above methods succeeded
  No educated guess to make things worse.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 117 insertions(+), 13 deletions(-)

Comments

Nikolay Borisov Sept. 9, 2019, 2:17 p.m. UTC | #1
On 5.09.19 г. 10:57 ч., Qu Wenruo wrote:
> Before this patch, repair_imode_common() can only handle two types of
> inodes:
> - Free space cache inodes
> - ROOT DIR inodes
> 
> For inodes in subvolume trees, the core complexity is how to determine the
> correct imode, thus it was not implemented.
> 
> However there are more reports of incorrect imode in subvolume trees, we
> need to support such fix.
> 
> So this patch adds a new function, detect_imode(), to detect imode for
> inodes in subvolume trees.
> 
> That function will determine imode by:
> - Search for INODE_REF
>   If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
>   As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
>   the BTRFS_FT_* to imode, then call it a day.
>   This should be the most accurate way.
> 
> - Search for DIR_INDEX/DIR_ITEM
>   If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
>   just after the INODE_ITEM.
>   If any can be found, it's definitely a directory.
> 
> - Search for EXTENT_DATA
>   If EXTENT_DATA can be found, it's either REG or LNK.
>   For this case, we default to REG, as user can inspect the file to
>   determine if it's a file or just a path.
> 
> - Use rdev to detect BLK/CHR
>   If all above fails, but INODE_ITEM has non-zero rdev, then it's either
>   a BLK or CHR file. Then we default to BLK.
> 
> - Fail out if none of above methods succeeded
>   No educated guess to make things worse.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 117 insertions(+), 13 deletions(-)
> 
> diff --git a/check/mode-common.c b/check/mode-common.c
> index c0ddc50a1dd0..abea2ceda4c4 100644
> --- a/check/mode-common.c
> +++ b/check/mode-common.c
> @@ -935,6 +935,113 @@ out:
>  	return ret;
>  }
>  
> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
> +			u32 *imode_ret)

I think the imode is less than u32 so it should be possible to return it
directly from the function as a positive number and error as negative?

> +{
> +	struct btrfs_key key;
> +	struct btrfs_inode_item iitem;
> +	const u32 priv = 0700;

Having this in a variable doesn't bring more clarity, just use 0700
directly at the end of the function.

> +	bool found = false;
> +	u64 ino;
> +	u32 imode;
> +	int ret = 0;
> +
> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +	ino = key.objectid;
> +	read_extent_buffer(path->nodes[0], &iitem,
> +			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> +			sizeof(iitem));
> +	/* root inode */
> +	if (ino == BTRFS_FIRST_FREE_OBJECTID) {
> +		imode = S_IFDIR;
> +		found = true;
> +		goto out;
> +	}
> +
> +	while (1) {
> +		struct btrfs_inode_ref *iref;
> +		struct extent_buffer *leaf;
> +		unsigned long cur;
> +		unsigned long end;
> +		char namebuf[BTRFS_NAME_LEN] = {0};
> +		u64 index;
> +		u32 namelen;
> +		int slot;
> +
> +		ret = btrfs_next_item(root, path);
> +		if (ret > 0) {
> +			/* falls back to rdev check */
> +			ret = 0;
> +			goto out;
> +		}
> +		if (ret < 0)
> +			goto out;
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		btrfs_item_key_to_cpu(leaf, &key, slot);
> +		if (key.objectid != ino)
> +			goto out;
> +
> +		/*
> +		 * We ignore some types to make life easier:
> +		 * - XATTR
> +		 *   Both REG and DIR can have xattr, so not useful
> +		 */
> +		switch (key.type) {
> +		case BTRFS_INODE_REF_KEY:
> +			/* The most accurate way to determine filetype */
> +			cur = btrfs_item_ptr_offset(leaf, slot);
> +			end = cur + btrfs_item_size_nr(leaf, slot);
> +			while (cur < end) {
> +				iref = (struct btrfs_inode_ref *)cur;
> +				namelen = min_t(u32, end - cur - sizeof(&iref),
> +					btrfs_inode_ref_name_len(leaf, iref));
> +				index = btrfs_inode_ref_index(leaf, iref);
> +				read_extent_buffer(leaf, namebuf,
> +					(unsigned long)(iref + 1), namelen);
> +				ret = find_file_type(root, ino, key.offset,
> +						index, namebuf, namelen,
> +						&imode);
> +				if (ret == 0) {
> +					found = true;
> +					goto out;
> +				}
> +				cur += sizeof(*iref) + namelen;
> +			}
> +			break;
> +		case BTRFS_DIR_ITEM_KEY:
> +		case BTRFS_DIR_INDEX_KEY:
> +			imode = S_IFDIR;
> +			goto out;
> +		case BTRFS_EXTENT_DATA_KEY:
> +			/*
> +			 * Both REG and LINK could have EXTENT_DATA.
> +			 * We just fall back to REG as user can inspect the
> +			 * content.
> +			 */
> +			imode = S_IFREG;
> +			goto out;
> +		}
> +	}
> +
> +out:
> +	/*
> +	 * Both CHR and BLK uses rdev, no way to distinguish them, so fall back
> +	 * to BLK. But either way it doesn't really matter, as CHR/BLK on btrfs
> +	 * should be pretty rare, and no real data will be lost.
> +	 */
> +	if (!found && btrfs_stack_inode_rdev(&iitem) != 0) {
> +		imode = S_IFBLK;
> +		found = true;
> +	}
> +
> +	if (found)
> +		*imode_ret = (imode | priv);
> +	else
> +		ret = -ENOENT;
> +	return ret;
> +}
> +
>  /*
>   * Reset the inode mode of the inode specified by @path.
>   *
> @@ -951,22 +1058,19 @@ int repair_imode_common(struct btrfs_root *root, struct btrfs_path *path)
>  	u32 imode;
>  	int ret;
>  
> -	if (root->root_key.objectid != BTRFS_ROOT_TREE_OBJECTID) {
> -		error(
> -		"repair inode mode outside of root tree is not supported yet");
> -		return -ENOTTY;
> -	}
>  	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>  	ASSERT(key.type == BTRFS_INODE_ITEM_KEY);
> -	if (key.objectid != BTRFS_ROOT_TREE_DIR_OBJECTID &&
> -	    !is_fstree(key.objectid)) {
> -		error("unsupported ino %llu", key.objectid);
> -		return -ENOTTY;
> +	if (root->objectid == BTRFS_ROOT_TREE_OBJECTID) {
> +		/* In root tree we only have two possible imode */
> +		if (key.objectid == BTRFS_ROOT_TREE_OBJECTID)
> +			imode = S_IFDIR | 0755;
> +		else
> +			imode = S_IFREG | 0600;
> +	} else {
> +		ret = detect_imode(root, path, &imode);
> +		if (ret < 0)
> +			return ret;
>  	}
> -	if (key.objectid == BTRFS_ROOT_TREE_DIR_OBJECTID)
> -		imode = 040755;
> -	else
> -		imode = 0100600;
>  
>  	trans = btrfs_start_transaction(root, 1);
>  	if (IS_ERR(trans)) {
>
Qu Wenruo Sept. 9, 2019, 2:27 p.m. UTC | #2
On 2019/9/9 下午10:17, Nikolay Borisov wrote:
>
>
> On 5.09.19 г. 10:57 ч., Qu Wenruo wrote:
>> Before this patch, repair_imode_common() can only handle two types of
>> inodes:
>> - Free space cache inodes
>> - ROOT DIR inodes
>>
>> For inodes in subvolume trees, the core complexity is how to determine the
>> correct imode, thus it was not implemented.
>>
>> However there are more reports of incorrect imode in subvolume trees, we
>> need to support such fix.
>>
>> So this patch adds a new function, detect_imode(), to detect imode for
>> inodes in subvolume trees.
>>
>> That function will determine imode by:
>> - Search for INODE_REF
>>   If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
>>   As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
>>   the BTRFS_FT_* to imode, then call it a day.
>>   This should be the most accurate way.
>>
>> - Search for DIR_INDEX/DIR_ITEM
>>   If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
>>   just after the INODE_ITEM.
>>   If any can be found, it's definitely a directory.
>>
>> - Search for EXTENT_DATA
>>   If EXTENT_DATA can be found, it's either REG or LNK.
>>   For this case, we default to REG, as user can inspect the file to
>>   determine if it's a file or just a path.
>>
>> - Use rdev to detect BLK/CHR
>>   If all above fails, but INODE_ITEM has non-zero rdev, then it's either
>>   a BLK or CHR file. Then we default to BLK.
>>
>> - Fail out if none of above methods succeeded
>>   No educated guess to make things worse.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 117 insertions(+), 13 deletions(-)
>>
>> diff --git a/check/mode-common.c b/check/mode-common.c
>> index c0ddc50a1dd0..abea2ceda4c4 100644
>> --- a/check/mode-common.c
>> +++ b/check/mode-common.c
>> @@ -935,6 +935,113 @@ out:
>>  	return ret;
>>  }
>>
>> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
>> +			u32 *imode_ret)
>
> I think the imode is less than u32 so it should be possible to return it
> directly from the function as a positive number and error as negative?

It still doesn't look good enough to me.
Combining data value from error number is not a good idea to me, even
even the range doesn't overlap.

>
>> +{
>> +	struct btrfs_key key;
>> +	struct btrfs_inode_item iitem;
>> +	const u32 priv = 0700;
>
> Having this in a variable doesn't bring more clarity, just use 0700
> directly at the end of the function.

I'm OK with that.

Thanks,
Qu
Su Yue Sept. 10, 2019, 4:27 a.m. UTC | #3
On 2019/9/5 3:57 PM, Qu Wenruo wrote:
> Before this patch, repair_imode_common() can only handle two types of
> inodes:
> - Free space cache inodes
> - ROOT DIR inodes
>
> For inodes in subvolume trees, the core complexity is how to determine the
> correct imode, thus it was not implemented.
>
> However there are more reports of incorrect imode in subvolume trees, we
> need to support such fix.
>
> So this patch adds a new function, detect_imode(), to detect imode for
> inodes in subvolume trees.
>
> That function will determine imode by:
> - Search for INODE_REF
>    If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
>    As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
>    the BTRFS_FT_* to imode, then call it a day.
>    This should be the most accurate way.
>
> - Search for DIR_INDEX/DIR_ITEM
>    If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
>    just after the INODE_ITEM.
>    If any can be found, it's definitely a directory.
>
> - Search for EXTENT_DATA
>    If EXTENT_DATA can be found, it's either REG or LNK.
>    For this case, we default to REG, as user can inspect the file to
>    determine if it's a file or just a path.
>
> - Use rdev to detect BLK/CHR
>    If all above fails, but INODE_ITEM has non-zero rdev, then it's either
>    a BLK or CHR file. Then we default to BLK.
>
> - Fail out if none of above methods succeeded
>    No educated guess to make things worse.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 117 insertions(+), 13 deletions(-)
>
> diff --git a/check/mode-common.c b/check/mode-common.c
> index c0ddc50a1dd0..abea2ceda4c4 100644
> --- a/check/mode-common.c
> +++ b/check/mode-common.c
> @@ -935,6 +935,113 @@ out:
>   	return ret;
>   }
>
> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
> +			u32 *imode_ret)
> +{
> +	struct btrfs_key key;
> +	struct btrfs_inode_item iitem;
> +	const u32 priv = 0700;
> +	bool found = false;
> +	u64 ino;
> +	u32 imode;
> +	int ret = 0;
> +
> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +	ino = key.objectid;
> +	read_extent_buffer(path->nodes[0], &iitem,
> +			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> +			sizeof(iitem));
> +	/* root inode */
> +	if (ino == BTRFS_FIRST_FREE_OBJECTID) {
> +		imode = S_IFDIR;
> +		found = true;
> +		goto out;
> +	}
> +
> +	while (1) {
> +		struct btrfs_inode_ref *iref;
> +		struct extent_buffer *leaf;
> +		unsigned long cur;
> +		unsigned long end;
> +		char namebuf[BTRFS_NAME_LEN] = {0};
> +		u64 index;
> +		u32 namelen;
> +		int slot;
> +
> +		ret = btrfs_next_item(root, path);
> +		if (ret > 0) {
> +			/* falls back to rdev check */
> +			ret = 0;
> +			goto out;
> +		}
> +		if (ret < 0)
> +			goto out;
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		btrfs_item_key_to_cpu(leaf, &key, slot);
> +		if (key.objectid != ino)
> +			goto out;
> +
> +		/*
> +		 * We ignore some types to make life easier:
> +		 * - XATTR
> +		 *   Both REG and DIR can have xattr, so not useful
> +		 */
> +		switch (key.type) {
> +		case BTRFS_INODE_REF_KEY:
> +			/* The most accurate way to determine filetype */
> +			cur = btrfs_item_ptr_offset(leaf, slot);
> +			end = cur + btrfs_item_size_nr(leaf, slot);
> +			while (cur < end) {
> +				iref = (struct btrfs_inode_ref *)cur;
> +				namelen = min_t(u32, end - cur - sizeof(&iref),
> +					btrfs_inode_ref_name_len(leaf, iref));
> +				index = btrfs_inode_ref_index(leaf, iref);
> +				read_extent_buffer(leaf, namebuf,
> +					(unsigned long)(iref + 1), namelen);
> +				ret = find_file_type(root, ino, key.offset,
> +						index, namebuf, namelen,
> +						&imode);
> +				if (ret == 0) {
> +					found = true;
> +					goto out;
> +				}
> +				cur += sizeof(*iref) + namelen;
> +			}
> +			break;
> +		case BTRFS_DIR_ITEM_KEY:
> +		case BTRFS_DIR_INDEX_KEY:
> +			imode = S_IFDIR;
> +			goto out;
> +		case BTRFS_EXTENT_DATA_KEY:
> +			/*
> +			 * Both REG and LINK could have EXTENT_DATA.
> +			 * We just fall back to REG as user can inspect the
> +			 * content.
> +			 */
> +			imode = S_IFREG;
> +			goto out;
> +		}
> +	}
> +
> +out:
> +	/*
> +	 * Both CHR and BLK uses rdev, no way to distinguish them, so fall back
> +	 * to BLK. But either way it doesn't really matter, as CHR/BLK on btrfs
> +	 * should be pretty rare, and no real data will be lost.
> +	 */
> +	if (!found && btrfs_stack_inode_rdev(&iitem) != 0) {
> +		imode = S_IFBLK;
> +		found = true;
> +	}
> +
> +	if (found)
> +		*imode_ret = (imode | priv);

Should set @ret to 0 here?
In above fallback path, @ret may be negative.

--
Su
> +	else
> +		ret = -ENOENT;
> +	return ret;
> +}
> +
>   /*
>    * Reset the inode mode of the inode specified by @path.
>    *
> @@ -951,22 +1058,19 @@ int repair_imode_common(struct btrfs_root *root, struct btrfs_path *path)
>   	u32 imode;
>   	int ret;
>
> -	if (root->root_key.objectid != BTRFS_ROOT_TREE_OBJECTID) {
> -		error(
> -		"repair inode mode outside of root tree is not supported yet");
> -		return -ENOTTY;
> -	}
>   	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>   	ASSERT(key.type == BTRFS_INODE_ITEM_KEY);
> -	if (key.objectid != BTRFS_ROOT_TREE_DIR_OBJECTID &&
> -	    !is_fstree(key.objectid)) {
> -		error("unsupported ino %llu", key.objectid);
> -		return -ENOTTY;
> +	if (root->objectid == BTRFS_ROOT_TREE_OBJECTID) {
> +		/* In root tree we only have two possible imode */
> +		if (key.objectid == BTRFS_ROOT_TREE_OBJECTID)
> +			imode = S_IFDIR | 0755;
> +		else
> +			imode = S_IFREG | 0600;
> +	} else {
> +		ret = detect_imode(root, path, &imode);
> +		if (ret < 0)
> +			return ret;
>   	}
> -	if (key.objectid == BTRFS_ROOT_TREE_DIR_OBJECTID)
> -		imode = 040755;
> -	else
> -		imode = 0100600;
>
>   	trans = btrfs_start_transaction(root, 1);
>   	if (IS_ERR(trans)) {
>
Nikolay Borisov Sept. 10, 2019, 4:14 p.m. UTC | #4
On 5.09.19 г. 10:57 ч., Qu Wenruo wrote:
> Before this patch, repair_imode_common() can only handle two types of
> inodes:
> - Free space cache inodes
> - ROOT DIR inodes
> 
> For inodes in subvolume trees, the core complexity is how to determine the
> correct imode, thus it was not implemented.
> 
> However there are more reports of incorrect imode in subvolume trees, we
> need to support such fix.
> 
> So this patch adds a new function, detect_imode(), to detect imode for
> inodes in subvolume trees.
> 
> That function will determine imode by:
> - Search for INODE_REF
>   If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
>   As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
>   the BTRFS_FT_* to imode, then call it a day.
>   This should be the most accurate way.
> 
> - Search for DIR_INDEX/DIR_ITEM
>   If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
>   just after the INODE_ITEM.
>   If any can be found, it's definitely a directory.

This needs an explicit satement that it will only work for non-empty files and directories
> 
> - Search for EXTENT_DATA
>   If EXTENT_DATA can be found, it's either REG or LNK.
>   For this case, we default to REG, as user can inspect the file to
>   determine if it's a file or just a path.
> 
> - Use rdev to detect BLK/CHR
>   If all above fails, but INODE_ITEM has non-zero rdev, then it's either
>   a BLK or CHR file. Then we default to BLK.
> 
> - Fail out if none of above methods succeeded
>   No educated guess to make things worse.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 117 insertions(+), 13 deletions(-)
> 
> diff --git a/check/mode-common.c b/check/mode-common.c
> index c0ddc50a1dd0..abea2ceda4c4 100644
> --- a/check/mode-common.c
> +++ b/check/mode-common.c
> @@ -935,6 +935,113 @@ out:
>  	return ret;
>  }
>  
> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
> +			u32 *imode_ret)
> +{
> +	struct btrfs_key key;
> +	struct btrfs_inode_item iitem;
> +	const u32 priv = 0700;
> +	bool found = false;
> +	u64 ino;
> +	u32 imode;
> +	int ret = 0;
> +
> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +	ino = key.objectid;
> +	read_extent_buffer(path->nodes[0], &iitem,
> +			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> +			sizeof(iitem));
> +	/* root inode */
> +	if (ino == BTRFS_FIRST_FREE_OBJECTID) {
> +		imode = S_IFDIR;
> +		found = true;
> +		goto out;
> +	}
> +
> +	while (1) {
> +		struct btrfs_inode_ref *iref;
> +		struct extent_buffer *leaf;
> +		unsigned long cur;
> +		unsigned long end;
> +		char namebuf[BTRFS_NAME_LEN] = {0};
> +		u64 index;
> +		u32 namelen;
> +		int slot;
> +
> +		ret = btrfs_next_item(root, path);
> +		if (ret > 0) {
> +			/* falls back to rdev check */
> +			ret = 0;
> +			goto out;
> +		}

In my testing if an inode is the last one in the leaf and it doesn't have 
an INODE_REF item then it won't be repaired. But e.g. it can have perfectly 
valid DIR_ITEM/DIR_INDEX entries which describe this inode as being a file. E.g. 

	item 2 key (256 DIR_ITEM 388586943) itemoff 16076 itemsize 35
		location key (260 INODE_ITEM 0) type FILE
		transid 7 data_len 0 name_len 5
		name: file3

	.....
	item 15 key (260 INODE_ITEM 0) itemoff 15184 itemsize 160
		generation 7 transid 7 size 0 nbytes 0
		block group 0 mode 26772225102 links 1 uid 0 gid 0 rdev 0
		sequence 1 flags 0x0(none)
		atime 1568127261.284993602 (2019-09-10 14:54:21)
		ctime 1568127261.284993602 (2019-09-10 14:54:21)
		mtime 1568127261.284993602 (2019-09-10 14:54:21)
		otime 1568127261.284993602 (2019-09-10 14:54:21)

I have intentionally deleted INODE_REF too see what's happening. Is this intended?


> +		if (ret < 0)
> +			goto out;
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		btrfs_item_key_to_cpu(leaf, &key, slot);
> +		if (key.objectid != ino)
> +			goto out;
> +
> +		/*
> +		 * We ignore some types to make life easier:
> +		 * - XATTR
> +		 *   Both REG and DIR can have xattr, so not useful
> +		 */
> +		switch (key.type) {
> +		case BTRFS_INODE_REF_KEY:
> +			/* The most accurate way to determine filetype */
> +			cur = btrfs_item_ptr_offset(leaf, slot);
> +			end = cur + btrfs_item_size_nr(leaf, slot);
> +			while (cur < end) {
> +				iref = (struct btrfs_inode_ref *)cur;
> +				namelen = min_t(u32, end - cur - sizeof(&iref),
> +					btrfs_inode_ref_name_len(leaf, iref));
> +				index = btrfs_inode_ref_index(leaf, iref);
> +				read_extent_buffer(leaf, namebuf,
> +					(unsigned long)(iref + 1), namelen);
> +				ret = find_file_type(root, ino, key.offset,
> +						index, namebuf, namelen,
> +						&imode);
> +				if (ret == 0) {
> +					found = true;
> +					goto out;
> +				}
> +				cur += sizeof(*iref) + namelen;
> +			}
> +			break;
> +		case BTRFS_DIR_ITEM_KEY:
> +		case BTRFS_DIR_INDEX_KEY:
> +			imode = S_IFDIR;

You should set found here. Otherwise this function returns ENOENT in those 2 branches. 
BTW in relation to out private conversation I found this while trying to create test 
cases which will trigger all branches. The fact it found bugs proves we should strive 
for as much testing coverage as possible. 

> +			goto out;
> +		case BTRFS_EXTENT_DATA_KEY:
> +			/*
> +			 * Both REG and LINK could have EXTENT_DATA.
> +			 * We just fall back to REG as user can inspect the
> +			 * content.
> +			 */
> +			imode = S_IFREG;
ditto

> +			goto out;
> +		}
> +	}
> +
> +out:
> +	/*
> +	 * Both CHR and BLK uses rdev, no way to distinguish them, so fall back
> +	 * to BLK. But either way it doesn't really matter, as CHR/BLK on btrfs
> +	 * should be pretty rare, and no real data will be lost.
> +	 */
> +	if (!found && btrfs_stack_inode_rdev(&iitem) != 0) {
> +		imode = S_IFBLK;
> +		found = true;
> +	}
> +
> +	if (found)
> +		*imode_ret = (imode | priv);
> +	else
> +		ret = -ENOENT;
> +	return ret;
> +}
> +
>  /*
>   * Reset the inode mode of the inode specified by @path.

<snip>
Qu Wenruo Sept. 11, 2019, 12:39 a.m. UTC | #5
[...]
>> - Search for DIR_INDEX/DIR_ITEM
>>   If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
>>   just after the INODE_ITEM.
>>   If any can be found, it's definitely a directory.
>
> This needs an explicit satement that it will only work for non-empty files and directories

Indeed.

>>
>> - Search for EXTENT_DATA
>>   If EXTENT_DATA can be found, it's either REG or LNK.
>>   For this case, we default to REG, as user can inspect the file to
>>   determine if it's a file or just a path.
>>
>> - Use rdev to detect BLK/CHR
>>   If all above fails, but INODE_ITEM has non-zero rdev, then it's either
>>   a BLK or CHR file. Then we default to BLK.
>>
>> - Fail out if none of above methods succeeded
>>   No educated guess to make things worse.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 117 insertions(+), 13 deletions(-)
>>
>> diff --git a/check/mode-common.c b/check/mode-common.c
>> index c0ddc50a1dd0..abea2ceda4c4 100644
>> --- a/check/mode-common.c
>> +++ b/check/mode-common.c
>> @@ -935,6 +935,113 @@ out:
>>  	return ret;
>>  }
>>
>> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
>> +			u32 *imode_ret)
>> +{
>> +	struct btrfs_key key;
>> +	struct btrfs_inode_item iitem;
>> +	const u32 priv = 0700;
>> +	bool found = false;
>> +	u64 ino;
>> +	u32 imode;
>> +	int ret = 0;
>> +
>> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> +	ino = key.objectid;
>> +	read_extent_buffer(path->nodes[0], &iitem,
>> +			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
>> +			sizeof(iitem));
>> +	/* root inode */
>> +	if (ino == BTRFS_FIRST_FREE_OBJECTID) {
>> +		imode = S_IFDIR;
>> +		found = true;
>> +		goto out;
>> +	}
>> +
>> +	while (1) {
>> +		struct btrfs_inode_ref *iref;
>> +		struct extent_buffer *leaf;
>> +		unsigned long cur;
>> +		unsigned long end;
>> +		char namebuf[BTRFS_NAME_LEN] = {0};
>> +		u64 index;
>> +		u32 namelen;
>> +		int slot;
>> +
>> +		ret = btrfs_next_item(root, path);
>> +		if (ret > 0) {
>> +			/* falls back to rdev check */
>> +			ret = 0;
>> +			goto out;
>> +		}
>
> In my testing if an inode is the last one in the leaf and it doesn't have
> an INODE_REF item then it won't be repaired. But e.g. it can have perfectly
> valid DIR_ITEM/DIR_INDEX entries which describe this inode as being a file. E.g.
>
> 	item 2 key (256 DIR_ITEM 388586943) itemoff 16076 itemsize 35
> 		location key (260 INODE_ITEM 0) type FILE
> 		transid 7 data_len 0 name_len 5
> 		name: file3
>
> 	.....
> 	item 15 key (260 INODE_ITEM 0) itemoff 15184 itemsize 160
> 		generation 7 transid 7 size 0 nbytes 0
> 		block group 0 mode 26772225102 links 1 uid 0 gid 0 rdev 0
> 		sequence 1 flags 0x0(none)
> 		atime 1568127261.284993602 (2019-09-10 14:54:21)
> 		ctime 1568127261.284993602 (2019-09-10 14:54:21)
> 		mtime 1568127261.284993602 (2019-09-10 14:54:21)
> 		otime 1568127261.284993602 (2019-09-10 14:54:21)
>
> I have intentionally deleted INODE_REF too see what's happening. Is this intended?

Yes, completely intended.

For this case, you need to iterate through the whole tree to locate the
DIR_INDEX to fix, which is not really possible with current code base,
which only search based on the INODE, not the DIR_INDEX/DIR_ITEM from
its parent dir.

Furthermore, didn't you mention that if we don't have confident about
the imode, then we should fail out instead of using REG as default?

>
>
>> +		if (ret < 0)
>> +			goto out;
>> +		leaf = path->nodes[0];
>> +		slot = path->slots[0];
>> +		btrfs_item_key_to_cpu(leaf, &key, slot);
>> +		if (key.objectid != ino)
>> +			goto out;
>> +
>> +		/*
>> +		 * We ignore some types to make life easier:
>> +		 * - XATTR
>> +		 *   Both REG and DIR can have xattr, so not useful
>> +		 */
>> +		switch (key.type) {
>> +		case BTRFS_INODE_REF_KEY:
>> +			/* The most accurate way to determine filetype */
>> +			cur = btrfs_item_ptr_offset(leaf, slot);
>> +			end = cur + btrfs_item_size_nr(leaf, slot);
>> +			while (cur < end) {
>> +				iref = (struct btrfs_inode_ref *)cur;
>> +				namelen = min_t(u32, end - cur - sizeof(&iref),
>> +					btrfs_inode_ref_name_len(leaf, iref));
>> +				index = btrfs_inode_ref_index(leaf, iref);
>> +				read_extent_buffer(leaf, namebuf,
>> +					(unsigned long)(iref + 1), namelen);
>> +				ret = find_file_type(root, ino, key.offset,
>> +						index, namebuf, namelen,
>> +						&imode);
>> +				if (ret == 0) {
>> +					found = true;
>> +					goto out;
>> +				}
>> +				cur += sizeof(*iref) + namelen;
>> +			}
>> +			break;
>> +		case BTRFS_DIR_ITEM_KEY:
>> +		case BTRFS_DIR_INDEX_KEY:
>> +			imode = S_IFDIR;
>
> You should set found here.

Yep, also found that and fixed in next version.

Thanks,
Qu

> Otherwise this function returns ENOENT in those 2 branches.
> BTW in relation to out private conversation I found this while trying to create test
> cases which will trigger all branches. The fact it found bugs proves we should strive
> for as much testing coverage as possible.
>
>> +			goto out;
>> +		case BTRFS_EXTENT_DATA_KEY:
>> +			/*
>> +			 * Both REG and LINK could have EXTENT_DATA.
>> +			 * We just fall back to REG as user can inspect the
>> +			 * content.
>> +			 */
>> +			imode = S_IFREG;
> ditto
>
>> +			goto out;
>> +		}
>> +	}
>> +
>> +out:
>> +	/*
>> +	 * Both CHR and BLK uses rdev, no way to distinguish them, so fall back
>> +	 * to BLK. But either way it doesn't really matter, as CHR/BLK on btrfs
>> +	 * should be pretty rare, and no real data will be lost.
>> +	 */
>> +	if (!found && btrfs_stack_inode_rdev(&iitem) != 0) {
>> +		imode = S_IFBLK;
>> +		found = true;
>> +	}
>> +
>> +	if (found)
>> +		*imode_ret = (imode | priv);
>> +	else
>> +		ret = -ENOENT;
>> +	return ret;
>> +}
>> +
>>  /*
>>   * Reset the inode mode of the inode specified by @path.
>
> <snip>
>
Nikolay Borisov Sept. 11, 2019, 12:27 p.m. UTC | #6
On 11.09.19 г. 3:39 ч., Qu Wenruo wrote:
> [...]
>>> - Search for DIR_INDEX/DIR_ITEM
>>>   If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
>>>   just after the INODE_ITEM.
>>>   If any can be found, it's definitely a directory.
>>
>> This needs an explicit satement that it will only work for non-empty files and directories
> 
> Indeed.
> 
>>>
>>> - Search for EXTENT_DATA
>>>   If EXTENT_DATA can be found, it's either REG or LNK.
>>>   For this case, we default to REG, as user can inspect the file to
>>>   determine if it's a file or just a path.
>>>
>>> - Use rdev to detect BLK/CHR
>>>   If all above fails, but INODE_ITEM has non-zero rdev, then it's either
>>>   a BLK or CHR file. Then we default to BLK.
>>>
>>> - Fail out if none of above methods succeeded
>>>   No educated guess to make things worse.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  check/mode-common.c | 130 +++++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 117 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/check/mode-common.c b/check/mode-common.c
>>> index c0ddc50a1dd0..abea2ceda4c4 100644
>>> --- a/check/mode-common.c
>>> +++ b/check/mode-common.c
>>> @@ -935,6 +935,113 @@ out:
>>>  	return ret;
>>>  }
>>>
>>> +static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
>>> +			u32 *imode_ret)
>>> +{
>>> +	struct btrfs_key key;
>>> +	struct btrfs_inode_item iitem;
>>> +	const u32 priv = 0700;
>>> +	bool found = false;
>>> +	u64 ino;
>>> +	u32 imode;
>>> +	int ret = 0;
>>> +
>>> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>>> +	ino = key.objectid;
>>> +	read_extent_buffer(path->nodes[0], &iitem,
>>> +			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
>>> +			sizeof(iitem));
>>> +	/* root inode */
>>> +	if (ino == BTRFS_FIRST_FREE_OBJECTID) {
>>> +		imode = S_IFDIR;
>>> +		found = true;
>>> +		goto out;
>>> +	}
>>> +
>>> +	while (1) {
>>> +		struct btrfs_inode_ref *iref;
>>> +		struct extent_buffer *leaf;
>>> +		unsigned long cur;
>>> +		unsigned long end;
>>> +		char namebuf[BTRFS_NAME_LEN] = {0};
>>> +		u64 index;
>>> +		u32 namelen;
>>> +		int slot;
>>> +
>>> +		ret = btrfs_next_item(root, path);
>>> +		if (ret > 0) {
>>> +			/* falls back to rdev check */
>>> +			ret = 0;
>>> +			goto out;
>>> +		}
>>
>> In my testing if an inode is the last one in the leaf and it doesn't have
>> an INODE_REF item then it won't be repaired. But e.g. it can have perfectly
>> valid DIR_ITEM/DIR_INDEX entries which describe this inode as being a file. E.g.
>>
>> 	item 2 key (256 DIR_ITEM 388586943) itemoff 16076 itemsize 35
>> 		location key (260 INODE_ITEM 0) type FILE
>> 		transid 7 data_len 0 name_len 5
>> 		name: file3
>>
>> 	.....
>> 	item 15 key (260 INODE_ITEM 0) itemoff 15184 itemsize 160
>> 		generation 7 transid 7 size 0 nbytes 0
>> 		block group 0 mode 26772225102 links 1 uid 0 gid 0 rdev 0
>> 		sequence 1 flags 0x0(none)
>> 		atime 1568127261.284993602 (2019-09-10 14:54:21)
>> 		ctime 1568127261.284993602 (2019-09-10 14:54:21)
>> 		mtime 1568127261.284993602 (2019-09-10 14:54:21)
>> 		otime 1568127261.284993602 (2019-09-10 14:54:21)
>>
>> I have intentionally deleted INODE_REF too see what's happening. Is this intended?
> 
> Yes, completely intended.
> 
> For this case, you need to iterate through the whole tree to locate the
> DIR_INDEX to fix, which is not really possible with current code base,
> which only search based on the INODE, not the DIR_INDEX/DIR_ITEM from
> its parent dir.
> 
> Furthermore, didn't you mention that if we don't have confident about
> the imode, then we should fail out instead of using REG as default?

I did, what I supposed could happen here is if we can't find an
INODE_REF then do a search for DIR_INDEX/DIR_ITEM since those items also
contain the type of the inode they are pointing to. Fixing really boils
down to exploiting redundancy in the on-disk metadata to repair existing
items. Here comes the question, of course, what to do if we don't have
an INODE_REF and DIR_INDEX/DIR_ITEM are broken. I guess it's a judgement
call, currently you decided that inode_ref will be the source of
information. I'm fine with that I was merely pointing there is more we
can do. Of course we need to weigh the pros/cons between code complexity
and the returns we get.

Just that I will advise to make it explicit in the changelog that you
made a judgement call to utilize INODE_REF as the starting point of
doing imode repair but not necessarily the only one.


<snip>
Qu Wenruo Sept. 11, 2019, 12:44 p.m. UTC | #7
[...]
>>> I have intentionally deleted INODE_REF too see what's happening. Is this intended?
>>
>> Yes, completely intended.
>>
>> For this case, you need to iterate through the whole tree to locate the
>> DIR_INDEX to fix, which is not really possible with current code base,
>> which only search based on the INODE, not the DIR_INDEX/DIR_ITEM from
>> its parent dir.
>>
>> Furthermore, didn't you mention that if we don't have confident about
>> the imode, then we should fail out instead of using REG as default?
>
> I did, what I supposed could happen here is if we can't find an
> INODE_REF then do a search for DIR_INDEX/DIR_ITEM since those items also
> contain the type of the inode they are pointing to. Fixing really boils
> down to exploiting redundancy in the on-disk metadata to repair existing
> items. Here comes the question, of course, what to do if we don't have
> an INODE_REF and DIR_INDEX/DIR_ITEM are broken. I guess it's a judgement
> call, currently you decided that inode_ref will be the source of
> information. I'm fine with that I was merely pointing there is more we
> can do. Of course we need to weigh the pros/cons between code complexity
> and the returns we get.

BTW, the kinda of "inconsistency" between different judgement calls is
somewhat causing the problem you see in some more complex repair situation.

To me and the old Fujitsu guys, what we (at least used to) believe is,
if we have any chance to recover, then try our best. (just like what I
did when we can't find inode ref)

But if we have redundant info, like the INODE_REF/DIR_INDEX/DIR_ITEM, we
go democracy, two valid items then recovery, one valid item, then
discard it.

If we have a very consistent policy, like anything wrong the discard the
bad part even it will cause data loss.
The recovery will be much easier and consistent.
In this imode case, we don't "recover" the inode, but remove it
completely, and if the INODE_REF/DIR_INDEX/DIR_ITEM repair follows the
same standard, a lot of things will be much easier to do.
(Now I'm regretting the call I did for the INODE_REF/DIR_INDEX/DIR_ITEM
repair code)

But I guess that's the ultimate judgement call made way beyond I joined
the btrfs-progs development (see the extent repair code and bad key
order repair), thus not that easy to change without a large rework...

>
> Just that I will advise to make it explicit in the changelog that you
> made a judgement call to utilize INODE_REF as the starting point of
> doing imode repair but not necessarily the only one.

No problem, I'd add more comment and the disadvantage of the current
behavior.

Thanks,
Qu

>
>
> <snip>
>

Patch
diff mbox series

diff --git a/check/mode-common.c b/check/mode-common.c
index c0ddc50a1dd0..abea2ceda4c4 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -935,6 +935,113 @@  out:
 	return ret;
 }
 
+static int detect_imode(struct btrfs_root *root, struct btrfs_path *path,
+			u32 *imode_ret)
+{
+	struct btrfs_key key;
+	struct btrfs_inode_item iitem;
+	const u32 priv = 0700;
+	bool found = false;
+	u64 ino;
+	u32 imode;
+	int ret = 0;
+
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	ino = key.objectid;
+	read_extent_buffer(path->nodes[0], &iitem,
+			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
+			sizeof(iitem));
+	/* root inode */
+	if (ino == BTRFS_FIRST_FREE_OBJECTID) {
+		imode = S_IFDIR;
+		found = true;
+		goto out;
+	}
+
+	while (1) {
+		struct btrfs_inode_ref *iref;
+		struct extent_buffer *leaf;
+		unsigned long cur;
+		unsigned long end;
+		char namebuf[BTRFS_NAME_LEN] = {0};
+		u64 index;
+		u32 namelen;
+		int slot;
+
+		ret = btrfs_next_item(root, path);
+		if (ret > 0) {
+			/* falls back to rdev check */
+			ret = 0;
+			goto out;
+		}
+		if (ret < 0)
+			goto out;
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.objectid != ino)
+			goto out;
+
+		/*
+		 * We ignore some types to make life easier:
+		 * - XATTR
+		 *   Both REG and DIR can have xattr, so not useful
+		 */
+		switch (key.type) {
+		case BTRFS_INODE_REF_KEY:
+			/* The most accurate way to determine filetype */
+			cur = btrfs_item_ptr_offset(leaf, slot);
+			end = cur + btrfs_item_size_nr(leaf, slot);
+			while (cur < end) {
+				iref = (struct btrfs_inode_ref *)cur;
+				namelen = min_t(u32, end - cur - sizeof(&iref),
+					btrfs_inode_ref_name_len(leaf, iref));
+				index = btrfs_inode_ref_index(leaf, iref);
+				read_extent_buffer(leaf, namebuf,
+					(unsigned long)(iref + 1), namelen);
+				ret = find_file_type(root, ino, key.offset,
+						index, namebuf, namelen,
+						&imode);
+				if (ret == 0) {
+					found = true;
+					goto out;
+				}
+				cur += sizeof(*iref) + namelen;
+			}
+			break;
+		case BTRFS_DIR_ITEM_KEY:
+		case BTRFS_DIR_INDEX_KEY:
+			imode = S_IFDIR;
+			goto out;
+		case BTRFS_EXTENT_DATA_KEY:
+			/*
+			 * Both REG and LINK could have EXTENT_DATA.
+			 * We just fall back to REG as user can inspect the
+			 * content.
+			 */
+			imode = S_IFREG;
+			goto out;
+		}
+	}
+
+out:
+	/*
+	 * Both CHR and BLK uses rdev, no way to distinguish them, so fall back
+	 * to BLK. But either way it doesn't really matter, as CHR/BLK on btrfs
+	 * should be pretty rare, and no real data will be lost.
+	 */
+	if (!found && btrfs_stack_inode_rdev(&iitem) != 0) {
+		imode = S_IFBLK;
+		found = true;
+	}
+
+	if (found)
+		*imode_ret = (imode | priv);
+	else
+		ret = -ENOENT;
+	return ret;
+}
+
 /*
  * Reset the inode mode of the inode specified by @path.
  *
@@ -951,22 +1058,19 @@  int repair_imode_common(struct btrfs_root *root, struct btrfs_path *path)
 	u32 imode;
 	int ret;
 
-	if (root->root_key.objectid != BTRFS_ROOT_TREE_OBJECTID) {
-		error(
-		"repair inode mode outside of root tree is not supported yet");
-		return -ENOTTY;
-	}
 	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
 	ASSERT(key.type == BTRFS_INODE_ITEM_KEY);
-	if (key.objectid != BTRFS_ROOT_TREE_DIR_OBJECTID &&
-	    !is_fstree(key.objectid)) {
-		error("unsupported ino %llu", key.objectid);
-		return -ENOTTY;
+	if (root->objectid == BTRFS_ROOT_TREE_OBJECTID) {
+		/* In root tree we only have two possible imode */
+		if (key.objectid == BTRFS_ROOT_TREE_OBJECTID)
+			imode = S_IFDIR | 0755;
+		else
+			imode = S_IFREG | 0600;
+	} else {
+		ret = detect_imode(root, path, &imode);
+		if (ret < 0)
+			return ret;
 	}
-	if (key.objectid == BTRFS_ROOT_TREE_DIR_OBJECTID)
-		imode = 040755;
-	else
-		imode = 0100600;
 
 	trans = btrfs_start_transaction(root, 1);
 	if (IS_ERR(trans)) {