diff mbox series

Btrfs: correctly caculate item size used when item key collision happends

Message ID 1534237532-1310-1-git-send-email-ethanwu@synology.com (mailing list archive)
State New, archived
Headers show
Series Btrfs: correctly caculate item size used when item key collision happends | expand

Commit Message

ethanwu Aug. 14, 2018, 9:05 a.m. UTC
Item key collision is allowed for some item types, like dir item and
inode refs, but the overall item size is limited by the leafsize.

item size(ins_len) passed from btrfs_insert_empty_items to
btrfs_search_slot already contains size of btrfs_item.

When btrfs_search_slot reaches leaf, we'll see if we need to split leaf,
since the ins_len includes one struct btrfs_item, the check might
fail even though new item we try to insert could merge into the existing
one without adding new btrfs_item.
And split_leaf return -EOVERFLOW from following code:
if (extend && data_size + btrfs_item_size_nr(l, slot) +
    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
    return -EOVERFLOW;

In most cases, when callers receive -EOVERFLOW, they either return
this error or handle in different ways. For example, in normal dir item
insertion the userspace will get errno EOVERFLOW; in inode ref case
INODE_EXTREF is used instead if INODE_REF is full.

However, this is not the case for rename. To avoid the unrecoverable
situation in rename, btrfs_check_dir_item_collision is called in
early phase of rename. In this function, when item key collision is
detected leaf space is checked:

data_size = sizeof(*di) + name_len;
if (data_size + btrfs_item_size_nr(leaf, slot) +
    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))

the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
refers to existing item size.

The check condition here is not consistent with the btrfs_search_slot
when item key collision happens. We might pass check here but fail
at btrfs_search_slot.

in the rename call path
btrfs_add_link
  btrfs_insert_dir_item
    insert_with_overflow
      btrfs_insert_empty_items (btrfs item is counted)
        btrfs_search_slot

if (ins_len > 0 &&
    btrfs_leaf_free_space(fs_info, b) < ins_len) {

The ins_len here contains btrfs_item and the item data, but this
btrfs_item is already in leaf used space, two btrfs_item is counted and
we only need one when this is item key collision cases.
Therfore, rename fails, and abort transaction is triggered with
following error messages:
BTRFS: error (device loop0) in btrfs_rename:9870: errno=-75 unknown

There are two ways to fix rename issue, one is to revert the patch
878f2d2cb355 Btrfs: fix max dir item size calculation
to make the condition consistent.

The other way is to handle the leaf space check correctly when
collision happens. I prefer the second one since it correct leaf
space check in collision case. This fix needs unify the usage of ins_len
in btrfs_search_slot to contain btrfs_item anyway and
adjust all callers of btrfs_search_slot that intentionally pass ins_len
without btrfs_item size to add size of btrfs_item from now.

dir item hash collision is not easy to reproduce.
The following is a leaf sample filled with inode ref.

Before applying the patch, when item data reaches 16200
and we want to add another link with namelen 26(inode ref size 36)
It will not pass the leafspace check
10(inode ref item) + 26(name len) + 25(btrfs item) >
    leaf free space 58
and use BTRFS_INODE_EXTREF_KEY instead.
Nevertheless, The 25 bytes btrfs_item is not needed because the
newly inserted item could be merged with the existing one.

before patch:
leaf 31571968 items 1 free space 58 generation 178 owner 262
fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
    item 0 key (261 INODE_REF 257) itemoff 83 itemsize 16200
        inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
        inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
        inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
        ...
        inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450

after patch:
leaf 31899648 items 1 free space 22 generation 180 owner 262
fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
    item 0 key (263 INODE_REF 262) itemoff 47 itemsize 16236
        inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
        inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
        inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
        ...
        inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450
        inode ref index 452 namelen 26 name: abcdefghijklmnopqrstuv0451

Signed-off-by: ethanwu <ethanwu@synology.com>
---
 fs/btrfs/ctree.c       | 15 +++++++++++++--
 fs/btrfs/extent-tree.c |  5 +++--
 fs/btrfs/file-item.c   |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

Comments

Hans van Kranenburg Aug. 14, 2018, 11:07 p.m. UTC | #1
On 08/14/2018 11:05 AM, ethanwu wrote:
> Item key collision is allowed for some item types, like dir item and
> inode refs, but the overall item size is limited by the leafsize.
> 
> item size(ins_len) passed from btrfs_insert_empty_items to
> btrfs_search_slot already contains size of btrfs_item.
> 
> When btrfs_search_slot reaches leaf, we'll see if we need to split leaf,
> since the ins_len includes one struct btrfs_item, the check might
> fail even though new item we try to insert could merge into the existing
> one without adding new btrfs_item.
> And split_leaf return -EOVERFLOW from following code:
> if (extend && data_size + btrfs_item_size_nr(l, slot) +
>     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
>     return -EOVERFLOW;
> 
> In most cases, when callers receive -EOVERFLOW, they either return
> this error or handle in different ways. For example, in normal dir item
> insertion the userspace will get errno EOVERFLOW; in inode ref case
> INODE_EXTREF is used instead if INODE_REF is full.
> 
> However, this is not the case for rename. To avoid the unrecoverable
> situation in rename, btrfs_check_dir_item_collision is called in
> early phase of rename. In this function, when item key collision is
> detected leaf space is checked:
> 
> data_size = sizeof(*di) + name_len;
> if (data_size + btrfs_item_size_nr(leaf, slot) +
>     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
> 
> the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
> refers to existing item size.
> 
> The check condition here is not consistent with the btrfs_search_slot
> when item key collision happens. We might pass check here but fail
> at btrfs_search_slot.
> 
> in the rename call path
> btrfs_add_link
>   btrfs_insert_dir_item
>     insert_with_overflow
>       btrfs_insert_empty_items (btrfs item is counted)
>         btrfs_search_slot
> 
> if (ins_len > 0 &&
>     btrfs_leaf_free_space(fs_info, b) < ins_len) {
> 
> The ins_len here contains btrfs_item and the item data, but this
> btrfs_item is already in leaf used space, two btrfs_item is counted and
> we only need one when this is item key collision cases.
> Therfore, rename fails, and abort transaction is triggered with
> following error messages:
> BTRFS: error (device loop0) in btrfs_rename:9870: errno=-75 unknown
> 
> There are two ways to fix rename issue, one is to revert the patch
> 878f2d2cb355 Btrfs: fix max dir item size calculation
> to make the condition consistent.
> 
> The other way is to handle the leaf space check correctly when
> collision happens. I prefer the second one since it correct leaf
> space check in collision case. This fix needs unify the usage of ins_len
> in btrfs_search_slot to contain btrfs_item anyway and
> adjust all callers of btrfs_search_slot that intentionally pass ins_len
> without btrfs_item size to add size of btrfs_item from now.
> 
> dir item hash collision is not easy to reproduce.

I used this when testing dir_item related things:

http://crypto.junod.info/2012/12/13/hash-dos-and-btrfs/

> The following is a leaf sample filled with inode ref.
> 
> Before applying the patch, when item data reaches 16200
> and we want to add another link with namelen 26(inode ref size 36)
> It will not pass the leafspace check
> 10(inode ref item) + 26(name len) + 25(btrfs item) >
>     leaf free space 58
> and use BTRFS_INODE_EXTREF_KEY instead.
> Nevertheless, The 25 bytes btrfs_item is not needed because the
> newly inserted item could be merged with the existing one.
> 
> before patch:
> leaf 31571968 items 1 free space 58 generation 178 owner 262
> fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
> chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
>     item 0 key (261 INODE_REF 257) itemoff 83 itemsize 16200
>         inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
>         inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
>         inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
>         ...
>         inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450
> 
> after patch:
> leaf 31899648 items 1 free space 22 generation 180 owner 262
> fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
> chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
>     item 0 key (263 INODE_REF 262) itemoff 47 itemsize 16236
>         inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
>         inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
>         inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
>         ...
>         inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450
>         inode ref index 452 namelen 26 name: abcdefghijklmnopqrstuv0451
> 
> Signed-off-by: ethanwu <ethanwu@synology.com>
> ---
>  fs/btrfs/ctree.c       | 15 +++++++++++++--
>  fs/btrfs/extent-tree.c |  5 +++--
>  fs/btrfs/file-item.c   |  2 +-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4bc326d..3614dd9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2678,8 +2678,8 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
>   * @p:		Holds all btree nodes along the search path
>   * @root:	The root node of the tree
>   * @key:	The key we are looking for
> - * @ins_len:	Indicates purpose of search, for inserts it is 1, for
> - *		deletions it's -1. 0 for plain searches
> + * @ins_len:	Indicates purpose of search, for inserts it is item size
> + *		including btrfs_item, for deletions it's -1. 0 for plain searches
>   * @cow:	boolean should CoW operations be performed. Must always be 1
>   *		when modifying the tree.
>   *
> @@ -2893,6 +2893,17 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  			}
>  		} else {
>  			p->slots[level] = slot;
> +			/*
> +			 * item key collision happens. In this case, if we are allow
> +			 * to insert the item(for example, in dir_item case, item key
> +			 * collision is allowed), it will be merged with the original
> +			 * item. Only the item size grows, no new btrfs item will be
> +			 * added. Since the ins_len already accounts the size btrfs_item,
> +			 * this value is counted twice. Duduct this value here so the
> +			 * leaf space check will be correct.
> +			 */
> +			if (ret == 0 && ins_len > 0)
> +				ins_len -= sizeof(struct btrfs_item);
>  			if (ins_len > 0 &&
>  			    btrfs_leaf_free_space(fs_info, b) < ins_len) {
>  				if (write_lock_level < 1) {
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58..4e3aa09 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1094,7 +1094,7 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
>  
>  	new_size -= sizeof(*ei0);
>  	ret = btrfs_search_slot(trans, root, &key, path,
> -				new_size + extra_size, 1);
> +				new_size + extra_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0)
>  		return ret;
>  	BUG_ON(ret); /* Corruption */
> @@ -1644,7 +1644,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
>  	}
>  
>  again:
> -	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
> +	ret = btrfs_search_slot(trans, root, &key, path,
> +			    extra_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0) {
>  		err = ret;
>  		goto out;
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index f9dd6d1..0b6c581 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -804,7 +804,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
>  	 */
>  	btrfs_release_path(path);
>  	ret = btrfs_search_slot(trans, root, &file_key, path,
> -				csum_size, 1);
> +				csum_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0)
>  		goto fail_unlock;
>  
>
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4bc326d..3614dd9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2678,8 +2678,8 @@  static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
  * @p:		Holds all btree nodes along the search path
  * @root:	The root node of the tree
  * @key:	The key we are looking for
- * @ins_len:	Indicates purpose of search, for inserts it is 1, for
- *		deletions it's -1. 0 for plain searches
+ * @ins_len:	Indicates purpose of search, for inserts it is item size
+ *		including btrfs_item, for deletions it's -1. 0 for plain searches
  * @cow:	boolean should CoW operations be performed. Must always be 1
  *		when modifying the tree.
  *
@@ -2893,6 +2893,17 @@  int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 			}
 		} else {
 			p->slots[level] = slot;
+			/*
+			 * item key collision happens. In this case, if we are allow
+			 * to insert the item(for example, in dir_item case, item key
+			 * collision is allowed), it will be merged with the original
+			 * item. Only the item size grows, no new btrfs item will be
+			 * added. Since the ins_len already accounts the size btrfs_item,
+			 * this value is counted twice. Duduct this value here so the
+			 * leaf space check will be correct.
+			 */
+			if (ret == 0 && ins_len > 0)
+				ins_len -= sizeof(struct btrfs_item);
 			if (ins_len > 0 &&
 			    btrfs_leaf_free_space(fs_info, b) < ins_len) {
 				if (write_lock_level < 1) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58..4e3aa09 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1094,7 +1094,7 @@  static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
 
 	new_size -= sizeof(*ei0);
 	ret = btrfs_search_slot(trans, root, &key, path,
-				new_size + extra_size, 1);
+				new_size + extra_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0)
 		return ret;
 	BUG_ON(ret); /* Corruption */
@@ -1644,7 +1644,8 @@  int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
 	}
 
 again:
-	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
+	ret = btrfs_search_slot(trans, root, &key, path,
+			    extra_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0) {
 		err = ret;
 		goto out;
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index f9dd6d1..0b6c581 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -804,7 +804,7 @@  int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 	 */
 	btrfs_release_path(path);
 	ret = btrfs_search_slot(trans, root, &file_key, path,
-				csum_size, 1);
+				csum_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0)
 		goto fail_unlock;