diff mbox

[v2] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map

Message ID 1411030542-15677-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Accepted
Headers show

Commit Message

Qu Wenruo Sept. 18, 2014, 8:55 a.m. UTC
The following commit enhanced the merge_extent_mapping() to reduce
fragment in extent map tree, but it can't handle case which existing
lies before map_start:
51f39 btrfs: Use right extent length when inserting overlap extent map.

[BUG]
When existing extent map's start is before map_start,
the em->len will be minus, which will corrupt the extent map and fail to
insert the new extent map.
This will happen when someone get a large extent map, but when it is
going to insert it into extent map tree, some one has already commit
some write and split the huge extent into small parts.

[REPRODUCER]
It is very easy to tiger using filebench with randomrw personality.
It is about 100% to reproduce when using 8G preallocated file in 60s
randonrw test.

[FIX]
This patch can now handle any existing extent position.
Since it does not directly use existing->start, now it will find the
previous and next extent around map_start.
So the old existing->start < map_start bug will never happen again.

[ENHANCE]
This patch will insert the best fitted extent map into extent map tree,
other than the oldest [map_start, map_start + sectorsize) or the
relatively newer but not perfect [map_start, existing->start).

The patch will first search existing extent that does not intersects with
the desired map range [map_start, map_start + len).
The existing extent will be either before or behind map_start, and based
on the existing extent, we can find out the previous and next extent
around map_start.

So the best fitted extent would be [prev->end, next->start).
For prev or next is not found, em->start would be prev->end and em->end
wold be next->start.

With this patch, the fragment in extent map tree should be reduced much
more than the 51f39 commit and reduce an unneeded extent map tree search.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
---
changelog:
v2:
    Liu Bo points out that the if() use 'start + len <= existing->start'
    is not equal to original if(), which may cause problem in no-holes
    mode.
---
 fs/btrfs/inode.c | 79 ++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 57 insertions(+), 22 deletions(-)

Comments

Satoru Takeuchi Sept. 19, 2014, 7:33 a.m. UTC | #1
(2014/09/18 17:55), Qu Wenruo wrote:
> The following commit enhanced the merge_extent_mapping() to reduce
> fragment in extent map tree, but it can't handle case which existing
> lies before map_start:
> 51f39 btrfs: Use right extent length when inserting overlap extent map.
> 
> [BUG]
> When existing extent map's start is before map_start,
> the em->len will be minus, which will corrupt the extent map and fail to
> insert the new extent map.
> This will happen when someone get a large extent map, but when it is
> going to insert it into extent map tree, some one has already commit
> some write and split the huge extent into small parts.
> 
> [REPRODUCER]
> It is very easy to tiger using filebench with randomrw personality.
> It is about 100% to reproduce when using 8G preallocated file in 60s
> randonrw test.
> 
> [FIX]
> This patch can now handle any existing extent position.
> Since it does not directly use existing->start, now it will find the
> previous and next extent around map_start.
> So the old existing->start < map_start bug will never happen again.
> 
> [ENHANCE]
> This patch will insert the best fitted extent map into extent map tree,
> other than the oldest [map_start, map_start + sectorsize) or the
> relatively newer but not perfect [map_start, existing->start).
> 
> The patch will first search existing extent that does not intersects with
> the desired map range [map_start, map_start + len).
> The existing extent will be either before or behind map_start, and based
> on the existing extent, we can find out the previous and next extent
> around map_start.
> 
> So the best fitted extent would be [prev->end, next->start).
> For prev or next is not found, em->start would be prev->end and em->end
> wold be next->start.
> 
> With this patch, the fragment in extent map tree should be reduced much
> more than the 51f39 commit and reduce an unneeded extent map tree search.
> 
> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

Tested-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Sorry to late reply. I confirmed this problem happens without
this patch and does not happen with this patch.

Thanks,
Satoru

> ---
> changelog:
> v2:
>      Liu Bo points out that the if() use 'start + len <= existing->start'
>      is not equal to original if(), which may cause problem in no-holes
>      mode.
> ---
>   fs/btrfs/inode.c | 79 ++++++++++++++++++++++++++++++++++++++++----------------
>   1 file changed, 57 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 016c403..b3864b7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6191,21 +6191,60 @@ out_fail_inode:
>   	goto out_fail;
>   }
>   
> +/* Find next extent map of a given extent map, caller needs to ensure locks */
> +static struct extent_map *next_extent_map(struct extent_map *em)
> +{
> +	struct rb_node *next;
> +
> +	next = rb_next(&em->rb_node);
> +	if (!next)
> +		return NULL;
> +	return container_of(next, struct extent_map, rb_node);
> +}
> +
> +static struct extent_map *prev_extent_map(struct extent_map *em)
> +{
> +	struct rb_node *prev;
> +
> +	prev = rb_prev(&em->rb_node);
> +	if (!prev)
> +		return NULL;
> +	return container_of(prev, struct extent_map, rb_node);
> +}
> +
>   /* helper for btfs_get_extent.  Given an existing extent in the tree,
> + * the existing extent is the nearest extent to map_start,
>    * and an extent that you want to insert, deal with overlap and insert
> - * the new extent into the tree.
> + * the best fitted new extent into the tree.
>    */
>   static int merge_extent_mapping(struct extent_map_tree *em_tree,
>   				struct extent_map *existing,
>   				struct extent_map *em,
>   				u64 map_start)
>   {
> +	struct extent_map *prev;
> +	struct extent_map *next;
> +	u64 start;
> +	u64 end;
>   	u64 start_diff;
>   
>   	BUG_ON(map_start < em->start || map_start >= extent_map_end(em));
> -	start_diff = map_start - em->start;
> -	em->start = map_start;
> -	em->len = existing->start - em->start;
> +
> +	if (existing->start > map_start) {
> +		next = existing;
> +		prev = prev_extent_map(next);
> +	} else {
> +		prev = existing;
> +		next = next_extent_map(prev);
> +	}
> +
> +	start = prev ? extent_map_end(prev) : em->start;
> +	start = max_t(u64, start, em->start);
> +	end = next ? next->start : extent_map_end(em);
> +	end = min_t(u64, end, extent_map_end(em));
> +	start_diff = start - em->start;
> +	em->start = start;
> +	em->len = end - start;
>   	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
>   	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
>   		em->block_start += start_diff;
> @@ -6482,25 +6521,21 @@ insert:
>   
>   		ret = 0;
>   
> -		existing = lookup_extent_mapping(em_tree, start, len);
> -		if (existing && (existing->start > start ||
> -		    existing->start + existing->len <= start)) {
> +		existing = search_extent_mapping(em_tree, start, len);
> +		/*
> +		 * existing will always be non-NULL, since there must be
> +		 * extent causing the -EEXIST.
> +		 */
> +		if (start >= extent_map_end(existing) ||
> +		    start < existing->start) {
> +			/*
> +			 * The existing extent map is the one nearest to
> +			 * the [start, start + len) range which overlaps
> +			 */
> +			err = merge_extent_mapping(em_tree, existing,
> +						   em, start);
>   			free_extent_map(existing);
> -			existing = NULL;
> -		}
> -		if (!existing) {
> -			existing = lookup_extent_mapping(em_tree, em->start,
> -							 em->len);
> -			if (existing) {
> -				err = merge_extent_mapping(em_tree, existing,
> -							   em, start);
> -				free_extent_map(existing);
> -				if (err) {
> -					free_extent_map(em);
> -					em = NULL;
> -				}
> -			} else {
> -				err = -EIO;
> +			if (err) {
>   				free_extent_map(em);
>   				em = NULL;
>   			}
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 016c403..b3864b7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6191,21 +6191,60 @@  out_fail_inode:
 	goto out_fail;
 }
 
+/* Find next extent map of a given extent map, caller needs to ensure locks */
+static struct extent_map *next_extent_map(struct extent_map *em)
+{
+	struct rb_node *next;
+
+	next = rb_next(&em->rb_node);
+	if (!next)
+		return NULL;
+	return container_of(next, struct extent_map, rb_node);
+}
+
+static struct extent_map *prev_extent_map(struct extent_map *em)
+{
+	struct rb_node *prev;
+
+	prev = rb_prev(&em->rb_node);
+	if (!prev)
+		return NULL;
+	return container_of(prev, struct extent_map, rb_node);
+}
+
 /* helper for btfs_get_extent.  Given an existing extent in the tree,
+ * the existing extent is the nearest extent to map_start,
  * and an extent that you want to insert, deal with overlap and insert
- * the new extent into the tree.
+ * the best fitted new extent into the tree.
  */
 static int merge_extent_mapping(struct extent_map_tree *em_tree,
 				struct extent_map *existing,
 				struct extent_map *em,
 				u64 map_start)
 {
+	struct extent_map *prev;
+	struct extent_map *next;
+	u64 start;
+	u64 end;
 	u64 start_diff;
 
 	BUG_ON(map_start < em->start || map_start >= extent_map_end(em));
-	start_diff = map_start - em->start;
-	em->start = map_start;
-	em->len = existing->start - em->start;
+
+	if (existing->start > map_start) {
+		next = existing;
+		prev = prev_extent_map(next);
+	} else {
+		prev = existing;
+		next = next_extent_map(prev);
+	}
+
+	start = prev ? extent_map_end(prev) : em->start;
+	start = max_t(u64, start, em->start);
+	end = next ? next->start : extent_map_end(em);
+	end = min_t(u64, end, extent_map_end(em));
+	start_diff = start - em->start;
+	em->start = start;
+	em->len = end - start;
 	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
 	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
 		em->block_start += start_diff;
@@ -6482,25 +6521,21 @@  insert:
 
 		ret = 0;
 
-		existing = lookup_extent_mapping(em_tree, start, len);
-		if (existing && (existing->start > start ||
-		    existing->start + existing->len <= start)) {
+		existing = search_extent_mapping(em_tree, start, len);
+		/*
+		 * existing will always be non-NULL, since there must be
+		 * extent causing the -EEXIST.
+		 */
+		if (start >= extent_map_end(existing) ||
+		    start < existing->start) {
+			/*
+			 * The existing extent map is the one nearest to
+			 * the [start, start + len) range which overlaps
+			 */
+			err = merge_extent_mapping(em_tree, existing,
+						   em, start);
 			free_extent_map(existing);
-			existing = NULL;
-		}
-		if (!existing) {
-			existing = lookup_extent_mapping(em_tree, em->start,
-							 em->len);
-			if (existing) {
-				err = merge_extent_mapping(em_tree, existing,
-							   em, start);
-				free_extent_map(existing);
-				if (err) {
-					free_extent_map(em);
-					em = NULL;
-				}
-			} else {
-				err = -EIO;
+			if (err) {
 				free_extent_map(em);
 				em = NULL;
 			}