btrfs: don't loop again over pinned extent maps when shrinking extent maps

Message ID cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com (mailing list archive)
State New
Series btrfs: don't loop again over pinned extent maps when shrinking extent maps

Commit Message

Filipe Manana July 1, 2024, 9:23 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

During extent map shrinking, while iterating over the extent maps of an
inode, if we happen to find a lot of pinned extent maps and we need to
reschedule, we'll start iterating the extent map tree from its first
extent map. This can result in visiting the same extent maps again, and if
they are not yet unpinned, we are just wasting time and can end up
iterating over them again if we happen to reschedule again before finding
an extent map that is not pinned - this could happen yet more times if the
unpinning doesn't happen soon (at ordered extent completion).

So improve on this by starting on the next extent map every time we need
to reschedule. Any previously pinned extent maps will be checked again the
next time the extent map shrinker is run (if needed).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

This applies against the "for-next" branch, for a version that
applies cleanly to 6.10-rcX:

https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch

 fs/btrfs/extent_map.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)
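
The effect described in the commit message can be illustrated with a small
user-space sketch. This is not the kernel code: struct toy_em, the toy_*
helpers, the sorted array standing in for the rb-tree, and the "reschedule
after every resched_every visits" rule are all assumptions made for the
example (the kernel only reschedules when cond_resched_rwlock_write() decides
it is needed). The sketch only shows why restarting from the first extent map
can burn the whole scan budget on pinned maps, while resuming from the end
offset of the last visited map makes progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct extent_map: covers [start, start + len). */
struct toy_em {
	uint64_t start;
	uint64_t len;
	bool pinned;
	bool removed;	/* set once the scan has "evicted" this map */
};

static uint64_t toy_em_end(const struct toy_em *em)
{
	return em->start + em->len;
}

/* Index of the first map at or after 'from' that is still present, or n. */
static size_t toy_next_live(const struct toy_em *ems, size_t n, size_t from)
{
	while (from < n && ems[from].removed)
		from++;
	return from;
}

/*
 * Index of the first present map ending after 'offset', or n if none.
 * Plays the role of the lookup done after the lock was dropped and retaken;
 * a linear search is used here only to keep the example short.
 */
static size_t toy_first_after(const struct toy_em *ems, size_t n, uint64_t offset)
{
	for (size_t i = 0; i < n; i++)
		if (!ems[i].removed && toy_em_end(&ems[i]) > offset)
			return i;
	return n;
}

/*
 * Scan maps sorted by offset, skipping pinned ones and evicting the rest,
 * until the array is exhausted or nr_to_scan visits were made. A reschedule
 * is simulated after every resched_every visits. With restart == true the
 * scan goes back to the first map (old behaviour), otherwise it resumes
 * after the last visited map (new behaviour). Returns the number evicted.
 */
static size_t toy_scan(struct toy_em *ems, size_t n, size_t nr_to_scan,
		       size_t resched_every, bool restart)
{
	size_t scanned = 0, evicted = 0;
	size_t i = toy_next_live(ems, n, 0);

	assert(resched_every > 0);
	while (i < n && scanned < nr_to_scan) {
		uint64_t next_min_offset = toy_em_end(&ems[i]);

		scanned++;
		if (!ems[i].pinned) {
			ems[i].removed = true;
			evicted++;
		}
		if (scanned % resched_every == 0)
			i = restart ? toy_next_live(ems, n, 0)
				    : toy_first_after(ems, n, next_min_offset);
		else
			i = toy_next_live(ems, n, i + 1);
	}
	return evicted;
}
```

For eight 4096-byte maps of which the first four are pinned, a budget of 12
visits and a simulated reschedule every 3 visits, toy_scan() with
restart == true evicts nothing (it revisits the first three pinned maps four
times), while restart == false evicts the four unpinned maps after 8 visits.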

Comments

Qu Wenruo July 1, 2024, 9:38 a.m. UTC | #1
On 2024/7/1 18:53, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> During extent map shrinking, while iterating over the extent maps of an
> inode, if we happen to find a lot of pinned extent maps and we need to
> reschedule, we'll start iterating the extent map tree from its first
> extent map. This can result in visiting the same extent maps again, and if
> they are not yet unpinned, we are just wasting time and can end up
> iterating over them again if we happen to reschedule again before finding
> an extent map that is not pinned - this could happen yet more times if the
> unpinning doesn't happen soon (at ordered extent completion).
>
> So improve on this by starting on the next extent map every time we need
> to reschedule. Any previously pinned extent maps will be checked again the
> next time the extent map shrinker is run (if needed).
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>
> This applies against the "for-next" branch, for a version that
> applies cleanly to 6.10-rcX:
>
> https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
>
>   fs/btrfs/extent_map.c | 24 ++++++++++++++++++------
>   1 file changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index b869a0ee24d2..2d75059eedd8 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -1139,8 +1139,10 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t
>   	while (node) {
>   		struct rb_node *next = rb_next(node);
>   		struct extent_map *em;
> +		u64 next_min_offset;
>
>   		em = rb_entry(node, struct extent_map, rb_node);
> +		next_min_offset = extent_map_end(em);
>   		(*scanned)++;
>
>   		if (em->flags & EXTENT_FLAG_PINNED)
> @@ -1166,14 +1168,24 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t
>   			break;
>
>   		/*
> -		 * Restart if we had to reschedule, and any extent maps that were
> -		 * pinned before may have become unpinned after we released the
> -		 * lock and took it again.
> +		 * If we had to reschedule, start from where we were before. We
> +		 * could start from the first extent map in the tree in case we
> +		 * passed through pinned extent maps that may have become
> +		 * unpinned in the meantime, but it might be the case that they
> +		 * haven't been unpinned yet, so if we have many still pinned
> +		 * extent maps, we could be wasting a lot of time and CPU. So
> +		 * don't consider previously pinned extent maps, we'll consider
> +		 * them in future calls of the extent map shrinker.
>   		 */
> -		if (cond_resched_rwlock_write(&tree->lock))
> -			node = rb_first(&tree->root);
> -		else
> +		if (cond_resched_rwlock_write(&tree->lock)) {
> +			em = search_extent_mapping(tree, next_min_offset, 0);
> +			if (em)
> +				node = &em->rb_node;
> +			else
> +				node = NULL;
> +		} else {
>   			node = next;
> +		}
>   	}
>   	write_unlock(&tree->lock);
>   	up_read(&inode->i_mmap_lock);

Josef Bacik July 1, 2024, 2:18 p.m. UTC | #2
On Mon, Jul 01, 2024 at 10:23:31AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> During extent map shrinking, while iterating over the extent maps of an
> inode, if we happen to find a lot of pinned extent maps and we need to
> reschedule, we'll start iterating the extent map tree from its first
> extent map. This can result in visiting the same extent maps again, and if
> they are not yet unpinned, we are just wasting time and can end up
> iterating over them again if we happen to reschedule again before finding
> an extent map that is not pinned - this could happen yet more times if the
> unpinning doesn't happen soon (at ordered extent completion).
> 
> So improve on this by starting on the next extent map every time we need
> to reschedule. Any previously pinned extent maps will be checked again the
> next time the extent map shrinker is run (if needed).
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

Patch

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index b869a0ee24d2..2d75059eedd8 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -1139,8 +1139,10 @@  static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t
 	while (node) {
 		struct rb_node *next = rb_next(node);
 		struct extent_map *em;
+		u64 next_min_offset;
 
 		em = rb_entry(node, struct extent_map, rb_node);
+		next_min_offset = extent_map_end(em);
 		(*scanned)++;
 
 		if (em->flags & EXTENT_FLAG_PINNED)
@@ -1166,14 +1168,24 @@  static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t
 			break;
 
 		/*
-		 * Restart if we had to reschedule, and any extent maps that were
-		 * pinned before may have become unpinned after we released the
-		 * lock and took it again.
+		 * If we had to reschedule, start from where we were before. We
+		 * could start from the first extent map in the tree in case we
+		 * passed through pinned extent maps that may have become
+		 * unpinned in the meantime, but it might be the case that they
+		 * haven't been unpinned yet, so if we have many still pinned
+		 * extent maps, we could be wasting a lot of time and CPU. So
+		 * don't consider previously pinned extent maps, we'll consider
+		 * them in future calls of the extent map shrinker.
 		 */
-		if (cond_resched_rwlock_write(&tree->lock))
-			node = rb_first(&tree->root);
-		else
+		if (cond_resched_rwlock_write(&tree->lock)) {
+			em = search_extent_mapping(tree, next_min_offset, 0);
+			if (em)
+				node = &em->rb_node;
+			else
+				node = NULL;
+		} else {
 			node = next;
+		}
 	}
 	write_unlock(&tree->lock);
 	up_read(&inode->i_mmap_lock);