[09/17] btrfs: push extent lock down in run_delalloc_nocow

Message ID be4f89a0ad01fc580a11bfe52300a0c2e7669cc4.1713363472.git.josef@toxicpanda.com
State New
Series btrfs: restrain lock extent usage during writeback

Commit Message

Josef Bacik April 17, 2024, 2:35 p.m. UTC
run_delalloc_nocow() is a little special because it uses the file extents
to see if we can nocow a range.  However, we don't actually need the
protection of the extent lock to look at the file extents at this point.
We are already holding the page lock for this range, so we are protected
from anybody who could simultaneously be modifying the file extent items
for this range:

* mmap() - we're holding the page lock.
* buffered writes - we're holding the page lock.
* direct writes - we're holding the page lock and direct IO has to flush
  page cache before it's able to continue.
* fallocate() - all callers flush the range and wait on ordered extents
  while holding the inode lock and the mmap lock, so we are again saved
  by the page lock.

We want to use the extent lock to protect:

1) The mapping tree for the given range.
2) The ordered extents for the given range.
3) The io_tree for the given range.

Push the extent lock down to cover these operations.  In the
fallback_to_cow() case we simply lock before doing anything and rely on
the cow_file_range() helper to handle its range properly.
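
Condensed, the per-extent pattern in run_delalloc_nocow() then looks
roughly like this (a sketch assembled from the hunks below; the file
extent lookup and most error handling are elided, and the EXTENT_LOCKED
clear on the success path is implied by the surrounding code rather than
visible in the hunk context):

    while (cur_offset <= end) {
            struct extent_state *cached_state = NULL;

            /*
             * The file extent item lookup and the can-nocow checks run
             * here under the page lock only.
             */

            nocow_end = cur_offset + nocow_args.num_bytes - 1;

            /*
             * Lock only around the extent map, ordered extent and
             * io_tree updates.  Every failure exit from here on must
             * unlock_extent() the range before jumping to the error
             * label.
             */
            lock_extent(&inode->io_tree, cur_offset, nocow_end,
                        &cached_state);

            /* ... create the extent map and the ordered extent ... */

            /*
             * This also clears EXTENT_LOCKED for the range, so the only
             * thing left to do is drop the cached state reference.
             */
            extent_clear_unlock_delalloc(inode, cur_offset, nocow_end,
                                         locked_page, EXTENT_LOCKED |
                                         EXTENT_DELALLOC |
                                         EXTENT_CLEAR_DATA_RESV,
                                         PAGE_UNLOCK | PAGE_SET_ORDERED);
            free_extent_state(cached_state);

            cur_offset = extent_end;
    }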

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)
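
On the fallback_to_cow() side the change amounts to the first hunk
below: take the lock up front, before reading or clearing any extent
state.  Roughly (a sketch; the start/end parameters and the io_tree
local are inferred from the added lock_extent() call, and the unchanged
body is elided):

    static int fallback_to_cow(struct btrfs_inode *inode,
                               struct page *locked_page,
                               const u64 start, const u64 end)
    {
            struct extent_io_tree *io_tree = &inode->io_tree;

            /* New: lock before touching any extent state for the range. */
            lock_extent(io_tree, start, end, NULL);

            /*
             * ... EXTENT_NORESERVE accounting as before, then the
             * still-locked range is handed to cow_file_range(), which
             * takes care of it from there.
             */
    }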

Comments

Goldwyn Rodrigues April 24, 2024, 5:41 p.m. UTC | #1
On 10:35 17/04, Josef Bacik wrote:
> run_delalloc_nocow() is a little special because it uses the file extents
> to see if we can nocow a range.
> 
> [...]
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 80e92d37af34..ab3d6ebbce6a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1747,6 +1747,8 @@  static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page,
 	u64 count;
 	int ret;
 
+	lock_extent(io_tree, start, end, NULL);
+
 	/*
 	 * If EXTENT_NORESERVE is set it means that when the buffered write was
 	 * made we had not enough available data space and therefore we did not
@@ -1977,8 +1979,6 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 */
 	ASSERT(!btrfs_is_zoned(fs_info) || btrfs_is_data_reloc_root(root));
 
-	lock_extent(&inode->io_tree, start, end, NULL);
-
 	path = btrfs_alloc_path();
 	if (!path) {
 		ret = -ENOMEM;
@@ -1994,6 +1994,7 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		struct btrfs_key found_key;
 		struct btrfs_file_extent_item *fi;
 		struct extent_buffer *leaf;
+		struct extent_state *cached_state = NULL;
 		u64 extent_end;
 		u64 ram_bytes;
 		u64 nocow_end;
@@ -2131,6 +2132,8 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		}
 
 		nocow_end = cur_offset + nocow_args.num_bytes - 1;
+		lock_extent(&inode->io_tree, cur_offset, nocow_end, &cached_state);
+
 		is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
 		if (is_prealloc) {
 			u64 orig_start = found_key.offset - nocow_args.extent_offset;
@@ -2144,6 +2147,8 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					  ram_bytes, BTRFS_COMPRESS_NONE,
 					  BTRFS_ORDERED_PREALLOC);
 			if (IS_ERR(em)) {
+				unlock_extent(&inode->io_tree, cur_offset,
+					      nocow_end, &cached_state);
 				btrfs_dec_nocow_writers(nocow_bg);
 				ret = PTR_ERR(em);
 				goto error;
@@ -2164,6 +2169,8 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 				btrfs_drop_extent_map_range(inode, cur_offset,
 							    nocow_end, false);
 			}
+			unlock_extent(&inode->io_tree, cur_offset,
+				      nocow_end, &cached_state);
 			ret = PTR_ERR(ordered);
 			goto error;
 		}
@@ -2182,6 +2189,7 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					     EXTENT_DELALLOC |
 					     EXTENT_CLEAR_DATA_RESV,
 					     PAGE_UNLOCK | PAGE_SET_ORDERED);
+		free_extent_state(cached_state);
 
 		cur_offset = extent_end;
 
@@ -2217,13 +2225,20 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 */
 	if (cow_start != (u64)-1)
 		cur_offset = cow_start;
-	if (cur_offset < end)
+
+	/*
+	 * We need to lock the extent here because we're clearing DELALLOC and
+	 * we're not locked at this point.
+	 */
+	if (cur_offset < end) {
+		lock_extent(&inode->io_tree, cur_offset, end, NULL);
 		extent_clear_unlock_delalloc(inode, cur_offset, end,
 					     locked_page, EXTENT_LOCKED |
 					     EXTENT_DELALLOC | EXTENT_DEFRAG |
 					     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
 					     PAGE_START_WRITEBACK |
 					     PAGE_END_WRITEBACK);
+	}
 	btrfs_free_path(path);
 	return ret;
 }