diff mbox series

[v4,10/10] btrfs: move ordered extent cleanup to where they are allocated

Message ID 9b55ff95921e88dd00413236390e7fd7561175de.1736591758.git.wqu@suse.com (mailing list archive)
State New
Headers show
Series btrfs: error handling fixes | expand

Commit Message

Qu Wenruo Jan. 11, 2025, 10:43 a.m. UTC
The ordered extent cleanup is hard to grasp because it doesn't follow
the common pattern that the cleanup happens where ordered extent get
allocated.

E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
ordered extents, but if any error is hit, the cleanup is done inside
btrfs_run_delalloc_range().

To fix the existing delayed cleanup:

- Update the comment on error handling
  For @cow_start set case, @cur_offset can be assigned to either
  @cow_start or @cow_end.
  So for such case we can not trust @cur_offset but only @cow_start.

- Only reset @cow_start when fallback_to_cow() succeeded

- Add Extra ASSERT() when @cow_start is still set at error path

- Rewind @cur_offset when @cow_start is set
  When we fall back to COW, @cur_offset can be set to either @cow_start
  or @cow_end.
  We can not directly trust @cur_offset, or we will do double ordered
  extent accounting on range which is already cleaned up
  cow_file_range().

- Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow() and
  cow_file_range()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 58 +++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 23 deletions(-)

Comments

Qu Wenruo Jan. 12, 2025, 8:38 a.m. UTC | #1
I'll drop this patch from the series.

The run_delalloc_nocow() is really a mess to fix.

It has too many weird error path that can screw up the ordered extent 
cleanup.
E.g. in must_cow: tag, we can set @cow_start, but without setting a 
proper @cow_end, causing us much harder to properly skip any COW range.

I'll need a dedicated patch to refactor run_delalloc_nocow() to make it 
more streamlined to handle nocow and COW fallback (other than the 
current cursed way).

Thanks,
Qu

在 2025/1/11 21:13, Qu Wenruo 写道:
> The ordered extent cleanup is hard to grasp because it doesn't follow
> the common pattern that the cleanup happens where ordered extent get
> allocated.
> 
> E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
> ordered extents, but if any error is hit, the cleanup is done inside
> btrfs_run_delalloc_range().
> 
> To fix the existing delayed cleanup:
> 
> - Update the comment on error handling
>    For @cow_start set case, @cur_offset can be assigned to either
>    @cow_start or @cow_end.
>    So for such case we can not trust @cur_offset but only @cow_start.
> 
> - Only reset @cow_start when fallback_to_cow() succeeded
> 
> - Add Extra ASSERT() when @cow_start is still set at error path
> 
> - Rewind @cur_offset when @cow_start is set
>    When we fall back to COW, @cur_offset can be set to either @cow_start
>    or @cow_end.
>    We can not directly trust @cur_offset, or we will do double ordered
>    extent accounting on range which is already cleaned up
>    cow_file_range().
> 
> - Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow() and
>    cow_file_range()
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/inode.c | 58 +++++++++++++++++++++++++++++-------------------
>   1 file changed, 35 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 130f0490b14f..70ef7f59b692 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1272,10 +1272,8 @@ u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
>    * - Else all pages except for @locked_folio are unlocked.
>    *
>    * When a failure happens in the second or later iteration of the
> - * while-loop, the ordered extents created in previous iterations are kept
> - * intact. So, the caller must clean them up by calling
> - * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
> - * example.
> + * while-loop, the ordered extents created in previous iterations are
> + * cleaned up, leaving no ordered extent in the range.
>    */
>   static noinline int cow_file_range(struct btrfs_inode *inode,
>   				   struct folio *locked_folio, u64 start,
> @@ -1488,9 +1486,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>   
>   	/*
>   	 * For the range (1). We have already instantiated the ordered extents
> -	 * for this region. They are cleaned up by
> -	 * btrfs_cleanup_ordered_extents() in e.g,
> -	 * btrfs_run_delalloc_range().
> +	 * for this region, and need to clean them up.
>   	 * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
>   	 * are also handled by the cleanup function.
>   	 *
> @@ -1504,6 +1500,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>   
>   		if (!locked_folio)
>   			mapping_set_error(inode->vfs_inode.i_mapping, ret);
> +
> +		btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
>   		extent_clear_unlock_delalloc(inode, orig_start, start - 1,
>   					     locked_folio, NULL, clear_bits, page_ops);
>   	}
> @@ -1980,6 +1978,9 @@ static void cleanup_dirty_folios(struct btrfs_inode *inode,
>    *
>    * If no cow copies or snapshots exist, we write directly to the existing
>    * blocks on disk
> + *
> + * If an error is hit, any ordered extent created inside the range is
> + * properly cleaned up.
>    */
>   static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>   				       struct folio *locked_folio,
> @@ -2152,12 +2153,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>   		if (cow_start != (u64)-1) {
>   			ret = fallback_to_cow(inode, locked_folio, cow_start,
>   					      found_key.offset - 1);
> -			cow_start = (u64)-1;
>   			if (ret) {
>   				cow_end = found_key.offset - 1;
>   				btrfs_dec_nocow_writers(nocow_bg);
>   				goto error;
>   			}
> +			cow_start = (u64)-1;
>   		}
>   
>   		nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
> @@ -2229,11 +2230,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>   
>   	if (cow_start != (u64)-1) {
>   		ret = fallback_to_cow(inode, locked_folio, cow_start, end);
> -		cow_start = (u64)-1;
>   		if (ret) {
>   			cow_end = end;
>   			goto error;
>   		}
> +		cow_start = (u64)-1;
>   	}
>   
>   	btrfs_free_path(path);
> @@ -2249,25 +2250,40 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>   	 *
>   	 *    For range [start, cur_offset) the folios are already unlocked (except
>   	 *    @locked_folio), EXTENT_DELALLOC already removed.
> -	 *    Only need to clear the dirty flag as they will never be submitted.
> -	 *    Ordered extent and extent maps are handled by
> -	 *    btrfs_mark_ordered_io_finished() inside run_delalloc_range().
> +	 *    Need to finish the ordered extents and their extent maps, then
> +	 *    clear the dirty flag as they will never to submitted.
>   	 *
>   	 * 2) Failed with error from fallback_to_cow()
> -	 *    start         cur_offset  cow_end    end
> +	 *    start         cow_start  cow_end    end
>   	 *    |/////////////|-----------|          |
>   	 *
> -	 *    For range [start, cur_offset) it's the same as case 1).
> -	 *    But for range [cur_offset, cow_end), the folios have dirty flag
> +	 *    Special note for this case, @cur_offset may be set to either
> +	 *    @cow_end or @cow_start.
> +	 *    So for such fallback_to_cow() case, we should not trust @cur_offset
> +	 *    but @cow_start.
> +	 *    For range [start, cow_start) it's the same as case 1).
> +	 *    But for range [cow_start, cow_end), the folios have dirty flag
>   	 *    cleared and unlocked, EXTENT_DEALLLOC cleared by cow_file_range().
>   	 *
>   	 *    Thus we should not call extent_clear_unlock_delalloc() on range
> -	 *    [cur_offset, cow_end), as the folios are already unlocked.
> +	 *    [cow_start, cow_end), as the folios are already unlocked.
>   	 *
> -	 * So clear the folio dirty flags for [start, cur_offset) first.
> +	 * So clear the folio dirty flags for [start, cur_offset) and finish
> +	 * involved ordered extents.
> +	 *
> +	 * There is a special handling for COW case, where we advanced cur_offset to
> +	 * the COW end already. For those cases we need to rewind the cur_offset to
> +	 * the real correct location.
>   	 */
> -	if (cur_offset > start)
> +	if (cow_start != (u64)-1) {
> +		ASSERT(cow_end);
> +		ASSERT(cur_offset >= cow_start);
> +		cur_offset = cow_start;
> +	}
> +	if (cur_offset > start) {
> +		btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
>   		cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
> +	}
>   
>   	/*
>   	 * If an error happened while a COW region is outstanding, cur_offset
> @@ -2332,7 +2348,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
>   
>   	if (should_nocow(inode, start, end)) {
>   		ret = run_delalloc_nocow(inode, locked_folio, start, end);
> -		goto out;
> +		return ret;
>   	}
>   
>   	if (btrfs_inode_can_compress(inode) &&
> @@ -2346,10 +2362,6 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
>   	else
>   		ret = cow_file_range(inode, locked_folio, start, end, NULL,
>   				     false, false);
> -
> -out:
> -	if (ret < 0)
> -		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
>   	return ret;
>   }
>
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 130f0490b14f..70ef7f59b692 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1272,10 +1272,8 @@  u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
  * - Else all pages except for @locked_folio are unlocked.
  *
  * When a failure happens in the second or later iteration of the
- * while-loop, the ordered extents created in previous iterations are kept
- * intact. So, the caller must clean them up by calling
- * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
- * example.
+ * while-loop, the ordered extents created in previous iterations are
+ * cleaned up, leaving no ordered extent in the range.
  */
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct folio *locked_folio, u64 start,
@@ -1488,9 +1486,7 @@  static noinline int cow_file_range(struct btrfs_inode *inode,
 
 	/*
 	 * For the range (1). We have already instantiated the ordered extents
-	 * for this region. They are cleaned up by
-	 * btrfs_cleanup_ordered_extents() in e.g,
-	 * btrfs_run_delalloc_range().
+	 * for this region, and need to clean them up.
 	 * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
 	 * are also handled by the cleanup function.
 	 *
@@ -1504,6 +1500,8 @@  static noinline int cow_file_range(struct btrfs_inode *inode,
 
 		if (!locked_folio)
 			mapping_set_error(inode->vfs_inode.i_mapping, ret);
+
+		btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
 		extent_clear_unlock_delalloc(inode, orig_start, start - 1,
 					     locked_folio, NULL, clear_bits, page_ops);
 	}
@@ -1980,6 +1978,9 @@  static void cleanup_dirty_folios(struct btrfs_inode *inode,
  *
  * If no cow copies or snapshots exist, we write directly to the existing
  * blocks on disk
+ *
+ * If an error is hit, any ordered extent created inside the range is
+ * properly cleaned up.
  */
 static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 				       struct folio *locked_folio,
@@ -2152,12 +2153,12 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		if (cow_start != (u64)-1) {
 			ret = fallback_to_cow(inode, locked_folio, cow_start,
 					      found_key.offset - 1);
-			cow_start = (u64)-1;
 			if (ret) {
 				cow_end = found_key.offset - 1;
 				btrfs_dec_nocow_writers(nocow_bg);
 				goto error;
 			}
+			cow_start = (u64)-1;
 		}
 
 		nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
@@ -2229,11 +2230,11 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 	if (cow_start != (u64)-1) {
 		ret = fallback_to_cow(inode, locked_folio, cow_start, end);
-		cow_start = (u64)-1;
 		if (ret) {
 			cow_end = end;
 			goto error;
 		}
+		cow_start = (u64)-1;
 	}
 
 	btrfs_free_path(path);
@@ -2249,25 +2250,40 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 *
 	 *    For range [start, cur_offset) the folios are already unlocked (except
 	 *    @locked_folio), EXTENT_DELALLOC already removed.
-	 *    Only need to clear the dirty flag as they will never be submitted.
-	 *    Ordered extent and extent maps are handled by
-	 *    btrfs_mark_ordered_io_finished() inside run_delalloc_range().
+	 *    Need to finish the ordered extents and their extent maps, then
+	 *    clear the dirty flag as they will never to submitted.
 	 *
 	 * 2) Failed with error from fallback_to_cow()
-	 *    start         cur_offset  cow_end    end
+	 *    start         cow_start  cow_end    end
 	 *    |/////////////|-----------|          |
 	 *
-	 *    For range [start, cur_offset) it's the same as case 1).
-	 *    But for range [cur_offset, cow_end), the folios have dirty flag
+	 *    Special note for this case, @cur_offset may be set to either
+	 *    @cow_end or @cow_start.
+	 *    So for such fallback_to_cow() case, we should not trust @cur_offset
+	 *    but @cow_start.
+	 *    For range [start, cow_start) it's the same as case 1).
+	 *    But for range [cow_start, cow_end), the folios have dirty flag
 	 *    cleared and unlocked, EXTENT_DEALLLOC cleared by cow_file_range().
 	 *
 	 *    Thus we should not call extent_clear_unlock_delalloc() on range
-	 *    [cur_offset, cow_end), as the folios are already unlocked.
+	 *    [cow_start, cow_end), as the folios are already unlocked.
 	 *
-	 * So clear the folio dirty flags for [start, cur_offset) first.
+	 * So clear the folio dirty flags for [start, cur_offset) and finish
+	 * involved ordered extents.
+	 *
+	 * There is a special handling for COW case, where we advanced cur_offset to
+	 * the COW end already. For those cases we need to rewind the cur_offset to
+	 * the real correct location.
 	 */
-	if (cur_offset > start)
+	if (cow_start != (u64)-1) {
+		ASSERT(cow_end);
+		ASSERT(cur_offset >= cow_start);
+		cur_offset = cow_start;
+	}
+	if (cur_offset > start) {
+		btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
 		cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+	}
 
 	/*
 	 * If an error happened while a COW region is outstanding, cur_offset
@@ -2332,7 +2348,7 @@  int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
 
 	if (should_nocow(inode, start, end)) {
 		ret = run_delalloc_nocow(inode, locked_folio, start, end);
-		goto out;
+		return ret;
 	}
 
 	if (btrfs_inode_can_compress(inode) &&
@@ -2346,10 +2362,6 @@  int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
 	else
 		ret = cow_file_range(inode, locked_folio, start, end, NULL,
 				     false, false);
-
-out:
-	if (ret < 0)
-		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
 	return ret;
 }