Message ID | 9b55ff95921e88dd00413236390e7fd7561175de.1736591758.git.wqu@suse.com
---|---
State | New
Series | btrfs: error handling fixes
I'll drop this patch from the series.

run_delalloc_nocow() is really a mess to fix. It has too many weird error paths that can screw up the ordered extent cleanup. E.g. at the must_cow: tag, we can set @cow_start without setting a proper @cow_end, which makes it much harder to properly skip any COW range.

I'll need a dedicated patch to refactor run_delalloc_nocow() first, to make the nocow and COW-fallback handling more streamlined (rather than the current cursed way).

Thanks,
Qu

On 2025/1/11 21:13, Qu Wenruo wrote:
> The ordered extent cleanup is hard to grasp because it doesn't follow
> the common pattern that the cleanup happens where the ordered extents
> get allocated.
>
> E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
> ordered extents, but if any error is hit, the cleanup is done inside
> btrfs_run_delalloc_range().
>
> To fix the existing delayed cleanup:
>
> - Update the comment on error handling
>   For the @cow_start set case, @cur_offset can be assigned to either
>   @cow_start or @cow_end.
>   So in that case we can not trust @cur_offset, only @cow_start.
>
> - Only reset @cow_start when fallback_to_cow() succeeded
>
> - Add an extra ASSERT() when @cow_start is still set in the error path
>
> - Rewind @cur_offset when @cow_start is set
>   When we fall back to COW, @cur_offset can be set to either @cow_start
>   or @cow_end.
>   We can not directly trust @cur_offset, or we will do double ordered
>   extent accounting on a range which is already cleaned up by
>   cow_file_range().
>
> - Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow() and
>   cow_file_range()
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 58 +++++++++++++++++++++++++++++-------------------
>  1 file changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 130f0490b14f..70ef7f59b692 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1272,10 +1272,8 @@ u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
>   * - Else all pages except for @locked_folio are unlocked.
>   *
>   * When a failure happens in the second or later iteration of the
> - * while-loop, the ordered extents created in previous iterations are kept
> - * intact. So, the caller must clean them up by calling
> - * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
> - * example.
> + * while-loop, the ordered extents created in previous iterations are
> + * cleaned up, leaving no ordered extent in the range.
>   */
>  static noinline int cow_file_range(struct btrfs_inode *inode,
>  				   struct folio *locked_folio, u64 start,
> @@ -1488,9 +1486,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>
>  	/*
>  	 * For the range (1). We have already instantiated the ordered extents
> -	 * for this region. They are cleaned up by
> -	 * btrfs_cleanup_ordered_extents() in e.g,
> -	 * btrfs_run_delalloc_range().
> +	 * for this region, and need to clean them up.
>  	 * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
>  	 * are also handled by the cleanup function.
>  	 *
> @@ -1504,6 +1500,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>
>  		if (!locked_folio)
>  			mapping_set_error(inode->vfs_inode.i_mapping, ret);
> +
> +		btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
>  		extent_clear_unlock_delalloc(inode, orig_start, start - 1,
>  					     locked_folio, NULL, clear_bits, page_ops);
>  	}
> @@ -1980,6 +1978,9 @@ static void cleanup_dirty_folios(struct btrfs_inode *inode,
>   *
>   * If no cow copies or snapshots exist, we write directly to the existing
>   * blocks on disk
> + *
> + * If an error is hit, any ordered extent created inside the range is
> + * properly cleaned up.
>   */
>  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>  				       struct folio *locked_folio,
> @@ -2152,12 +2153,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>  		if (cow_start != (u64)-1) {
>  			ret = fallback_to_cow(inode, locked_folio, cow_start,
>  					      found_key.offset - 1);
> -			cow_start = (u64)-1;
>  			if (ret) {
>  				cow_end = found_key.offset - 1;
>  				btrfs_dec_nocow_writers(nocow_bg);
>  				goto error;
>  			}
> +			cow_start = (u64)-1;
>  		}
>
>  		nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
> @@ -2229,11 +2230,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>
>  	if (cow_start != (u64)-1) {
>  		ret = fallback_to_cow(inode, locked_folio, cow_start, end);
> -		cow_start = (u64)-1;
>  		if (ret) {
>  			cow_end = end;
>  			goto error;
>  		}
> +		cow_start = (u64)-1;
>  	}
>
>  	btrfs_free_path(path);
> @@ -2249,25 +2250,40 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>  	 *
>  	 * For range [start, cur_offset) the folios are already unlocked (except
>  	 * @locked_folio), EXTENT_DELALLOC already removed.
> -	 * Only need to clear the dirty flag as they will never be submitted.
> -	 * Ordered extent and extent maps are handled by
> -	 * btrfs_mark_ordered_io_finished() inside run_delalloc_range().
> +	 * Need to finish the ordered extents and their extent maps, then
> +	 * clear the dirty flag as they will never be submitted.
>  	 *
>  	 * 2) Failed with error from fallback_to_cow()
> -	 *    start         cur_offset  cow_end                  end
> +	 *    start         cow_start   cow_end                  end
>  	 *    |/////////////|-----------|                        |
>  	 *
> -	 * For range [start, cur_offset) it's the same as case 1).
> -	 * But for range [cur_offset, cow_end), the folios have dirty flag
> +	 * Special note for this case, @cur_offset may be set to either
> +	 * @cow_end or @cow_start.
> +	 * So for such fallback_to_cow() case, we should not trust @cur_offset
> +	 * but @cow_start.
> +	 * For range [start, cow_start) it's the same as case 1).
> +	 * But for range [cow_start, cow_end), the folios have dirty flag
>  	 * cleared and unlocked, EXTENT_DELALLOC cleared by cow_file_range().
>  	 *
>  	 * Thus we should not call extent_clear_unlock_delalloc() on range
> -	 * [cur_offset, cow_end), as the folios are already unlocked.
> +	 * [cow_start, cow_end), as the folios are already unlocked.
>  	 *
> -	 * So clear the folio dirty flags for [start, cur_offset) first.
> +	 * So clear the folio dirty flags for [start, cur_offset) and finish
> +	 * involved ordered extents.
> +	 *
> +	 * There is a special handling for the COW case, where we advanced
> +	 * cur_offset to the COW end already. For those cases we need to
> +	 * rewind cur_offset to the real correct location.
>  	 */
> -	if (cur_offset > start)
> +	if (cow_start != (u64)-1) {
> +		ASSERT(cow_end);
> +		ASSERT(cur_offset >= cow_start);
> +		cur_offset = cow_start;
> +	}
> +	if (cur_offset > start) {
> +		btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
>  		cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
> +	}
>
>  	/*
>  	 * If an error happened while a COW region is outstanding, cur_offset
> @@ -2332,7 +2348,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
>
>  	if (should_nocow(inode, start, end)) {
>  		ret = run_delalloc_nocow(inode, locked_folio, start, end);
> -		goto out;
> +		return ret;
>  	}
>
>  	if (btrfs_inode_can_compress(inode) &&
> @@ -2346,10 +2362,6 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
>  	else
>  		ret = cow_file_range(inode, locked_folio, start, end, NULL,
>  				     false, false);
> -
> -out:
> -	if (ret < 0)
> -		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
>  	return ret;
> }