
btrfs: do proper folio cleanup when cow_file_range() failed

Message ID 42ee9dff1e240427f4a4d827c83e81b5598fe765.1733109950.git.wqu@suse.com (mailing list archive)
State New
Series btrfs: do proper folio cleanup when cow_file_range() failed

Commit Message

Qu Wenruo Dec. 2, 2024, 3:28 a.m. UTC
Just like cow_file_range(), from day 1 btrfs has never properly cleared the
folio dirty flags on failure, if an ordered extent has already been created
successfully for the range.

Per the error handling protocol (following iomap, and btrfs' own handling
when it fails at the beginning of the range), we should clear all dirty
flags for the involved folios.

Otherwise the involved folio range will still be marked dirty, but have no
EXTENT_DELALLOC set inside the io tree.

Since the folio range is still dirty, it will still be the target of the
next writeback, but since there is no EXTENT_DELALLOC, no new ordered
extent will be created for it.

This means the writeback of that folio range will fall back to the COW
fixup path, which has been marked as deprecated and will trigger a crash.

Unlike the fix in cow_file_range(), which holds the folio and extent lock
until an error or a fully successful run, here we have no such luxury: we
can fall back to COW, and in that case the extent/folio range will be
unlocked by cow_file_range().

So here we introduce a new helper, cleanup_dirty_folios(), to clear the
dirty flags for the involved folios.
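
For illustration, the error path of run_delalloc_nocow() conceptually ends up
doing the following (a condensed sketch of the second hunk in the patch below;
inode, locked_folio, start, cur_offset and ret are the function's existing
variables):

    error:
    	/*
    	 * [start, cur_offset) already has ordered extents created, so clear
    	 * the dirty flags of those folios; they must not stay dirty without
    	 * EXTENT_DELALLOC in the io tree, or the next writeback would fall
    	 * back to COW fixup.
    	 */
    	if (cur_offset > start)
    		cleanup_dirty_folios(inode, locked_folio, start,
    				     cur_offset - 1, ret);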

Cc: stable@vger.kernel.org
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

Comments

Qu Wenruo Dec. 2, 2024, 10:03 p.m. UTC | #1
My bad, the subject line is exactly the same as the cow_file_range() fix,
while this patch is actually for run_delalloc_nocow().

Forgot to modify the function name.

The newer version is submitted here, with subject line changed:
https://lore.kernel.org/linux-btrfs/4727cdf5b669acf42d93504f2c73434f330e4d59.1733176210.git.wqu@suse.com/

Thanks,
Qu


Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e8232ac7917f..19e1b78508bd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1969,6 +1969,46 @@  static int can_nocow_file_extent(struct btrfs_path *path,
 	return ret < 0 ? ret : can_nocow;
 }
 
+static void cleanup_dirty_folios(struct btrfs_inode *inode,
+				 struct folio *locked_folio,
+				 u64 start, u64 end, int error)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct address_space *mapping = inode->vfs_inode.i_mapping;
+	pgoff_t start_index = start >> PAGE_SHIFT;
+	pgoff_t end_index = end >> PAGE_SHIFT;
+	u32 len;
+
+	ASSERT(end + 1 - start < U32_MAX);
+	len = end + 1 - start;
+
+	/*
+	 * Handle the locked folio first.
+	 * The btrfs_folio_clamp_*() helpers can handle a range beyond the folio.
+	 */
+	btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len);
+	btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len);
+	btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len);
+
+	for (pgoff_t index = start_index; index <= end_index; index++) {
+		struct folio *folio;
+
+		/* Already handled at the beginning. */
+		if (index == locked_folio->index)
+			continue;
+		folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS);
+		/* Cache already dropped, no need to do any cleanup. */
+		if (IS_ERR(folio))
+			continue;
+		btrfs_folio_clamp_clear_dirty(fs_info, folio, start, len);
+		btrfs_folio_clamp_set_writeback(fs_info, folio, start, len);
+		btrfs_folio_clamp_clear_writeback(fs_info, folio, start, len);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+	mapping_set_error(mapping, error);
+}
+
 /*
  * when nowcow writeback call back.  This checks for snapshots or COW copies
  * of the extents that exist in the file, and COWs the file as required.
@@ -2228,6 +2268,22 @@  static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	return 0;
 
 error:
+	/*
+	 * We have some range with ordered extent created.
+	 *
+	 * Ordered extents and extent maps will be cleaned up by
+	 * btrfs_mark_ordered_io_finished() later, but we also need to clean
+	 * up the dirty flags of the folios.
+	 *
+	 * Otherwise they can be written back again, but without any
+	 * EXTENT_DELALLOC flag in the io tree.  This will force the writeback
+	 * to go through COW fixup, which is being deprecated.
+	 *
+	 * Also such left-over dirty flags do not follow the error handling protocol.
+	 */
+	if (cur_offset > start)
+		cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+
 	/*
 	 * If an error happened while a COW region is outstanding, cur_offset
 	 * needs to be reset to cow_start to ensure the COW region is unlocked