diff mbox series

btrfs: clear file extent mapping for punch past i_size

Message ID 20200220142947.3880392-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: clear file extent mapping for punch past i_size | expand

Commit Message

Josef Bacik Feb. 20, 2020, 2:29 p.m. UTC
In my stress testing we were still seeing some hole's with my patches to
fix missing hole extents.  Turns out we do not fill in holes during hole
punch if the punch is past i_size.  I incorrectly assumed this was fine,
because anybody extending would use btrfs_cont_expand, however there is
a corner that still can give us trouble.  Start with an empty file and

fallocate KEEP_SIZE 1m-2m

We now have a 0 length file, and a hole file extent from 0-1m, and a
prealloc extent from 1m-2m.  Now

punch 1m-1.5m

Because this is past i_size we have

[HOLE EXTENT][ NOTHING ][PREALLOC]
[0        1m][1m   1.5m][1.5m  2m]

with an i_size of 0.  Now if we pwrite 0-1.5m we'll increas our i_size
to 1.5m, but our disk_i_size is still 0 until the ordered extent
completes.

However if we now immediately truncate 2m on the file we'll just call
btrfs_cont_expand(inode, 1.5m, 2m), since our old i_size is 1.5m.  If we
commit the transaction here and crash we'll expose the gap.

To fix this we need to clear the file extent mapping for the range that
we punched but didn't insert a corresponding file extent for.  This will
mean the truncate will only get an disk_i_size set to 1m if we crash
before the finish ordered io happens.

I've written an xfstest to reproduce the problem and validate this fix.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
- Dave, this needs to be folded into "btrfs: use the file extent tree
  infrastructure" and the changelog needs to be adjusted since I incorrectly
  point out that we don't need to clear for hole punch.  We definitely need to
  clear for the case that we're punching past i_size as we aren't inserting hole
  file extents.

 fs/btrfs/file.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Comments

Filipe Manana Feb. 20, 2020, 4:18 p.m. UTC | #1
On Thu, Feb 20, 2020 at 2:30 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> In my stress testing we were still seeing some hole's with my patches to
> fix missing hole extents.  Turns out we do not fill in holes during hole
> punch if the punch is past i_size.  I incorrectly assumed this was fine,
> because anybody extending would use btrfs_cont_expand, however there is
> a corner that still can give us trouble.  Start with an empty file and
>
> fallocate KEEP_SIZE 1m-2m
>
> We now have a 0 length file, and a hole file extent from 0-1m, and a
> prealloc extent from 1m-2m.  Now
>
> punch 1m-1.5m
>
> Because this is past i_size we have
>
> [HOLE EXTENT][ NOTHING ][PREALLOC]
> [0        1m][1m   1.5m][1.5m  2m]
>
> with an i_size of 0.  Now if we pwrite 0-1.5m we'll increas our i_size
> to 1.5m, but our disk_i_size is still 0 until the ordered extent
> completes.
>
> However if we now immediately truncate 2m on the file we'll just call
> btrfs_cont_expand(inode, 1.5m, 2m), since our old i_size is 1.5m.  If we
> commit the transaction here and crash we'll expose the gap.
>
> To fix this we need to clear the file extent mapping for the range that
> we punched but didn't insert a corresponding file extent for.  This will
> mean the truncate will only get an disk_i_size set to 1m if we crash
> before the finish ordered io happens.
>
> I've written an xfstest to reproduce the problem and validate this fix.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

I hit another case, besides the one I reported last week, in my
automated tests during one of these evenings.
This is probably it, but I haven't tried to debug it yet - reusing the
same fsstress seed didn't trigger it, as the test used 20 fsstress
processes (with dm log writes).

Looks good,

Reviewed-by: Filipe Manana <fdmanana@suse.com>

> ---
> - Dave, this needs to be folded into "btrfs: use the file extent tree
>   infrastructure" and the changelog needs to be adjusted since I incorrectly
>   point out that we don't need to clear for hole punch.  We definitely need to
>   clear for the case that we're punching past i_size as we aren't inserting hole
>   file extents.
>
>  fs/btrfs/file.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 5cdcdbdd908b..6f6f1805e6fd 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2597,6 +2597,24 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
>                                 btrfs_abort_transaction(trans, ret);
>                                 break;
>                         }
> +               } else if (!clone_info && cur_offset < drop_end) {
> +                       /*
> +                        * We are past the i_size here, but since we didn't
> +                        * insert holes we need to clear the mapped area so we
> +                        * know to not set disk_i_size in this area until a new
> +                        * file extent is inserted here.
> +                        */
> +                       ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                       cur_offset, drop_end - cur_offset);
> +                       if (ret) {
> +                               /*
> +                                * We couldn't clear our area, so we could
> +                                * presumably adjust up and corrupt the fs, so
> +                                * we need to abort.
> +                                */
> +                               btrfs_abort_transaction(trans, ret);
> +                               break;
> +                       }
>                 }
>
>                 if (clone_info && drop_end > clone_info->file_offset) {
> @@ -2687,6 +2705,15 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
>                         btrfs_abort_transaction(trans, ret);
>                         goto out_trans;
>                 }
> +       } else if (!clone_info && cur_offset < drop_end) {
> +               /* See the comment in the loop above for the reasoning here. */
> +               ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> +                                       cur_offset, drop_end - cur_offset);
> +               if (ret) {
> +                       btrfs_abort_transaction(trans, ret);
> +                       goto out_trans;
> +               }
> +
>         }
>         if (clone_info) {
>                 ret = btrfs_insert_clone_extent(trans, inode, path, clone_info,
> --
> 2.24.1
>
David Sterba Feb. 21, 2020, 4:58 p.m. UTC | #2
On Thu, Feb 20, 2020 at 09:29:47AM -0500, Josef Bacik wrote:
> In my stress testing we were still seeing some hole's with my patches to
> fix missing hole extents.  Turns out we do not fill in holes during hole
> punch if the punch is past i_size.  I incorrectly assumed this was fine,
> because anybody extending would use btrfs_cont_expand, however there is
> a corner that still can give us trouble.  Start with an empty file and
> 
> fallocate KEEP_SIZE 1m-2m
> 
> We now have a 0 length file, and a hole file extent from 0-1m, and a
> prealloc extent from 1m-2m.  Now
> 
> punch 1m-1.5m
> 
> Because this is past i_size we have
> 
> [HOLE EXTENT][ NOTHING ][PREALLOC]
> [0        1m][1m   1.5m][1.5m  2m]
> 
> with an i_size of 0.  Now if we pwrite 0-1.5m we'll increas our i_size
> to 1.5m, but our disk_i_size is still 0 until the ordered extent
> completes.
> 
> However if we now immediately truncate 2m on the file we'll just call
> btrfs_cont_expand(inode, 1.5m, 2m), since our old i_size is 1.5m.  If we
> commit the transaction here and crash we'll expose the gap.
> 
> To fix this we need to clear the file extent mapping for the range that
> we punched but didn't insert a corresponding file extent for.  This will
> mean the truncate will only get an disk_i_size set to 1m if we crash
> before the finish ordered io happens.
> 
> I've written an xfstest to reproduce the problem and validate this fix.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> - Dave, this needs to be folded into "btrfs: use the file extent tree
>   infrastructure" and the changelog needs to be adjusted since I incorrectly
>   point out that we don't need to clear for hole punch.  We definitely need to
>   clear for the case that we're punching past i_size as we aren't inserting hole
>   file extents.

Ok, folded to the patch. I added the hole puching as 7) and appended
this changelog so it describes the tricky corner case. Let me know if
there are more updates needed. Thanks.
diff mbox series

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5cdcdbdd908b..6f6f1805e6fd 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2597,6 +2597,24 @@  int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 				btrfs_abort_transaction(trans, ret);
 				break;
 			}
+		} else if (!clone_info && cur_offset < drop_end) {
+			/*
+			 * We are past the i_size here, but since we didn't
+			 * insert holes we need to clear the mapped area so we
+			 * know to not set disk_i_size in this area until a new
+			 * file extent is inserted here.
+			 */
+			ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+					cur_offset, drop_end - cur_offset);
+			if (ret) {
+				/*
+				 * We couldn't clear our area, so we could
+				 * presumably adjust up and corrupt the fs, so
+				 * we need to abort.
+				 */
+				btrfs_abort_transaction(trans, ret);
+				break;
+			}
 		}
 
 		if (clone_info && drop_end > clone_info->file_offset) {
@@ -2687,6 +2705,15 @@  int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			btrfs_abort_transaction(trans, ret);
 			goto out_trans;
 		}
+	} else if (!clone_info && cur_offset < drop_end) {
+		/* See the comment in the loop above for the reasoning here. */
+		ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
+					cur_offset, drop_end - cur_offset);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			goto out_trans;
+		}
+
 	}
 	if (clone_info) {
 		ret = btrfs_insert_clone_extent(trans, inode, path, clone_info,