[3/4] Btrfs: fix RWF_NOWAIT write not failing when we need to COW

Message ID 20200615174913.14943-1-fdmanana@kernel.org (mailing list archive)
State New, archived
Series [1/4] Btrfs: fix hang on snapshot creation after RWF_NOWAIT write

Commit Message

Filipe Manana June 15, 2020, 5:49 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

If we attempt to do a RWF_NOWAIT write against a file range for which we
can only do NOCOW for a part of it, due to the existence of holes or
shared extents for example, we proceed with the write as if it were
possible to NOCOW the whole range.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/bar
  $ chattr +C /mnt/bar

  $ xfs_io -d -c "pwrite -S 0xab -b 256K 0 256K" /mnt/bar
  wrote 262144/262144 bytes at offset 0
  256 KiB, 1 ops; 0.0003 sec (694.444 MiB/sec and 2777.7778 ops/sec)

  $ xfs_io -c "fpunch 64K 64K" /mnt/bar
  $ sync

  $ xfs_io -d -c "pwrite -N -V 1 -b 128K -S 0xfe 0 128K" /mnt/bar
  wrote 131072/131072 bytes at offset 0
  128 KiB, 1 ops; 0.0007 sec (160.051 MiB/sec and 1280.4097 ops/sec)

This last write should fail with -EAGAIN since the file range from 64K to
128K is a hole. On xfs it fails, as expected. On ext4 it currently
succeeds, apparently because it is expensive to check if there are extents
allocated for the whole range, but I'll check with the ext4 people.
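
For reference, the last xfs_io command above corresponds roughly to a single
pwritev2() call with RWF_NOWAIT. The helper below is only a sketch of that
call from userspace; the name nowait_pwrite() is hypothetical and not part of
this patch, and the fallback definition of RWF_NOWAIT assumes the value from
the kernel UAPI headers for older C libraries that do not define it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008	/* assumed fallback for older libc headers */
#endif

/*
 * Hypothetical helper: issue one RWF_NOWAIT write of len bytes at file
 * offset off. Returns the number of bytes written, or a negative errno
 * value such as -EAGAIN when the write could not proceed without blocking.
 */
static ssize_t nowait_pwrite(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);
	ret = pwritev2(fd, &iov, 1, off, RWF_NOWAIT);
	return ret < 0 ? -errno : ret;
}
```

Called on a descriptor opened with O_DIRECT on /mnt/bar, with a suitably
aligned 128K buffer and offset 0, this is the write that is expected to
return -EAGAIN once the fix is applied.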

Fix the issue by checking if check_can_nocow() returns a number of
NOCOW'able bytes smaller than the requested number of bytes, and if it
does, return -EAGAIN.

Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Josef Bacik June 16, 2020, 7:28 p.m. UTC | #1
Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 04faa04fccd1..78481d1e5e6e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1904,18 +1904,28 @@  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	pos = iocb->ki_pos;
 	count = iov_iter_count(from);
 	if (iocb->ki_flags & IOCB_NOWAIT) {
+		size_t nocow_bytes = count;
 		/*
 		 * We will allocate space in case nodatacow is not set,
 		 * so bail
 		 */
 		if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 					      BTRFS_INODE_PREALLOC)) ||
-		    check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+		    check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes) <= 0) {
 			inode_unlock(inode);
 			return -EAGAIN;
 		}
 		/* check_can_nocow() locks the snapshot lock on success */
 		btrfs_drew_write_unlock(&root->snapshot_lock);
+		/*
+		 * There are holes in the range or parts of the range that must
+		 * be COWed (shared extents, RO block groups, etc), so just bail
+		 * out.
+		 */
+		if (nocow_bytes < count) {
+			inode_unlock(inode);
+			return -EAGAIN;
+		}
 	}
 
 	current->backing_dev_info = inode_to_bdi(inode);
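
The decision added by the hunk above can be illustrated with a small
userspace model. This is only a sketch under simplifying assumptions:
struct model_extent, model_nocow_bytes() and model_nowait_write_check() are
hypothetical names, not btrfs code, and the model ignores locking, RO block
groups and everything else check_can_nocow() really handles. It only shows
why a partially NOCOW'able range must yield -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a file extent; not a btrfs type. */
struct model_extent {
	uint64_t start;	/* file offset where the extent begins */
	uint64_t len;	/* length in bytes */
	int nocow_ok;	/* 1 if writable in place, 0 for holes/shared extents */
};

/*
 * Count how many contiguous bytes starting at pos can be written in place,
 * capped at count. Extents must be sorted by start and must not overlap.
 */
static uint64_t model_nocow_bytes(const struct model_extent *ext, size_t n,
				  uint64_t pos, uint64_t count)
{
	uint64_t end = pos + count;
	uint64_t cur = pos;

	for (size_t i = 0; i < n && cur < end; i++) {
		uint64_t ext_end = ext[i].start + ext[i].len;

		if (ext_end <= cur)
			continue;	/* extent lies entirely before cur */
		if (ext[i].start > cur || !ext[i].nocow_ok)
			break;		/* gap or COW-only extent ends the run */
		cur = ext_end < end ? ext_end : end;
	}
	return cur - pos;
}

/* Returns 0 if a NOWAIT write may proceed, -EAGAIN otherwise. */
static int model_nowait_write_check(const struct model_extent *ext, size_t n,
				    uint64_t pos, uint64_t count)
{
	uint64_t nocow_bytes = model_nocow_bytes(ext, n, pos, count);

	assert(nocow_bytes <= count);
	if (nocow_bytes < count)
		return -EAGAIN;
	return 0;
}
```

With the layout from the commit message (256K of data, a hole punched at
64K to 128K), a 128K write at offset 0 finds only 64K NOCOW'able bytes in
this model, so the check returns -EAGAIN, while a 64K write at offset 0 is
allowed to proceed.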