Message ID | 20181105111417.11808-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Btrfs: fix data corruption due to cloning of eof block | expand |
On Mon, Nov 05, 2018 at 11:14:17AM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > We currently allow cloning a range from a file which includes the last > block of the file even if the file's size is not aligned to the block > size. This is fine and useful when the destination file has the same size, > but when it does not and the range ends somewhere in the middle of the > destination file, it leads to corruption because the bytes between the EOF > and the end of the block have undefined data (when there is support for > discard/trimming they have a value of 0x00). > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ export foo_size=$((256 * 1024 + 100)) > $ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo > $ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar > > $ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar > > $ od -A d -t x1 /mnt/bar > 0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 > * > 0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c > * > 0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00 > 0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > * > 0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 > * > 1048576 > > The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527 > (512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead > of 0xb5. > > This is similar to the problem we had for deduplication that got recently > fixed by commit de02b9f6bb65 ("Btrfs: fix data corruption when > deduplicating between different files"). > > Fix this by not allowing such operations to be performed and return the > errno -EINVAL to user space. This is what XFS is doing as well at the VFS > level. This change however now makes us return -EINVAL instead of > -EOPNOTSUPP for cases where the source range maps to an inline extent and > the destination range's end is smaller then the destination file's size, > since the detection of inline extents is done during the actual process of > dropping file extent items (at __btrfs_drop_extents()). Returning the > -EINVAL error is done early on and solely based on the input parameters > (offsets and length) and destination file's size. This makes us consistent > with XFS and anyone else supporting cloning since this case is now checked > at a higher level in the VFS and is where the -EINVAL will be returned > from starting with kernel 4.20 (the VFS changed was introduced in 4.20-rc1 > by commit 07d19dc9fbe9 ("vfs: avoid problematic remapping requests into > partial EOF block"). So this change is more geared towards stable kernels, > as it's unlikely the new VFS checks get removed intentionally. > > A test case for fstests follows soon, as well as an update to filter > existing tests that expect -EOPNOTSUPP to accept -EINVAL as well. > > CC: <stable@vger.kernel.org> # 4.4+ > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f3134fc69880..30e098970063 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4277,9 +4277,17 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src, goto out_unlock; if (len == 0) olen = len = src->i_size - off; - /* if we extend to eof, continue to block boundary */ - if (off + len == src->i_size) + /* + * If we extend to eof, continue to block boundary if and only if the + * destination end offset matches the destination file's size, otherwise + * we would be corrupting data by placing the eof block into the middle + * of a file. + */ + if (off + len == src->i_size) { + if (!IS_ALIGNED(len, bs) && destoff + len < inode->i_size) + goto out_unlock; len = ALIGN(src->i_size, bs) - off; + } if (len == 0) { ret = 0;