Message ID | 20190214151720.23563-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Btrfs: fix corruption reading shared and compressed extents after hole punching | expand |
On Thu, Feb 14, 2019 at 03:17:20PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > In the past we had data corruption when reading compressed extents that > are shared within the same file and they are consecutive, this got fixed > by commit 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and > shared extents") and by commit 808f80b46790f ("Btrfs: update fix for read > corruption of compressed and shared extents"). However there was a case > that was missing in those fixes, which is when the shared and compressed > extents are referenced with a non-zero offset. The following shell script > creates a reproducer for this issue: > > #!/bin/bash > > mkfs.btrfs -f /dev/sdc &> /dev/null > mount -o compress /dev/sdc /mnt/sdc > > # Create a file with 3 consecutive compressed extents, each has an > # uncompressed size of 128Kb and a compressed size of 4Kb. > for ((i = 1; i <= 3; i++)); do > head -c 4096 /dev/zero > for ((j = 1; j <= 31; j++)); do > head -c 4096 /dev/zero | tr '\0' "\377" > done > done > /mnt/sdc/foobar > sync > > echo "Digest after file creation: $(md5sum /mnt/sdc/foobar)" > > # Clone the first extent into offsets 128K and 256K. > xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar > xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar > sync > > echo "Digest after cloning: $(md5sum /mnt/sdc/foobar)" > > # Punch holes into the regions that are already full of zeroes. > xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar > xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar > xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar > sync > > echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)" > > echo "Dropping page cache..." > sysctl -q vm.drop_caches=1 > echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)" > > umount /dev/sdc > > When running the script we get the following output: > > Digest after file creation: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar > linked 131072/131072 bytes at offset 131072 > 128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec) > linked 131072/131072 bytes at offset 262144 > 128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec) > Digest after cloning: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar > Digest after hole punching: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar > Dropping page cache... > Digest after hole punching: fba694ae8664ed0c2e9ff8937e7f1484 /mnt/sdc/foobar > > This happens because after reading all the pages of the extent in the > range from 128K to 256K for example, we read the hole at offset 256K > and then when reading the page at offset 260K we don't submit the > existing bio, which is responsible for filling all the page in the > range 128K to 256K only, therefore adding the pages from range 260K > to 384K to the existing bio and submitting it after iterating over the > entire range. Once the bio completes, the uncompressed data fills only > the pages in the range 128K to 256K because there's no more data read > from disk, leaving the pages in the range 260K to 384K unfilled. It is > just a slightly different variant of what was solved by commit > 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and shared > extents"). > > Fix this by forcing a bio submit, during readpages(), whenever we find a > compressed extent map for a page that is different from the extent map > for the previous page or has a different starting offset (in case it's > the same compressed extent), instead of the extent map's original start > offset. > > A test case for fstests follows soon. > > Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > Cc: stable@vger.kernel.org # 4.3+ > Signed-off-by: Filipe Manana <fdmanana@suse.com> Added to misc-next and scheduled for 5.1, thanks. And thanks to Zygo for the report.
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 52abe4082680..1bfb7207bbf0 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2985,11 +2985,11 @@ static int __do_readpage(struct extent_io_tree *tree, */ if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) && prev_em_start && *prev_em_start != (u64)-1 && - *prev_em_start != em->orig_start) + *prev_em_start != em->start) force_bio_submit = true; if (prev_em_start) - *prev_em_start = em->orig_start; + *prev_em_start = em->start; free_extent_map(em); em = NULL;