Message ID | 20230418-btrfs-extent-reads-v1-1-47ba9839f0cc@codewreck.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix and improve read code | expand |
On 2023/4/18 09:17, Dominique Martinet wrote: > From: Dominique Martinet <dominique.martinet@atmark-techno.com> The subject can be changed to "btrfs: fix offset when reading compressed extents". The original one is a little too generic. > > btrfs_read_extent_reg correctly computed the extent offset in the > BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset' > part correctly in the compressed case, making the function read > incorrect data. > > In the case I examined, the last 4k of a file was corrupted and > contained data from a few blocks prior: > btrfs_file_read() > -> btrfs_read_extent_reg > (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time) > last read had 0x4000 bytes in extent, but aligned_end - cur limited > the read to 0x3000 (offset 0x720000) > -> read_and_truncate_page > -> btrfs_read_extent_reg > reading the last 0x1000 bytes from offset 0x73000 (end of the > previous 0x4000) into 0x40ba3000 > here key.offset = 0x70000 so we need to use it to recover the > 0x3000 offset. You can use "btrfs ins dump-tree" to provide a much easier to read on-disk data layout. And you can also add a smaller reproducer. > > Confirmed by checking data, before patch: > u-boot=> mmc load 1:1 $loadaddr boot/uImage > u-boot=> md 0x40ba0000 > 40ba0000: c0232ff8 c0c722cb 030e94be bf10000c ./#..".......... > u-boot=> md 0x40ba3000 > 40ba3000: c0232ff8 c0c722cb 030e94be bf10000c ./#..".......... > > After patch (also confirmed with data read from linux): > u-boot=> md 0x40ba3000 > 40ba3000: 64cc9f03 81142256 6910000c 0c03483c ...dV".....i<H.. > > Note that the code previously (before commit e3427184f38a ("fs: btrfs: > Implement btrfs_file_read()")) did not split that read in two, so > this is a regression. > > Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and btrfs_read_extent_reg()") > Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> > --- > fs/btrfs/inode.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 40025662f250..3d6e39e6544d 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -443,6 +443,8 @@ int btrfs_read_extent_reg(struct btrfs_path *path, > IS_ALIGNED(len, fs_info->sectorsize)); > ASSERT(offset >= key.offset && > offset + len <= key.offset + extent_num_bytes); > + /* offset is now offset within extent */ > + offset = btrfs_file_extent_offset(leaf, fi) + offset - key.offset; I prefer not to use the @offset, explained later. > > /* Preallocated or hole , fill @dest with zero */ > if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || > @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, > if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { > u64 logical; > > - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + > - btrfs_file_extent_offset(leaf, fi) + > - offset - key.offset; > + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; This is touching non-compressed path, which is very weird as your commit message said this part is correct. I prefer the original one to show everything we need to take into consideration. Otherwise looks good to me. Thanks, Qu > read = len; > > num_copies = btrfs_num_copies(fs_info, logical, len); > @@ -511,7 +511,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, > if (ret < dsize) > memset(dbuf + ret, 0, dsize - ret); > /* Then copy the needed part */ > - memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len); > + memcpy(dest, dbuf + offset, len); > ret = len; > out: > free(cbuf); >
Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800: > The subject can be changed to "btrfs: fix offset when reading compressed > extents". > The original one is a little too generic. Ok. > > btrfs_file_read() > > -> btrfs_read_extent_reg > > (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time) > > last read had 0x4000 bytes in extent, but aligned_end - cur limited > > the read to 0x3000 (offset 0x720000) > > -> read_and_truncate_page > > -> btrfs_read_extent_reg > > reading the last 0x1000 bytes from offset 0x73000 (end of the > > previous 0x4000) into 0x40ba3000 > > here key.offset = 0x70000 so we need to use it to recover the > > 0x3000 offset. > > You can use "btrfs ins dump-tree" to provide a much easier to read on-disk > data layout. key.offset is the offset within the read call, not the offset on disk. The file on disk looks perfectly normal, the condition to trigger the bug is to have a file which size is not sector-aligned and where the last extent is bigger than a sector. I had a look at dump-tree early on and couldn't actually find my file in there, now the problem is understood it should be easy to make a reproducer so I'll add this info and commands needed to reproduce (+ a link to a fs image just in case) in v2 > > /* Preallocated or hole , fill @dest with zero */ > > if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || > > @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, > > if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { > > u64 logical; > > - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + > > - btrfs_file_extent_offset(leaf, fi) + > > - offset - key.offset; > > + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; > > This is touching non-compressed path, which is very weird as your commit > message said this part is correct. my rationale is that since this was considered once but forgotten the other time it'll be easy to add yet another code path that forgets it later, but I guess it won't change much anyway -- I'll fix the patch making it explicit again. Thanks,
On 2023/4/18 10:05, Dominique Martinet wrote: > Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800: >> The subject can be changed to "btrfs: fix offset when reading compressed >> extents". >> The original one is a little too generic. > > Ok. > >>> btrfs_file_read() >>> -> btrfs_read_extent_reg >>> (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time) >>> last read had 0x4000 bytes in extent, but aligned_end - cur limited >>> the read to 0x3000 (offset 0x720000) >>> -> read_and_truncate_page >>> -> btrfs_read_extent_reg >>> reading the last 0x1000 bytes from offset 0x73000 (end of the >>> previous 0x4000) into 0x40ba3000 >>> here key.offset = 0x70000 so we need to use it to recover the >>> 0x3000 offset. >> >> You can use "btrfs ins dump-tree" to provide a much easier to read on-disk >> data layout. > > key.offset is the offset within the read call, not the offset on disk. > The file on disk looks perfectly normal, the condition to trigger the > bug is to have a file which size is not sector-aligned and where the > last extent is bigger than a sector. Yes, I understand the runtime offset is involved. But you're still trying to explain the situation with involved on-disk file extent, right? In that case it's way easier to go something like this: We can have a compressed file extent like this: item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 generation 7 type 1 (regular) extent data disk byte 13631488 nr 4096 extent data offset 0 nr 32768 ram 32768 extent compression 1 (zlib) Then if we try to read file range [4K, 8K) of inode 257 in Uboot, then we got corrupted data. At least that's the preferred way to explain in btrfs community (with on-disk thus static part, then runtime part). > > I had a look at dump-tree early on and couldn't actually find my file in > there, now the problem is understood it should be easy to make a > reproducer so I'll add this info and commands needed to reproduce (+ a > link to a fs image just in case) in v2 The reproducer is super easy. # mkfs.btrfs -f $dev # mount $dev -o compress $mnt # xfs_io -f -c "pwrite -i /dev/urandom 0 16K" $mnt/file # umount $mnt Then in uboot # load host 0 $addr file 0x1000 0x1000 # md $addr And compare to regular xxd from kernel. > >>> /* Preallocated or hole , fill @dest with zero */ >>> if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || >>> @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, >>> if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { >>> u64 logical; >>> - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + >>> - btrfs_file_extent_offset(leaf, fi) + >>> - offset - key.offset; >>> + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; >> >> This is touching non-compressed path, which is very weird as your commit >> message said this part is correct. > > my rationale is that since this was considered once but forgotten the > other time it'll be easy to add yet another code path that forgets it > later, but I guess it won't change much anyway -- I'll fix the patch > making it explicit again. OK, that makes sense now. Thanks, Qu > > > Thanks,
On 2023/4/18 10:05, Dominique Martinet wrote: > Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800: >> The subject can be changed to "btrfs: fix offset when reading compressed >> extents". >> The original one is a little too generic. > > Ok. > >>> btrfs_file_read() >>> -> btrfs_read_extent_reg >>> (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time) >>> last read had 0x4000 bytes in extent, but aligned_end - cur limited >>> the read to 0x3000 (offset 0x720000) >>> -> read_and_truncate_page >>> -> btrfs_read_extent_reg >>> reading the last 0x1000 bytes from offset 0x73000 (end of the >>> previous 0x4000) into 0x40ba3000 >>> here key.offset = 0x70000 so we need to use it to recover the >>> 0x3000 offset. >> >> You can use "btrfs ins dump-tree" to provide a much easier to read on-disk >> data layout. > > key.offset is the offset within the read call, not the offset on disk. > The file on disk looks perfectly normal, the condition to trigger the > bug is to have a file which size is not sector-aligned and where the > last extent is bigger than a sector. Forgot to mention, this bug does not only affect the mentioned case, it affects all partial read of an extent. Even if it's sector aligned. > > I had a look at dump-tree early on and couldn't actually find my file in > there, now the problem is understood it should be easy to make a > reproducer so I'll add this info and commands needed to reproduce (+ a > link to a fs image just in case) in v2 > >>> /* Preallocated or hole , fill @dest with zero */ >>> if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || >>> @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, >>> if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { >>> u64 logical; >>> - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + >>> - btrfs_file_extent_offset(leaf, fi) + >>> - offset - key.offset; >>> + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; >> >> This is touching non-compressed path, which is very weird as your commit >> message said this part is correct. > > my rationale is that since this was considered once but forgotten the > other time it'll be easy to add yet another code path that forgets it > later, but I guess it won't change much anyway -- I'll fix the patch > making it explicit again. > > > Thanks,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 40025662f250..3d6e39e6544d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -443,6 +443,8 @@ int btrfs_read_extent_reg(struct btrfs_path *path, IS_ALIGNED(len, fs_info->sectorsize)); ASSERT(offset >= key.offset && offset + len <= key.offset + extent_num_bytes); + /* offset is now offset within extent */ + offset = btrfs_file_extent_offset(leaf, fi) + offset - key.offset; /* Preallocated or hole , fill @dest with zero */ if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { u64 logical; - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + - btrfs_file_extent_offset(leaf, fi) + - offset - key.offset; + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; read = len; num_copies = btrfs_num_copies(fs_info, logical, len); @@ -511,7 +511,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (ret < dsize) memset(dbuf + ret, 0, dsize - ret); /* Then copy the needed part */ - memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len); + memcpy(dest, dbuf + offset, len); ret = len; out: free(cbuf);