diff mbox series

[U-BOOT,1/3] btrfs: fix offset within btrfs_read_extent_reg()

Message ID 20230418-btrfs-extent-reads-v1-1-47ba9839f0cc@codewreck.org (mailing list archive)
State New, archived
Headers show
Series btrfs: fix and improve read code | expand

Commit Message

Dominique Martinet April 18, 2023, 1:17 a.m. UTC
From: Dominique Martinet <dominique.martinet@atmark-techno.com>

btrfs_read_extent_reg correctly computed the extent offset in the
BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset'
part correctly in the compressed case, making the function read
incorrect data.

In the case I examined, the last 4k of a file was corrupted and
contained data from a few blocks prior:
btrfs_file_read()
 -> btrfs_read_extent_reg
    (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time)
     last read had 0x4000 bytes in extent, but aligned_end - cur limited
     the read to 0x3000 (offset 0x720000)
 -> read_and_truncate_page
   -> btrfs_read_extent_reg
      reading the last 0x1000 bytes from offset 0x73000 (end of the
      previous 0x4000) into 0x40ba3000
      here key.offset = 0x70000 so we need to use it to recover the
      0x3000 offset.

Confirmed by checking data, before patch:
u-boot=> mmc load 1:1 $loadaddr boot/uImage
u-boot=> md 0x40ba0000
40ba0000: c0232ff8 c0c722cb 030e94be bf10000c    ./#.."..........
u-boot=> md 0x40ba3000
40ba3000: c0232ff8 c0c722cb 030e94be bf10000c    ./#.."..........

After patch (also confirmed with data read from linux):
u-boot=> md 0x40ba3000
40ba3000: 64cc9f03 81142256 6910000c 0c03483c    ...dV".....i<H..

Note that the code previously (before commit e3427184f38a ("fs: btrfs:
Implement btrfs_file_read()")) did not split that read in two, so
this is a regression.

Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and btrfs_read_extent_reg()")
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
 fs/btrfs/inode.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Qu Wenruo April 18, 2023, 1:58 a.m. UTC | #1
On 2023/4/18 09:17, Dominique Martinet wrote:
> From: Dominique Martinet <dominique.martinet@atmark-techno.com>

The subject can be changed to "btrfs: fix offset when reading compressed 
extents".
The original one is a little too generic.

> 
> btrfs_read_extent_reg correctly computed the extent offset in the
> BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset'
> part correctly in the compressed case, making the function read
> incorrect data.
> 
> In the case I examined, the last 4k of a file was corrupted and
> contained data from a few blocks prior:
> btrfs_file_read()
>   -> btrfs_read_extent_reg
>      (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time)
>       last read had 0x4000 bytes in extent, but aligned_end - cur limited
>       the read to 0x3000 (offset 0x720000)
>   -> read_and_truncate_page
>     -> btrfs_read_extent_reg
>        reading the last 0x1000 bytes from offset 0x73000 (end of the
>        previous 0x4000) into 0x40ba3000
>        here key.offset = 0x70000 so we need to use it to recover the
>        0x3000 offset.

You can use "btrfs ins dump-tree" to provide a much easier to read 
on-disk data layout.

And you can also add a smaller reproducer.

> 
> Confirmed by checking data, before patch:
> u-boot=> mmc load 1:1 $loadaddr boot/uImage
> u-boot=> md 0x40ba0000
> 40ba0000: c0232ff8 c0c722cb 030e94be bf10000c    ./#.."..........
> u-boot=> md 0x40ba3000
> 40ba3000: c0232ff8 c0c722cb 030e94be bf10000c    ./#.."..........
> 
> After patch (also confirmed with data read from linux):
> u-boot=> md 0x40ba3000
> 40ba3000: 64cc9f03 81142256 6910000c 0c03483c    ...dV".....i<H..
> 
> Note that the code previously (before commit e3427184f38a ("fs: btrfs:
> Implement btrfs_file_read()")) did not split that read in two, so
> this is a regression.
> 
> Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and btrfs_read_extent_reg()")
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
>   fs/btrfs/inode.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 40025662f250..3d6e39e6544d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -443,6 +443,8 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
>   	       IS_ALIGNED(len, fs_info->sectorsize));
>   	ASSERT(offset >= key.offset &&
>   	       offset + len <= key.offset + extent_num_bytes);
> +	/* offset is now offset within extent */
> +	offset = btrfs_file_extent_offset(leaf, fi) + offset - key.offset;

I prefer not to use the @offset, explained later.

>   
>   	/* Preallocated or hole , fill @dest with zero */
>   	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
> @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
>   	if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
>   		u64 logical;
>   
> -		logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
> -			  btrfs_file_extent_offset(leaf, fi) +
> -			  offset - key.offset;
> +		logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;

This is touching non-compressed path, which is very weird as your commit 
message said this part is correct.

I prefer the original one to show everything we need to take into 
consideration.

Otherwise looks good to me.

Thanks,
Qu
>   		read = len;
>   
>   		num_copies = btrfs_num_copies(fs_info, logical, len);
> @@ -511,7 +511,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
>   	if (ret < dsize)
>   		memset(dbuf + ret, 0, dsize - ret);
>   	/* Then copy the needed part */
> -	memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len);
> +	memcpy(dest, dbuf + offset, len);
>   	ret = len;
>   out:
>   	free(cbuf);
>
Dominique Martinet April 18, 2023, 2:05 a.m. UTC | #2
Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800:
> The subject can be changed to "btrfs: fix offset when reading compressed
> extents".
> The original one is a little too generic.

Ok.

> > btrfs_file_read()
> >   -> btrfs_read_extent_reg
> >      (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time)
> >       last read had 0x4000 bytes in extent, but aligned_end - cur limited
> >       the read to 0x3000 (offset 0x720000)
> >   -> read_and_truncate_page
> >     -> btrfs_read_extent_reg
> >        reading the last 0x1000 bytes from offset 0x73000 (end of the
> >        previous 0x4000) into 0x40ba3000
> >        here key.offset = 0x70000 so we need to use it to recover the
> >        0x3000 offset.
> 
> You can use "btrfs ins dump-tree" to provide a much easier to read on-disk
> data layout.

key.offset is the offset within the read call, not the offset on disk.
The file on disk looks perfectly normal, the condition to trigger the
bug is to have a file which size is not sector-aligned and where the
last extent is bigger than a sector.

I had a look at dump-tree early on and couldn't actually find my file in
there, now the problem is understood it should be easy to make a
reproducer so I'll add this info and commands needed to reproduce (+ a
link to a fs image just in case) in v2
 
> >   	/* Preallocated or hole , fill @dest with zero */
> >   	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
> > @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
> >   	if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
> >   		u64 logical;
> > -		logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
> > -			  btrfs_file_extent_offset(leaf, fi) +
> > -			  offset - key.offset;
> > +		logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;
> 
> This is touching non-compressed path, which is very weird as your commit
> message said this part is correct.

my rationale is that since this was considered once but forgotten the
other time it'll be easy to add yet another code path that forgets it
later, but I guess it won't change much anyway -- I'll fix the patch
making it explicit again.


Thanks,
Qu Wenruo April 18, 2023, 2:16 a.m. UTC | #3
On 2023/4/18 10:05, Dominique Martinet wrote:
> Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800:
>> The subject can be changed to "btrfs: fix offset when reading compressed
>> extents".
>> The original one is a little too generic.
> 
> Ok.
> 
>>> btrfs_file_read()
>>>    -> btrfs_read_extent_reg
>>>       (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time)
>>>        last read had 0x4000 bytes in extent, but aligned_end - cur limited
>>>        the read to 0x3000 (offset 0x720000)
>>>    -> read_and_truncate_page
>>>      -> btrfs_read_extent_reg
>>>         reading the last 0x1000 bytes from offset 0x73000 (end of the
>>>         previous 0x4000) into 0x40ba3000
>>>         here key.offset = 0x70000 so we need to use it to recover the
>>>         0x3000 offset.
>>
>> You can use "btrfs ins dump-tree" to provide a much easier to read on-disk
>> data layout.
> 
> key.offset is the offset within the read call, not the offset on disk.
> The file on disk looks perfectly normal, the condition to trigger the
> bug is to have a file which size is not sector-aligned and where the
> last extent is bigger than a sector.

Yes, I understand the runtime offset is involved.

But you're still trying to explain the situation with involved on-disk 
file extent, right?

In that case it's way easier to go something like this:

   We can have a compressed file extent like this:

	item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 4096
		extent data offset 0 nr 32768 ram 32768
		extent compression 1 (zlib)

   Then if we try to read file range [4K, 8K) of inode 257 in Uboot, then
   we got corrupted data.

At least that's the preferred way to explain in btrfs community (with 
on-disk thus static part, then runtime part).

> 
> I had a look at dump-tree early on and couldn't actually find my file in
> there, now the problem is understood it should be easy to make a
> reproducer so I'll add this info and commands needed to reproduce (+ a
> link to a fs image just in case) in v2

The reproducer is super easy.

  # mkfs.btrfs -f $dev
  # mount $dev -o compress $mnt
  # xfs_io -f -c "pwrite -i /dev/urandom 0 16K" $mnt/file
  # umount $mnt

Then in uboot
  # load host 0 $addr file 0x1000 0x1000
  # md $addr

And compare to regular xxd from kernel.

>   
>>>    	/* Preallocated or hole , fill @dest with zero */
>>>    	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
>>> @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
>>>    	if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
>>>    		u64 logical;
>>> -		logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
>>> -			  btrfs_file_extent_offset(leaf, fi) +
>>> -			  offset - key.offset;
>>> +		logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;
>>
>> This is touching non-compressed path, which is very weird as your commit
>> message said this part is correct.
> 
> my rationale is that since this was considered once but forgotten the
> other time it'll be easy to add yet another code path that forgets it
> later, but I guess it won't change much anyway -- I'll fix the patch
> making it explicit again.

OK, that makes sense now.

Thanks,
Qu

> 
> 
> Thanks,
Qu Wenruo April 18, 2023, 2:20 a.m. UTC | #4
On 2023/4/18 10:05, Dominique Martinet wrote:
> Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800:
>> The subject can be changed to "btrfs: fix offset when reading compressed
>> extents".
>> The original one is a little too generic.
> 
> Ok.
> 
>>> btrfs_file_read()
>>>    -> btrfs_read_extent_reg
>>>       (aligned part loop from 0x40480000 to 0x40ba0000, 128KB at a time)
>>>        last read had 0x4000 bytes in extent, but aligned_end - cur limited
>>>        the read to 0x3000 (offset 0x720000)
>>>    -> read_and_truncate_page
>>>      -> btrfs_read_extent_reg
>>>         reading the last 0x1000 bytes from offset 0x73000 (end of the
>>>         previous 0x4000) into 0x40ba3000
>>>         here key.offset = 0x70000 so we need to use it to recover the
>>>         0x3000 offset.
>>
>> You can use "btrfs ins dump-tree" to provide a much easier to read on-disk
>> data layout.
> 
> key.offset is the offset within the read call, not the offset on disk.
> The file on disk looks perfectly normal, the condition to trigger the
> bug is to have a file which size is not sector-aligned and where the
> last extent is bigger than a sector.

Forgot to mention, this bug does not only affect the mentioned case, it 
affects all partial read of an extent.

Even if it's sector aligned.

> 
> I had a look at dump-tree early on and couldn't actually find my file in
> there, now the problem is understood it should be easy to make a
> reproducer so I'll add this info and commands needed to reproduce (+ a
> link to a fs image just in case) in v2
>   
>>>    	/* Preallocated or hole , fill @dest with zero */
>>>    	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
>>> @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
>>>    	if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
>>>    		u64 logical;
>>> -		logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
>>> -			  btrfs_file_extent_offset(leaf, fi) +
>>> -			  offset - key.offset;
>>> +		logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;
>>
>> This is touching non-compressed path, which is very weird as your commit
>> message said this part is correct.
> 
> my rationale is that since this was considered once but forgotten the
> other time it'll be easy to add yet another code path that forgets it
> later, but I guess it won't change much anyway -- I'll fix the patch
> making it explicit again.
> 
> 
> Thanks,
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40025662f250..3d6e39e6544d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -443,6 +443,8 @@  int btrfs_read_extent_reg(struct btrfs_path *path,
 	       IS_ALIGNED(len, fs_info->sectorsize));
 	ASSERT(offset >= key.offset &&
 	       offset + len <= key.offset + extent_num_bytes);
+	/* offset is now offset within extent */
+	offset = btrfs_file_extent_offset(leaf, fi) + offset - key.offset;
 
 	/* Preallocated or hole , fill @dest with zero */
 	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
@@ -454,9 +456,7 @@  int btrfs_read_extent_reg(struct btrfs_path *path,
 	if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
 		u64 logical;
 
-		logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
-			  btrfs_file_extent_offset(leaf, fi) +
-			  offset - key.offset;
+		logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;
 		read = len;
 
 		num_copies = btrfs_num_copies(fs_info, logical, len);
@@ -511,7 +511,7 @@  int btrfs_read_extent_reg(struct btrfs_path *path,
 	if (ret < dsize)
 		memset(dbuf + ret, 0, dsize - ret);
 	/* Then copy the needed part */
-	memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len);
+	memcpy(dest, dbuf + offset, len);
 	ret = len;
 out:
 	free(cbuf);