diff mbox series

[1/2] btrfs-progs: convert: Prevent bit overflow for cctx->total_bytes

Message ID 20200720125109.93970-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [1/2] btrfs-progs: convert: Prevent bit overflow for cctx->total_bytes | expand

Commit Message

Qu Wenruo July 20, 2020, 12:51 p.m. UTC
[BUG]
When convert is called on a 64GiB ext4 fs, it fails like this:

  $ btrfs-convert  /dev/loop0p1
  create btrfs filesystem:
          blocksize: 4096
          nodesize:  16384
          features:  extref, skinny-metadata (default)
          checksum:  crc32c
  creating ext2 image file
  ERROR: missing data block for bytenr 1048576
  ERROR: failed to create ext2_saved/image: -2
  WARNING: an error occurred during conversion, filesystem is partially created but not finalized and not mountable

Btrfs-convert also corrupts the source fs:
  $ LANG=C e2fsck /dev/loop0p1 -f
  e2fsck 1.45.6 (20-Mar-2020)
  Resize inode not valid.  Recreate<y>? yes
  Pass 1: Checking inodes, blocks, and sizes
  Deleted inode 3681 has zero dtime.  Fix<y>? yes
  Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
  Inode 3744 was part of the orphaned inode list.  FIXED.
  Deleted inode 3745 has zero dtime.  Fix<y>? yes
  Inode 3747 has INLINE_DATA_FL flag on filesystem without inline data support.
  Clear<y>? yes
  ...

[CAUSE]
After some debugging, the first strange behavior is, the value of
cctx->total_bytes is 0 in ext2_open_fs().

It turns out that, the value assign for cctx->total_bytes could lead to
bit overflow for the unsigned int value.

And that 0 cctx->total_bytes leads to vairous problems for later free
space calculation.
For example, in calculate_available_space(), we use cctx->total_bytes to
ensure we won't create a data chunk beyond device end:

		cue_len = min(cctx->total_bytes - cur_off, cur_len);

If that cur_offset is also 0, we will create a cache_extent with 0 size,
which could cause a lot of problems for cache tree search.

[FIX]
Do manual casting for the multiply operation, so we could got a real u64
result.
The fix will be applied to all supported fses (ext* and reiserfs).

Reported-by: Christian Zangl <coralllama@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 convert/main.c            | 1 +
 convert/source-ext2.c     | 3 ++-
 convert/source-reiserfs.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

Comments

Nikolay Borisov July 20, 2020, 12:53 p.m. UTC | #1
On 20.07.20 г. 15:51 ч., Qu Wenruo wrote:
> [BUG]
> When convert is called on a 64GiB ext4 fs, it fails like this:
> 
>   $ btrfs-convert  /dev/loop0p1
>   create btrfs filesystem:
>           blocksize: 4096
>           nodesize:  16384
>           features:  extref, skinny-metadata (default)
>           checksum:  crc32c
>   creating ext2 image file
>   ERROR: missing data block for bytenr 1048576
>   ERROR: failed to create ext2_saved/image: -2
>   WARNING: an error occurred during conversion, filesystem is partially created but not finalized and not mountable
> 
> Btrfs-convert also corrupts the source fs:
>   $ LANG=C e2fsck /dev/loop0p1 -f
>   e2fsck 1.45.6 (20-Mar-2020)
>   Resize inode not valid.  Recreate<y>? yes
>   Pass 1: Checking inodes, blocks, and sizes
>   Deleted inode 3681 has zero dtime.  Fix<y>? yes
>   Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
>   Inode 3744 was part of the orphaned inode list.  FIXED.
>   Deleted inode 3745 has zero dtime.  Fix<y>? yes
>   Inode 3747 has INLINE_DATA_FL flag on filesystem without inline data support.
>   Clear<y>? yes
>   ...
> 
> [CAUSE]
> After some debugging, the first strange behavior is, the value of
> cctx->total_bytes is 0 in ext2_open_fs().
> 
> It turns out that, the value assign for cctx->total_bytes could lead to
> bit overflow for the unsigned int value.
> 
> And that 0 cctx->total_bytes leads to vairous problems for later free
> space calculation.
> For example, in calculate_available_space(), we use cctx->total_bytes to
> ensure we won't create a data chunk beyond device end:
> 
> 		cue_len = min(cctx->total_bytes - cur_off, cur_len);
> 
> If that cur_offset is also 0, we will create a cache_extent with 0 size,
> which could cause a lot of problems for cache tree search.
> 
> [FIX]
> Do manual casting for the multiply operation, so we could got a real u64
> result.
> The fix will be applied to all supported fses (ext* and reiserfs).
> 
> Reported-by: Christian Zangl <coralllama@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
David Sterba July 20, 2020, 4:09 p.m. UTC | #2
On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
> --- a/convert/source-ext2.c
> +++ b/convert/source-ext2.c
> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
>  	cctx->fs_data = ext2_fs;
>  	cctx->blocksize = ext2_fs->blocksize;
>  	cctx->block_count = ext2_fs->super->s_blocks_count;
> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
> +			    (u64)ext2_fs->super->s_blocks_count;

Do you need to cast both? Once one of the types is wide enough for the
result, there should be no loss.
Qu Wenruo July 20, 2020, 11:51 p.m. UTC | #3
On 2020/7/21 上午12:09, David Sterba wrote:
> On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
>> --- a/convert/source-ext2.c
>> +++ b/convert/source-ext2.c
>> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
>>  	cctx->fs_data = ext2_fs;
>>  	cctx->blocksize = ext2_fs->blocksize;
>>  	cctx->block_count = ext2_fs->super->s_blocks_count;
>> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
>> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
>> +			    (u64)ext2_fs->super->s_blocks_count;
> 
> Do you need to cast both? Once one of the types is wide enough for the
> result, there should be no loss.
> 
I just want to be extra safe.

Feel free to reduce one.

Thanks,
Qu
David Sterba July 21, 2020, 9:58 a.m. UTC | #4
On Tue, Jul 21, 2020 at 07:51:00AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/7/21 上午12:09, David Sterba wrote:
> > On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
> >> --- a/convert/source-ext2.c
> >> +++ b/convert/source-ext2.c
> >> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
> >>  	cctx->fs_data = ext2_fs;
> >>  	cctx->blocksize = ext2_fs->blocksize;
> >>  	cctx->block_count = ext2_fs->super->s_blocks_count;
> >> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
> >> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
> >> +			    (u64)ext2_fs->super->s_blocks_count;
> > 
> > Do you need to cast both? Once one of the types is wide enough for the
> > result, there should be no loss.
> > 
> I just want to be extra safe.

Typecasts in code raise questions why are they needed, 'to be extra'
safe is not a good reason. One typecast in multiplication/shifts is a
common pattern to widen the result but two look more like lack of
understanding of the integer promotion rules.
Qu Wenruo July 21, 2020, 10:29 a.m. UTC | #5
On 2020/7/21 下午5:58, David Sterba wrote:
> On Tue, Jul 21, 2020 at 07:51:00AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/7/21 上午12:09, David Sterba wrote:
>>> On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
>>>> --- a/convert/source-ext2.c
>>>> +++ b/convert/source-ext2.c
>>>> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
>>>>  	cctx->fs_data = ext2_fs;
>>>>  	cctx->blocksize = ext2_fs->blocksize;
>>>>  	cctx->block_count = ext2_fs->super->s_blocks_count;
>>>> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
>>>> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
>>>> +			    (u64)ext2_fs->super->s_blocks_count;
>>>
>>> Do you need to cast both? Once one of the types is wide enough for the
>>> result, there should be no loss.
>>>
>> I just want to be extra safe.
> 
> Typecasts in code raise questions why are they needed, 'to be extra'
> safe is not a good reason. One typecast in multiplication/shifts is a
> common pattern to widen the result but two look more like lack of
> understanding of the integer promotion rules.
> 

My point here is, I don't want the reviewers or new contributors to
bother about the promotion rules at all.

They only need to know that using blocksize and blocks_count directly to
do multiply would lead to overflow.

Other details like whether the multiply follows the highest factor or
the left operator or the right operator, shouldn't be the point and we
don't really need to bother.

Thus casting both would definitely be right, without the need to refer
to the complex rule book, thus save the reviewer several minutes.

Thanks,
Qu
David Sterba July 21, 2020, 1:55 p.m. UTC | #6
On Tue, Jul 21, 2020 at 06:29:31PM +0800, Qu Wenruo wrote:
> On 2020/7/21 下午5:58, David Sterba wrote:
> > On Tue, Jul 21, 2020 at 07:51:00AM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2020/7/21 上午12:09, David Sterba wrote:
> >>> On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
> >>>> --- a/convert/source-ext2.c
> >>>> +++ b/convert/source-ext2.c
> >>>> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
> >>>>  	cctx->fs_data = ext2_fs;
> >>>>  	cctx->blocksize = ext2_fs->blocksize;
> >>>>  	cctx->block_count = ext2_fs->super->s_blocks_count;
> >>>> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
> >>>> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
> >>>> +			    (u64)ext2_fs->super->s_blocks_count;
> >>>
> >>> Do you need to cast both? Once one of the types is wide enough for the
> >>> result, there should be no loss.
> >>>
> >> I just want to be extra safe.
> > 
> > Typecasts in code raise questions why are they needed, 'to be extra'
> > safe is not a good reason. One typecast in multiplication/shifts is a
> > common pattern to widen the result but two look more like lack of
> > understanding of the integer promotion rules.
> 
> My point here is, I don't want the reviewers or new contributors to
> bother about the promotion rules at all.

Ouch, I hope you don't mean that contributors should ignore the trickier
parts of C language. Especially reviewers _have_ to bother about all
sorts of subtle behaviour.

> They only need to know that using blocksize and blocks_count directly to
> do multiply would lead to overflow.
> 
> Other details like whether the multiply follows the highest factor or
> the left operator or the right operator, shouldn't be the point and we
> don't really need to bother.

... and introduce bugs?

> Thus casting both would definitely be right, without the need to refer
> to the complex rule book, thus save the reviewer several minutes.

The opposite, if you send me code that's not following known schemes or
idiomatic schemes I'll be highly suspicious and looking for the reasons
why it's that way and making sure it's correct costs way more time.
Stefan Traby July 21, 2020, 1:57 p.m. UTC | #7
On Tue, Jul 21, 2020 at 06:29:31PM +0800, Qu Wenruo wrote:
> On 2020/7/21 下午5:58, David Sterba wrote:
> > On Tue, Jul 21, 2020 at 07:51:00AM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2020/7/21 上午12:09, David Sterba wrote:
> >>> On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
> >>>> --- a/convert/source-ext2.c
> >>>> +++ b/convert/source-ext2.c
> >>>> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
> >>>>  	cctx->fs_data = ext2_fs;
> >>>>  	cctx->blocksize = ext2_fs->blocksize;
> >>>>  	cctx->block_count = ext2_fs->super->s_blocks_count;
> >>>> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
> >>>> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
> >>>> +			    (u64)ext2_fs->super->s_blocks_count;
> >>>
> >>> Do you need to cast both? Once one of the types is wide enough for the
> >>> result, there should be no loss.
> >>>
> >> I just want to be extra safe.
> > 
> > Typecasts in code raise questions why are they needed, 'to be extra'
> > safe is not a good reason. One typecast in multiplication/shifts is a
> > common pattern to widen the result but two look more like lack of
> > understanding of the integer promotion rules.
> > 
> 
> My point here is, I don't want the reviewers or new contributors to
> bother about the promotion rules at all.

Reviewers and contributors need to understand the rules of C.

> They only need to know that using blocksize and blocks_count directly to
> do multiply would lead to overflow.
> 
> Other details like whether the multiply follows the highest factor or
> the left operator or the right operator, shouldn't be the point and we
> don't really need to bother.

This is like suggesting to use brackets on every int-expression
like ((5*3)+7) because the precedere rules are too complex.
Do you think that writing ((5*3)+7) is "extra safe"?

> Thus casting both would definitely be right, without the need to refer
> to the complex rule book, thus save the reviewer several minutes.

I don't think so.
Qu Wenruo July 21, 2020, 10:58 p.m. UTC | #8
On 2020/7/21 下午9:55, David Sterba wrote:
> On Tue, Jul 21, 2020 at 06:29:31PM +0800, Qu Wenruo wrote:
>> On 2020/7/21 下午5:58, David Sterba wrote:
>>> On Tue, Jul 21, 2020 at 07:51:00AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/7/21 上午12:09, David Sterba wrote:
>>>>> On Mon, Jul 20, 2020 at 08:51:08PM +0800, Qu Wenruo wrote:
>>>>>> --- a/convert/source-ext2.c
>>>>>> +++ b/convert/source-ext2.c
>>>>>> @@ -87,7 +87,8 @@ static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
>>>>>>  	cctx->fs_data = ext2_fs;
>>>>>>  	cctx->blocksize = ext2_fs->blocksize;
>>>>>>  	cctx->block_count = ext2_fs->super->s_blocks_count;
>>>>>> -	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
>>>>>> +	cctx->total_bytes = (u64)ext2_fs->blocksize *
>>>>>> +			    (u64)ext2_fs->super->s_blocks_count;
>>>>>
>>>>> Do you need to cast both? Once one of the types is wide enough for the
>>>>> result, there should be no loss.
>>>>>
>>>> I just want to be extra safe.
>>>
>>> Typecasts in code raise questions why are they needed, 'to be extra'
>>> safe is not a good reason. One typecast in multiplication/shifts is a
>>> common pattern to widen the result but two look more like lack of
>>> understanding of the integer promotion rules.
>>
>> My point here is, I don't want the reviewers or new contributors to
>> bother about the promotion rules at all.
>
> Ouch, I hope you don't mean that contributors should ignore the trickier
> parts of C language. Especially reviewers _have_ to bother about all
> sorts of subtle behaviour.
>
>> They only need to know that using blocksize and blocks_count directly to
>> do multiply would lead to overflow.
>>
>> Other details like whether the multiply follows the highest factor or
>> the left operator or the right operator, shouldn't be the point and we
>> don't really need to bother.
>
> ... and introduce bugs?
>
>> Thus casting both would definitely be right, without the need to refer
>> to the complex rule book, thus save the reviewer several minutes.
>
> The opposite, if you send me code that's not following known schemes or
> idiomatic schemes I'll be highly suspicious and looking for the reasons
> why it's that way and making sure it's correct costs way more time.
>
OK, then would you please remove one casting at merge time, or do I need
to resend?

Thanks,
Qu
David Sterba July 22, 2020, 11:32 a.m. UTC | #9
On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
> >> Thus casting both would definitely be right, without the need to refer
> >> to the complex rule book, thus save the reviewer several minutes.
> >
> > The opposite, if you send me code that's not following known schemes or
> > idiomatic schemes I'll be highly suspicious and looking for the reasons
> > why it's that way and making sure it's correct costs way more time.
> >
> OK, then would you please remove one casting at merge time, or do I need
> to resend?

Yeah, I fix such things routinely no need to resend.
Neal Gompa July 23, 2020, 1:31 p.m. UTC | #10
On Wed, Jul 22, 2020 at 7:33 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
> > >> Thus casting both would definitely be right, without the need to refer
> > >> to the complex rule book, thus save the reviewer several minutes.
> > >
> > > The opposite, if you send me code that's not following known schemes or
> > > idiomatic schemes I'll be highly suspicious and looking for the reasons
> > > why it's that way and making sure it's correct costs way more time.
> > >
> > OK, then would you please remove one casting at merge time, or do I need
> > to resend?
>
> Yeah, I fix such things routinely no need to resend.

I have a report[1] that seems to look like this patch solves it, is
that correct?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c7
Qu Wenruo July 24, 2020, 12:01 a.m. UTC | #11
On 2020/7/23 下午9:31, Neal Gompa wrote:
> On Wed, Jul 22, 2020 at 7:33 AM David Sterba <dsterba@suse.cz> wrote:
>>
>> On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
>>>>> Thus casting both would definitely be right, without the need to refer
>>>>> to the complex rule book, thus save the reviewer several minutes.
>>>>
>>>> The opposite, if you send me code that's not following known schemes or
>>>> idiomatic schemes I'll be highly suspicious and looking for the reasons
>>>> why it's that way and making sure it's correct costs way more time.
>>>>
>>> OK, then would you please remove one casting at merge time, or do I need
>>> to resend?
>>
>> Yeah, I fix such things routinely no need to resend.
> 
> I have a report[1] that seems to look like this patch solves it, is
> that correct?
> 
> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c7
> 
Yep, looks like the same bug.

Thanks,
Qu
Neal Gompa July 28, 2020, 1:14 p.m. UTC | #12
On Thu, Jul 23, 2020 at 8:02 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/7/23 下午9:31, Neal Gompa wrote:
> > On Wed, Jul 22, 2020 at 7:33 AM David Sterba <dsterba@suse.cz> wrote:
> >>
> >> On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
> >>>>> Thus casting both would definitely be right, without the need to refer
> >>>>> to the complex rule book, thus save the reviewer several minutes.
> >>>>
> >>>> The opposite, if you send me code that's not following known schemes or
> >>>> idiomatic schemes I'll be highly suspicious and looking for the reasons
> >>>> why it's that way and making sure it's correct costs way more time.
> >>>>
> >>> OK, then would you please remove one casting at merge time, or do I need
> >>> to resend?
> >>
> >> Yeah, I fix such things routinely no need to resend.
> >
> > I have a report[1] that seems to look like this patch solves it, is
> > that correct?
> >
> > [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c7
> >
> Yep, looks like the same bug.
>

So I backported this fix into btrfs-progs-5.7-4.fc32[1], and the
reporter is still seeing issues[2].

Pasting from the bug comment[2]:

[liveuser@localhost-live ~]$ sudo btrfs-convert /dev/sda2
create btrfs filesystem:
    blocksize: 4096
    nodesize:  16384
    features:  extref, skinny-metadata (default)
    checksum:  crc32c
creating ext2 image file
creating btrfs metadata
convert/source-ext2.c:845: ext2_copy_inodes: BUG_ON `ret` triggered, value -28
btrfs-convert(+0x13589)[0x558cfe4a6589]
btrfs-convert(+0x14549)[0x558cfe4a7549]
btrfs-convert(main+0xfc0)[0x558cfe4a22b0]
/lib64/libc.so.6(__libc_start_main+0xf2)[0x7fd2c6666042]
btrfs-convert(_start+0x2e)[0x558cfe4a35ce]
Aborted

Any idea what's going on here?


[1]: https://src.fedoraproject.org/rpms/btrfs-progs/c/e4a3d39e87313954cb3e8214e28ee0ba50bee874
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c13
Qu Wenruo July 28, 2020, 1:19 p.m. UTC | #13
On 2020/7/28 下午9:14, Neal Gompa wrote:
> On Thu, Jul 23, 2020 at 8:02 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/7/23 下午9:31, Neal Gompa wrote:
>>> On Wed, Jul 22, 2020 at 7:33 AM David Sterba <dsterba@suse.cz> wrote:
>>>>
>>>> On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
>>>>>>> Thus casting both would definitely be right, without the need to refer
>>>>>>> to the complex rule book, thus save the reviewer several minutes.
>>>>>>
>>>>>> The opposite, if you send me code that's not following known schemes or
>>>>>> idiomatic schemes I'll be highly suspicious and looking for the reasons
>>>>>> why it's that way and making sure it's correct costs way more time.
>>>>>>
>>>>> OK, then would you please remove one casting at merge time, or do I need
>>>>> to resend?
>>>>
>>>> Yeah, I fix such things routinely no need to resend.
>>>
>>> I have a report[1] that seems to look like this patch solves it, is
>>> that correct?
>>>
>>> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c7
>>>
>> Yep, looks like the same bug.
>>
> 
> So I backported this fix into btrfs-progs-5.7-4.fc32[1], and the
> reporter is still seeing issues[2].
> 
> Pasting from the bug comment[2]:
> 
> [liveuser@localhost-live ~]$ sudo btrfs-convert /dev/sda2
> create btrfs filesystem:
>     blocksize: 4096
>     nodesize:  16384
>     features:  extref, skinny-metadata (default)
>     checksum:  crc32c
> creating ext2 image file
> creating btrfs metadata
> convert/source-ext2.c:845: ext2_copy_inodes: BUG_ON `ret` triggered, value -28

This means we have no space left.

We don't know if it's the fs already exhausted (little space left for
EXT*), or it's btrfs' extent allocator not working.

Would it possible to update the image?

BTW, even btrfs-convert crashed, the fs should be completely fine, just
as nothing happened to it (from the point of view of ext*)

Thanks,
Qu

> btrfs-convert(+0x13589)[0x558cfe4a6589]
> btrfs-convert(+0x14549)[0x558cfe4a7549]
> btrfs-convert(main+0xfc0)[0x558cfe4a22b0]
> /lib64/libc.so.6(__libc_start_main+0xf2)[0x7fd2c6666042]
> btrfs-convert(_start+0x2e)[0x558cfe4a35ce]
> Aborted
> 
> Any idea what's going on here?
> 
> 
> [1]: https://src.fedoraproject.org/rpms/btrfs-progs/c/e4a3d39e87313954cb3e8214e28ee0ba50bee874
> [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c13
>
Neal Gompa July 29, 2020, 1:56 a.m. UTC | #14
On Tue, Jul 28, 2020 at 9:20 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/7/28 下午9:14, Neal Gompa wrote:
> > On Thu, Jul 23, 2020 at 8:02 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2020/7/23 下午9:31, Neal Gompa wrote:
> >>> On Wed, Jul 22, 2020 at 7:33 AM David Sterba <dsterba@suse.cz> wrote:
> >>>>
> >>>> On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
> >>>>>>> Thus casting both would definitely be right, without the need to refer
> >>>>>>> to the complex rule book, thus save the reviewer several minutes.
> >>>>>>
> >>>>>> The opposite, if you send me code that's not following known schemes or
> >>>>>> idiomatic schemes I'll be highly suspicious and looking for the reasons
> >>>>>> why it's that way and making sure it's correct costs way more time.
> >>>>>>
> >>>>> OK, then would you please remove one casting at merge time, or do I need
> >>>>> to resend?
> >>>>
> >>>> Yeah, I fix such things routinely no need to resend.
> >>>
> >>> I have a report[1] that seems to look like this patch solves it, is
> >>> that correct?
> >>>
> >>> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c7
> >>>
> >> Yep, looks like the same bug.
> >>
> >
> > So I backported this fix into btrfs-progs-5.7-4.fc32[1], and the
> > reporter is still seeing issues[2].
> >
> > Pasting from the bug comment[2]:
> >
> > [liveuser@localhost-live ~]$ sudo btrfs-convert /dev/sda2
> > create btrfs filesystem:
> >     blocksize: 4096
> >     nodesize:  16384
> >     features:  extref, skinny-metadata (default)
> >     checksum:  crc32c
> > creating ext2 image file
> > creating btrfs metadata
> > convert/source-ext2.c:845: ext2_copy_inodes: BUG_ON `ret` triggered, value -28
>
> This means we have no space left.
>
> We don't know if it's the fs already exhausted (little space left for
> EXT*), or it's btrfs' extent allocator not working.
>
> Would it possible to update the image?
>

I'm not sure what you're asking here? Do you mean update the live
environment? You can boot a Fedora 32 live environment and update the
btrfs-progs package before using it like the bug reporter did...

> BTW, even btrfs-convert crashed, the fs should be completely fine, just
> as nothing happened to it (from the point of view of ext*)
>

I figured as much, but we shouldn't have a case where btrfs-convert
crashes like this...





--
真実はいつも一つ!/ Always, there's only one truth!
Qu Wenruo July 29, 2020, 2:30 a.m. UTC | #15
On 2020/7/29 上午9:56, Neal Gompa wrote:
> On Tue, Jul 28, 2020 at 9:20 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/7/28 下午9:14, Neal Gompa wrote:
>>> On Thu, Jul 23, 2020 at 8:02 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/7/23 下午9:31, Neal Gompa wrote:
>>>>> On Wed, Jul 22, 2020 at 7:33 AM David Sterba <dsterba@suse.cz> wrote:
>>>>>>
>>>>>> On Wed, Jul 22, 2020 at 06:58:39AM +0800, Qu Wenruo wrote:
>>>>>>>>> Thus casting both would definitely be right, without the need to refer
>>>>>>>>> to the complex rule book, thus save the reviewer several minutes.
>>>>>>>>
>>>>>>>> The opposite, if you send me code that's not following known schemes or
>>>>>>>> idiomatic schemes I'll be highly suspicious and looking for the reasons
>>>>>>>> why it's that way and making sure it's correct costs way more time.
>>>>>>>>
>>>>>>> OK, then would you please remove one casting at merge time, or do I need
>>>>>>> to resend?
>>>>>>
>>>>>> Yeah, I fix such things routinely no need to resend.
>>>>>
>>>>> I have a report[1] that seems to look like this patch solves it, is
>>>>> that correct?
>>>>>
>>>>> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1851674#c7
>>>>>
>>>> Yep, looks like the same bug.
>>>>
>>>
>>> So I backported this fix into btrfs-progs-5.7-4.fc32[1], and the
>>> reporter is still seeing issues[2].
>>>
>>> Pasting from the bug comment[2]:
>>>
>>> [liveuser@localhost-live ~]$ sudo btrfs-convert /dev/sda2
>>> create btrfs filesystem:
>>>     blocksize: 4096
>>>     nodesize:  16384
>>>     features:  extref, skinny-metadata (default)
>>>     checksum:  crc32c
>>> creating ext2 image file
>>> creating btrfs metadata
>>> convert/source-ext2.c:845: ext2_copy_inodes: BUG_ON `ret` triggered, value -28
>>
>> This means we have no space left.
>>
>> We don't know if it's the fs already exhausted (little space left for
>> EXT*), or it's btrfs' extent allocator not working.
>>
>> Would it possible to update the image?
>>
> 
> I'm not sure what you're asking here? Do you mean update the live
> environment? You can boot a Fedora 32 live environment and update the
> btrfs-progs package before using it like the bug reporter did...

I mean, upload the binary dump or e2image dump of /dev/sda2.
e2image -Q would be much smaller and without any data stored in it.

Sometimes that's the easiest way to debug.
Or we need to add tons of probes to btrfs-convert and let the reporter
to try-and-report to pin down the bug.

> 
>> BTW, even btrfs-convert crashed, the fs should be completely fine, just
>> as nothing happened to it (from the point of view of ext*)
>>
> 
> I figured as much, but we shouldn't have a case where btrfs-convert
> crashes like this...

Yep, the BUG_ON() should be removed and replaced with a more graceful exit.

I'll do that soon.

Thanks,
Qu
diff mbox series

Patch

diff --git a/convert/main.c b/convert/main.c
index 7709e9a6c085..df6a2ae32722 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1136,6 +1136,7 @@  static int do_convert(const char *devname, u32 convert_flags, u32 nodesize,
 	if (ret)
 		goto fail;
 
+	ASSERT(cctx.total_bytes);
 	blocksize = cctx.blocksize;
 	total_bytes = (u64)blocksize * (u64)cctx.block_count;
 	if (blocksize < 4096) {
diff --git a/convert/source-ext2.c b/convert/source-ext2.c
index f11ef651245a..a5e3a726863f 100644
--- a/convert/source-ext2.c
+++ b/convert/source-ext2.c
@@ -87,7 +87,8 @@  static int ext2_open_fs(struct btrfs_convert_context *cctx, const char *name)
 	cctx->fs_data = ext2_fs;
 	cctx->blocksize = ext2_fs->blocksize;
 	cctx->block_count = ext2_fs->super->s_blocks_count;
-	cctx->total_bytes = ext2_fs->blocksize * ext2_fs->super->s_blocks_count;
+	cctx->total_bytes = (u64)ext2_fs->blocksize *
+			    (u64)ext2_fs->super->s_blocks_count;
 	cctx->volume_name = strndup((char *)ext2_fs->super->s_volume_name, 16);
 	cctx->first_data_block = ext2_fs->super->s_first_data_block;
 	cctx->inodes_count = ext2_fs->super->s_inodes_count;
diff --git a/convert/source-reiserfs.c b/convert/source-reiserfs.c
index 9fd6b9abb9b4..8d9752f06ca9 100644
--- a/convert/source-reiserfs.c
+++ b/convert/source-reiserfs.c
@@ -82,7 +82,7 @@  static int reiserfs_open_fs(struct btrfs_convert_context *cxt, const char *name)
 	cxt->fs_data = fs;
 	cxt->blocksize = fs->fs_blocksize;
 	cxt->block_count = get_sb_block_count(fs->fs_ondisk_sb);
-	cxt->total_bytes = cxt->blocksize * cxt->block_count;
+	cxt->total_bytes = (u64)cxt->blocksize * (u64)cxt->block_count;
 	cxt->volume_name = strndup(fs->fs_ondisk_sb->s_label, 16);
 	cxt->first_data_block = 0;
 	cxt->inodes_count = reiserfs_count_objectids(fs);