kernel BUG at fs/btrfs/extent-tree.c:8113! (4.1.3 kernel)

Message ID 55CB71E5.1070302@fb.com (mailing list archive)
State New, archived
Commit Message

Josef Bacik Aug. 12, 2015, 4:18 p.m. UTC
On 08/12/2015 12:09 PM, Marc MERLIN wrote:
> On Wed, Aug 12, 2015 at 11:15:39AM -0400, Josef Bacik wrote:
>> On 08/12/2015 10:47 AM, Marc MERLIN wrote:
>>> On Tue, Aug 11, 2015 at 11:40:45AM -0400, Josef Bacik wrote:
>>>>  From a48cf7a9ae44a17d927df5542c8b0be287aee9ed Mon Sep 17 00:00:00 2001
>>>> From: Josef Bacik <jbacik@fb.com>
>>>> Date: Tue, 11 Aug 2015 11:39:37 -0400
>>>> Subject: [PATCH] Btrfs: kill BUG_ON() in btrfs_lookup_extent_info()
>>>>
>>>> Replace it with an ASSERT(0) for the developers and an error for not the
>>>> developers.
>>>
>>> Thanks. We knocked one down and now another BUG has been triggered :)
>>>
>>> 	if (unlikely(wc->refs[level - 1] == 0)) {
>>> 		btrfs_err(root->fs_info, "Missing references.");
>>> 		BUG();
>>> 	}
>>>
>>
>> This is why you got your own branch, it's never just one.  Here's
>> the next bit
>
> Yes, I figured there might be a few more :)
> Thanks for this patch, it definitely made things better:
>
> [  165.656408] BTRFS info (device dm-0): disk space caching is enabled
> [  205.528199] BTRFS error (device dm-0): Missing references.
> [  205.528216] BTRFS: error (device dm-0) in btrfs_drop_snapshot:8652: errno=-5 IO failure
> [  205.528225] BTRFS info (device dm-0): forced readonly
>
> That's perfect, thanks much for that.
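The log above shows the general pattern: report the inconsistency and return an error instead of halting the machine. Below is a minimal userspace sketch of that pattern. It is not the actual kernel patch; the struct, the bound, and the function name are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_LEVEL_SKETCH 8	/* hypothetical bound, for this sketch only */

/* Hypothetical stand-in for the walk state; only the refs array matters. */
struct walk_control_sketch {
	unsigned long long refs[MAX_LEVEL_SKETCH];
};

/*
 * Instead of halting on a zero reference count (the BUG() quoted earlier),
 * log the problem and hand an error code back to the caller.
 */
static int check_child_refs(const struct walk_control_sketch *wc, int level)
{
	assert(wc != NULL && level >= 1 && level <= MAX_LEVEL_SKETCH);
	if (wc->refs[level - 1] == 0) {
		fprintf(stderr, "Missing references.\n");
		return -EIO;	/* EIO is 5 on Linux, hence errno=-5 in the log */
	}
	return 0;
}
```

A caller would pass the negative return value up the call chain. That is presumably how btrfs_drop_snapshot ends up reporting errno=-5 and forcing the filesystem read-only.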
>
> Now, back to check --repair, does it make sense to fix it too so that it doesn't crash either?
>
> myth:~#  btrfs check --repair /dev/mapper/crypt_sdd1
> enabling repair mode
> Checking filesystem on /dev/mapper/crypt_sdd1
> UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
> checking extents
> cmds-check.c:4486: add_data_backref: Assertion `back->bytes != max_size` failed.
> btrfs[0x8066a73]
> btrfs[0x8066aa4]
> btrfs[0x8067991]
> btrfs[0x806b4ab]
> btrfs[0x806b9a3]
> btrfs[0x806c5b2]
> btrfs(cmd_check+0x1088)[0x806eddf]
> btrfs(main+0x153)[0x80557c6]
> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75784d3]
> btrfs[0x80557ec]
>

Going to need more info to figure this one out


 From d77cd13f94fae6d995f753f3de3728c4ef4f8e75 Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik@fb.com>
Date: Wed, 12 Aug 2015 12:18:01 -0400
Subject: [PATCH] some debugging

---
  cmds-check.c | 2 ++
  1 file changed, 2 insertions(+)

  		back->found_ref += 1;

Comments

Marc MERLIN Aug. 12, 2015, 5:19 p.m. UTC | #1
On Wed, Aug 12, 2015 at 12:18:45PM -0400, Josef Bacik wrote:
> Going to need more info to figure this one out
 
Thanks for the patch, here's the output:
enabling repair mode
Checking filesystem on /dev/mapper/crypt_sdd1
UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
checking extents
wtf, parent 575708413952 <<<<<<
cmds-check.c:4488: add_data_backref: Assertion `back->bytes != max_size` failed.
/tmp/btrfs[0x8066a83]
/tmp/btrfs[0x8066ab4]
/tmp/btrfs[0x80679d8]
/tmp/btrfs[0x806b4f2]
/tmp/btrfs[0x806b9ea]
/tmp/btrfs[0x806c5f9]
/tmp/btrfs(cmd_check+0x1088)[0x806ee26]
/tmp/btrfs(main+0x153)[0x80557d6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75a54d3]
/tmp/btrfs[0x80557fc]

Marc
Qu Wenruo Aug. 17, 2015, 2:01 a.m. UTC | #2
Hi Marc,

Does btrfs-debug-tree crash as well?

If not, would you please attach its output, provided it doesn't contain
confidential data?

Thanks,
Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc MERLIN Aug. 17, 2015, 2:49 p.m. UTC | #3
On Mon, Aug 17, 2015 at 10:01:16AM +0800, Qu Wenruo wrote:
> Hi Marc,
 
Hi Qu, thanks for your answer and looking at this.

> Does btrfs-debug-tree crash as well?
> 
> If not, would you please attach its output, provided it doesn't contain
> confidential data?
 
Sure thing:
btrfs-debug-tree /dev/mapper/crypt_sdd1 > /tmp/tree.out
parent transid verify failed on 2968115101696 wanted 34855 found 39533
parent transid verify failed on 2968115101696 wanted 34855 found 39533
parent transid verify failed on 2968115101696 wanted 34855 found 39533
parent transid verify failed on 2968115101696 wanted 34855 found 39533
Ignoring transid failure
parent transid verify failed on 2968115134464 wanted 34855 found 39533
parent transid verify failed on 2968115134464 wanted 34855 found 39533
parent transid verify failed on 2968115134464 wanted 34855 found 39533
parent transid verify failed on 2968115134464 wanted 34855 found 39533
Ignoring transid failure
parent transid verify failed on 2968115150848 wanted 34855 found 39533
parent transid verify failed on 2968115150848 wanted 34855 found 39533
parent transid verify failed on 2968115150848 wanted 34855 found 39533
parent transid verify failed on 2968115150848 wanted 34855 found 39533
Ignoring transid failure
parent transid verify failed on 2968115691520 wanted 34855 found 39533
parent transid verify failed on 2968115691520 wanted 34855 found 39533
parent transid verify failed on 2968115691520 wanted 34855 found 39533
parent transid verify failed on 2968115691520 wanted 34855 found 39533
Ignoring transid failure
parent transid verify failed on 1291597152256 wanted 35830 found 39530
parent transid verify failed on 1291597152256 wanted 35830 found 39530
parent transid verify failed on 1291597152256 wanted 35830 found 39530
parent transid verify failed on 1291597152256 wanted 35830 found 39530
Ignoring transid failure
parent transid verify failed on 2968116592640 wanted 34855 found 39533
parent transid verify failed on 2968116592640 wanted 34855 found 39533
parent transid verify failed on 2968116592640 wanted 34855 found 39533
parent transid verify failed on 2968116592640 wanted 34855 found 39533
Ignoring transid failure
parent transid verify failed on 2968116609024 wanted 34855 found 39533
parent transid verify failed on 2968116609024 wanted 34855 found 39533
parent transid verify failed on 2968116609024 wanted 34855 found 39533
parent transid verify failed on 2968116609024 wanted 34855 found 39533
Ignoring transid failure
print-tree.c:1094: btrfs_print_tree: Assertion failed.
btrfs-debug-tree[0x805ce93]
btrfs-debug-tree(btrfs_print_tree+0x26d)[0x805eb51]
btrfs-debug-tree(btrfs_print_tree+0x279)[0x805eb5d]
btrfs-debug-tree(main+0x8b5)[0x804dfb7]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb757c4d3]
btrfs-debug-tree[0x804e221]

Do you want the actual output?
(it's 1.1GB uncompressed)

Marc


Marc MERLIN Aug. 22, 2015, 2:37 p.m. UTC | #4
On Mon, Aug 17, 2015 at 07:49:04AM -0700, Marc MERLIN wrote:
> On Mon, Aug 17, 2015 at 10:01:16AM +0800, Qu Wenruo wrote:
> > Hi Marc,
>  
> Hi Qu, thanks for your answer and looking at this.
> 
> > Does btrfs-debug-tree crash as well?
> > 
> > If not, would you please attach its output, provided it doesn't contain
> > confidential data?
  
Do you need anything else before I wipe the filesystem and start over?

1) kernel is fixed not to crash
2) btrfs check --repair segfaults
3) btrfs-debug-tree ends with assert

Thanks,
Marc

Qu Wenruo Aug. 24, 2015, 1:10 a.m. UTC | #5
Marc MERLIN wrote on 2015/08/22 07:37 -0700:
> On Mon, Aug 17, 2015 at 07:49:04AM -0700, Marc MERLIN wrote:
>> On Mon, Aug 17, 2015 at 10:01:16AM +0800, Qu Wenruo wrote:
>>> Hi Marc,
>>
>> Hi Qu, thanks for your answer and looking at this.
>>
>>> Does btrfs-debug-tree crash as well?
>>>
>>> If not, would you please attach its output, provided it doesn't contain
>>> confidential data?
>
> Do you need anything else before I wipe the filesystem and start over?
>
> 1) kernel is fixed not to crash
> 2) btrfs check --repair segfaults
> 3) btrfs-debug-tree ends with assert
>
> Thanks,
> Marc

Would you please capture the following output?

1) btrfs check output
With error message if it happens.

2) btrfs check --repair output
Full output until segfault.

3) btrfs-debug-tree output
With assert output.

At least this should help us figure out what's wrong with the on-disk data.

Thanks,
Qu

Marc MERLIN Aug. 24, 2015, 4:28 a.m. UTC | #6
On Mon, Aug 24, 2015 at 09:10:30AM +0800, Qu Wenruo wrote:
> Would you please capture the following output?
> 
> 1) btrfs check output
> With error message if it happens.
 
myth:~# btrfs check /dev/mapper/crypt_sdd1
Checking filesystem on /dev/mapper/crypt_sdd1
UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
checking extents
cmds-check.c:4486: add_data_backref: Assertion `back->bytes != max_size` failed.
btrfs[0x8066a73]
btrfs[0x8066aa4]
btrfs[0x8067991]
btrfs[0x806b4ab]
btrfs[0x806b9a3]
btrfs[0x806c5b2]
btrfs(cmd_check+0x1088)[0x806eddf]
btrfs(main+0x153)[0x80557c6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb753a4d3]
btrfs[0x80557ec]

> 2) btrfs check --repair output
> Full output until segfault.
 
myth:~# btrfs check --repair /dev/mapper/crypt_sdd1
enabling repair mode
Checking filesystem on /dev/mapper/crypt_sdd1
UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
checking extents
cmds-check.c:4486: add_data_backref: Assertion `back->bytes != max_size` failed.
btrfs[0x8066a73]
btrfs[0x8066aa4]
btrfs[0x8067991]
btrfs[0x806b4ab]
btrfs[0x806b9a3]
btrfs[0x806c5b2]
btrfs(cmd_check+0x1088)[0x806eddf]
btrfs(main+0x153)[0x80557c6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75114d3]
btrfs[0x80557ec]

Strangely I'm not getting a segfault anymore.

> 3) btrfs-debug-tree output
> With assert output.
 
The full output is multi gigabyte. Do you need this and if so, do I need to
upload it somewhere and will you download the multi gigabyte file?

The errors and assert, I already posted here:

> >>Sure thing:
> >>btrfs-debug-tree /dev/mapper/crypt_sdd1 > /tmp/tree.out
> >>parent transid verify failed on 2968115101696 wanted 34855 found 39533
> >>parent transid verify failed on 2968115101696 wanted 34855 found 39533
> >>parent transid verify failed on 2968115101696 wanted 34855 found 39533
> >>parent transid verify failed on 2968115101696 wanted 34855 found 39533
> >>Ignoring transid failure
> >>parent transid verify failed on 2968115134464 wanted 34855 found 39533
> >>parent transid verify failed on 2968115134464 wanted 34855 found 39533
> >>parent transid verify failed on 2968115134464 wanted 34855 found 39533
> >>parent transid verify failed on 2968115134464 wanted 34855 found 39533
> >>Ignoring transid failure
> >>parent transid verify failed on 2968115150848 wanted 34855 found 39533
> >>parent transid verify failed on 2968115150848 wanted 34855 found 39533
> >>parent transid verify failed on 2968115150848 wanted 34855 found 39533
> >>parent transid verify failed on 2968115150848 wanted 34855 found 39533
> >>Ignoring transid failure
> >>parent transid verify failed on 2968115691520 wanted 34855 found 39533
> >>parent transid verify failed on 2968115691520 wanted 34855 found 39533
> >>parent transid verify failed on 2968115691520 wanted 34855 found 39533
> >>parent transid verify failed on 2968115691520 wanted 34855 found 39533
> >>Ignoring transid failure
> >>parent transid verify failed on 1291597152256 wanted 35830 found 39530
> >>parent transid verify failed on 1291597152256 wanted 35830 found 39530
> >>parent transid verify failed on 1291597152256 wanted 35830 found 39530
> >>parent transid verify failed on 1291597152256 wanted 35830 found 39530
> >>Ignoring transid failure
> >>parent transid verify failed on 2968116592640 wanted 34855 found 39533
> >>parent transid verify failed on 2968116592640 wanted 34855 found 39533
> >>parent transid verify failed on 2968116592640 wanted 34855 found 39533
> >>parent transid verify failed on 2968116592640 wanted 34855 found 39533
> >>Ignoring transid failure
> >>parent transid verify failed on 2968116609024 wanted 34855 found 39533
> >>parent transid verify failed on 2968116609024 wanted 34855 found 39533
> >>parent transid verify failed on 2968116609024 wanted 34855 found 39533
> >>parent transid verify failed on 2968116609024 wanted 34855 found 39533
> >>Ignoring transid failure
> >>print-tree.c:1094: btrfs_print_tree: Assertion failed.
> >>btrfs-debug-tree[0x805ce93]
> >>btrfs-debug-tree(btrfs_print_tree+0x26d)[0x805eb51]
> >>btrfs-debug-tree(btrfs_print_tree+0x279)[0x805eb5d]
> >>btrfs-debug-tree(main+0x8b5)[0x804dfb7]
> >>/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb757c4d3]
> >>btrfs-debug-tree[0x804e221]

Thanks,
Marc
Qu Wenruo Aug. 24, 2015, 5:11 a.m. UTC | #7
Marc MERLIN wrote on 2015/08/23 21:28 -0700:
> On Mon, Aug 24, 2015 at 09:10:30AM +0800, Qu Wenruo wrote:
>> Would you please capture the following output?
>>
>> 1) btrfs check output
>> With error message if it happens.
>
> myth:~# btrfs check /dev/mapper/crypt_sdd1
> Checking filesystem on /dev/mapper/crypt_sdd1
> UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
> checking extents
> cmds-check.c:4486: add_data_backref: Assertion `back->bytes != max_size` failed.
> btrfs[0x8066a73]
> btrfs[0x8066aa4]
> btrfs[0x8067991]
> btrfs[0x806b4ab]
> btrfs[0x806b9a3]
> btrfs[0x806c5b2]
> btrfs(cmd_check+0x1088)[0x806eddf]
> btrfs(main+0x153)[0x80557c6]
> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb753a4d3]
> btrfs[0x80557ec]
>
>> 2) btrfs check --repair output
>> Full output until segfault.
>
> myth:~# btrfs check --repair /dev/mapper/crypt_sdd1
> enabling repair mode
> Checking filesystem on /dev/mapper/crypt_sdd1
> UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
> checking extents
> cmds-check.c:4486: add_data_backref: Assertion `back->bytes != max_size` failed.
> btrfs[0x8066a73]
> btrfs[0x8066aa4]
> btrfs[0x8067991]
> btrfs[0x806b4ab]
> btrfs[0x806b9a3]
> btrfs[0x806c5b2]
> btrfs(cmd_check+0x1088)[0x806eddf]
> btrfs(main+0x153)[0x80557c6]
> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75114d3]
> btrfs[0x80557ec]
>
> Strangely I'm not getting a segfault anymore.

It seems that something is wrong with the tree block's backref.

>
>> 3) btrfs-debug-tree output
>> With assert output.
>
> The full output is multi gigabyte. Do you need this and if so, do I need to
> upload it somewhere and will you download the multi gigabyte file?
>
> The errors and assert, I already posted here:
>
>>>> Sure thing:
>>>> btrfs-debug-tree /dev/mapper/crypt_sdd1 > /tmp/tree.out
>>>> parent transid verify failed on 2968115101696 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115101696 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115101696 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115101696 wanted 34855 found 39533
>>>> Ignoring transid failure
>>>> parent transid verify failed on 2968115134464 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115134464 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115134464 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115134464 wanted 34855 found 39533
>>>> Ignoring transid failure
>>>> parent transid verify failed on 2968115150848 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115150848 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115150848 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115150848 wanted 34855 found 39533
>>>> Ignoring transid failure
>>>> parent transid verify failed on 2968115691520 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115691520 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115691520 wanted 34855 found 39533
>>>> parent transid verify failed on 2968115691520 wanted 34855 found 39533
>>>> Ignoring transid failure
>>>> parent transid verify failed on 1291597152256 wanted 35830 found 39530
>>>> parent transid verify failed on 1291597152256 wanted 35830 found 39530
>>>> parent transid verify failed on 1291597152256 wanted 35830 found 39530
>>>> parent transid verify failed on 1291597152256 wanted 35830 found 39530
>>>> Ignoring transid failure
>>>> parent transid verify failed on 2968116592640 wanted 34855 found 39533
>>>> parent transid verify failed on 2968116592640 wanted 34855 found 39533
>>>> parent transid verify failed on 2968116592640 wanted 34855 found 39533
>>>> parent transid verify failed on 2968116592640 wanted 34855 found 39533
>>>> Ignoring transid failure
>>>> parent transid verify failed on 2968116609024 wanted 34855 found 39533
>>>> parent transid verify failed on 2968116609024 wanted 34855 found 39533
>>>> parent transid verify failed on 2968116609024 wanted 34855 found 39533
>>>> parent transid verify failed on 2968116609024 wanted 34855 found 39533
>>>> Ignoring transid failure
>>>> print-tree.c:1094: btrfs_print_tree: Assertion failed.
>>>> btrfs-debug-tree[0x805ce93]
>>>> btrfs-debug-tree(btrfs_print_tree+0x26d)[0x805eb51]
>>>> btrfs-debug-tree(btrfs_print_tree+0x279)[0x805eb5d]
>>>> btrfs-debug-tree(main+0x8b5)[0x804dfb7]
>>>> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb757c4d3]
>>>> btrfs-debug-tree[0x804e221]
Oh, sorry for overlooking the existing output.

The last assert is enough information. No need to upload the full output.

The b-tree seems to be heavily damaged, or at least one leaf tree block
is referenced by a higher-level node.
Something may have gone wrong when the level of a b-tree was reduced.

Normally I would have no idea how to fix such a large problem in btrfsck.
But there is still a clue.

In your debug-tree output, the gap between the wanted and found transids
is large (for example, wanted 34855 but found 39533). I suspect there is
a much newer root tree that is not recorded in the superblock.
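The size of that gap can be read straight off the "parent transid verify failed" lines. Here is a minimal sketch in C, assuming the line format shown in the debug-tree output earlier in the thread. The struct and function names are made up for illustration and are not part of btrfs-progs.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed "parent transid verify failed" message. */
struct transid_failure {
	unsigned long long bytenr;	/* logical address of the tree block */
	unsigned long long wanted;	/* generation the parent expected */
	unsigned long long found;	/* generation stored in the block */
};

/*
 * Parse one line of output. Returns 1 and fills *out when the line is a
 * transid failure, 0 for any other line (e.g. "Ignoring transid failure").
 */
static int parse_transid_failure(const char *line, struct transid_failure *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line,
		      "parent transid verify failed on %llu wanted %llu found %llu",
		      &out->bytenr, &out->wanted, &out->found) == 3;
}

/* How many generations newer the block is than its parent expected. */
static long long transid_gap(const struct transid_failure *f)
{
	return (long long)f->found - (long long)f->wanted;
}
```

On the lines posted in this thread, the gaps are 4678 generations (39533 - 34855) and 3700 generations (39530 - 35830).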

So, my last bet is to use "btrfs-find-root -a" to find the root with the
highest generation, and then run "btrfsck -b <bytenr of highest gen
root>" against that root.

The latest btrfs-find-root outputs possible tree roots in descending
order of generation, so you should find the proper bytenr quite easily.
But be prepared: "btrfs-find-root -a" iterates over all metadata space,
so it will take a long time to finish, and it won't output anything
until it has scanned all of it.

Thanks,
Qu
Marc MERLIN Aug. 24, 2015, 2:10 p.m. UTC | #8
On Mon, Aug 24, 2015 at 01:11:26PM +0800, Qu Wenruo wrote:
> So, my last bet is to use "btrfs-find-root -a" to find the root with
> the highest generation, and then run "btrfsck -b <bytenr of highest
> gen root>" against that root.
 
> The latest btrfs-find-root outputs possible tree roots in descending
> order of generation, so you should find the proper bytenr quite
> easily.
> But be prepared: "btrfs-find-root -a" iterates over all metadata
> space, so it will take a long time to finish, and it won't output
> anything until it has scanned all of it.

This is what I got:

myth:~# btrfs-find-root -a /dev/mapper/crypt_sdd1
Superblock thinks the generation is 39538
Superblock thinks the level is 1
Well block 4243456(gen: 3 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 4194304(gen: 2 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
myth:~# 

Does it mean there is no other block I can/should use?

Thanks,
Marc
Qu Wenruo Aug. 25, 2015, 12:26 a.m. UTC | #9
Marc MERLIN wrote on 2015/08/24 07:10 -0700:
> On Mon, Aug 24, 2015 at 01:11:26PM +0800, Qu Wenruo wrote:
>> So, my last bet is to use "btrfs-find-root -a" to find the root with
>> the highest generation, and then run "btrfsck -b <bytenr of highest
>> gen root>" against that root.
>
>> The latest btrfs-find-root outputs possible tree roots in descending
>> order of generation, so you should find the proper bytenr quite
>> easily.
>> But be prepared: "btrfs-find-root -a" iterates over all metadata
>> space, so it will take a long time to finish, and it won't output
>> anything until it has scanned all of it.
>
> This is what I got:
>
> myth:~# btrfs-find-root -a /dev/mapper/crypt_sdd1
> Superblock thinks the generation is 39538
> Superblock thinks the level is 1
> Well block 4243456(gen: 3 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 4194304(gen: 2 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> myth:~#
>
> Does it mean there is no other block I can/should use?
>
> Thanks,
> Marc
>
Oh, it seems to be a new bug in btrfs-find-root.

I'll fix it first then.

Thanks,
Qu
Qu Wenruo Aug. 25, 2015, 2:51 a.m. UTC | #10
Marc MERLIN wrote on 2015/08/24 07:10 -0700:
> On Mon, Aug 24, 2015 at 01:11:26PM +0800, Qu Wenruo wrote:
>> So, my last bet is to use "btrfs-find-root -a" to find the root with
>> the highest generation, and then run "btrfsck -b <bytenr of highest
>> gen root>" against that root.
>
>> The latest btrfs-find-root outputs possible tree roots in descending
>> order of generation, so you should find the proper bytenr quite
>> easily.
>> But be prepared: "btrfs-find-root -a" iterates over all metadata
>> space, so it will take a long time to finish, and it won't output
>> anything until it has scanned all of it.
>
> This is what I got:
>
> myth:~# btrfs-find-root -a /dev/mapper/crypt_sdd1
> Superblock thinks the generation is 39538
> Superblock thinks the level is 1
> Well block 4243456(gen: 3 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 4194304(gen: 2 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> myth:~#
>
> Does it mean there is no other block I can/should use?
>
> Thanks,
> Marc
>
Patches sent and CCed to you.

Please try the two patches and see what's new.
This time, I think the output will be much larger.

Thanks,
Qu
Marc MERLIN Aug. 25, 2015, 5:28 a.m. UTC | #11
On Tue, Aug 25, 2015 at 10:51:00AM +0800, Qu Wenruo wrote:
> Patches sent and CCed to you.
> 
> Please try the two patches and see what's new.
> This time, I think the output will be much larger.

Indeed.

However, the bad news is that gen 39538 is the highest.
Should I force btrfsck to work with an older generation, or do we throw in the towel and stop
trying to rescue this FS? (It's a backup FS, so I have no data I need to recover from it; I'm just
curious how it managed to corrupt itself when all I did was a weekly backup to it via btrfs send/receive.)

myth:~# sort  -rn -k +4  /var/spool/out  |head -20
Well block 29523968(gen: 39538 level: 1) seems good, and it matches superblock
Well block 29687808(gen: 39537 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 93749248(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 60669952(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 30474240(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 29540352(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 150880256(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 150568960(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 150552576(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 150519808(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 150503424(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 141410304(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 136347648(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7312195813376(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7312194404352(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7312086122496(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7312079536128(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7311940960256(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7311866003456(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 7311839477760(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
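
The ranking done by the sort command above can also be sketched in Python. This is only an illustration, not part of btrfs-progs: the function names are hypothetical, and the regular expression assumes the "Well block N(gen: G level: L)" line format shown in this thread.

```python
import re

# Assumed line format, taken from the btrfs-find-root output in this thread:
#   Well block 29523968(gen: 39538 level: 1) seems good, ...
ROOT_LINE = re.compile(r"Well block (\d+)\(gen: (\d+) level: (\d+)\)")


def parse_candidate_roots(text):
    """Return (bytenr, generation, level) tuples found in the tool output."""
    roots = []
    for line in text.splitlines():
        match = ROOT_LINE.search(line)
        if match:
            roots.append(tuple(int(group) for group in match.groups()))
    return roots


def rank_by_generation(roots):
    """Newest generation first; ties broken by higher level, then bytenr."""
    return sorted(roots, key=lambda r: (r[1], r[2], r[0]), reverse=True)


# Three lines copied from the listing above, deliberately out of order.
sample = """\
Well block 29687808(gen: 39537 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 93749248(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
Well block 29523968(gen: 39538 level: 1) seems good, and it matches superblock
"""

for bytenr, generation, level in rank_by_generation(parse_candidate_roots(sample)):
    print(bytenr, generation, level)
```

Parsing the numbers explicitly avoids depending on whitespace-delimited field positions, which is what the sort key relies on.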

Thanks,
Marc
Qu Wenruo Aug. 25, 2015, 6 a.m. UTC | #12
Marc MERLIN wrote on 2015/08/24 22:28 -0700:
> On Tue, Aug 25, 2015 at 10:51:00AM +0800, Qu Wenruo wrote:
>> Patches sent and CCed to you.
>>
>> Please try the two patches and see what's new.
>> This time, I think the output will be much larger.
>
> Indeed.
>
> However, the bad news is that gen 39538 is the highest.
> Should I force btrfsck to work with an older generation, or do we throw in the towel and stop
> trying to rescue this FS? (It's a backup FS, so I have no data I need to recover from it; I'm just
> curious how it managed to corrupt itself when all I did was a weekly backup to it via btrfs send/receive.)
>
> myth:~# sort  -rn -k +4  /var/spool/out  |head -20
> Well block 29523968(gen: 39538 level: 1) seems good, and it matches superblock
> Well block 29687808(gen: 39537 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 93749248(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 60669952(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 30474240(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 29540352(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 150880256(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 150568960(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 150552576(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 150519808(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 150503424(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 141410304(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 136347648(gen: 39536 level: 0) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7312195813376(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7312194404352(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7312086122496(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7312079536128(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7311940960256(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7311866003456(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
> Well block 7311839477760(gen: 39535 level: 1) seems good, but generation/level doesn't match, want gen: 39538 level: 1
>
> Thanks,
> Marc
>
Thanks for all your work and patience, Marc.

Good to know there is a backup.
But as there is no root with a higher generation, I'd assume this is not a
normal transaction ID failure case.

Personally, I'd like to try btrfsck with gen 39537 (--tree-root 29687808),
but that's purely my personal curiosity.
My curiosity is now driving me from finding a clue about how it got
damaged toward trying to recover it.

If you think it's OK, then just wipe it; nobody has the right to disturb
your sleep.

At least we got some clue here.
Some parent nodes got corrupted with a much higher, nonexistent generation.
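
The reasoning above (no root newer than the superblock exists, so fall back to the next-older one) could be sketched roughly as follows. This is a hypothetical helper, not btrfs-progs code, and requiring the fallback to have the same level as the superblock's root is an assumption made for this illustration.

```python
def pick_fallback_root(roots, super_gen, super_level):
    """roots: iterable of (bytenr, generation, level) tuples.

    Returns (newer, fallback):
      newer    - roots whose generation exceeds the superblock's
      fallback - the newest root older than the superblock with the same
                 level (the level match is an assumption), or None
    """
    roots = list(roots)
    newer = [r for r in roots if r[1] > super_gen]
    older = [r for r in roots if r[1] < super_gen and r[2] == super_level]
    fallback = max(older, key=lambda r: r[1], default=None)
    return newer, fallback


# A few candidates taken from the btrfs-find-root listing in this thread.
candidates = [
    (29523968, 39538, 1),
    (29687808, 39537, 1),
    (93749248, 39536, 0),
    (7312195813376, 39535, 1),
]

newer, fallback = pick_fallback_root(candidates, super_gen=39538, super_level=1)
print("roots newer than superblock:", newer)
print("fallback candidate:", fallback)
```

With these inputs, no root is newer than the superblock, and the fallback is the gen 39537 root at bytenr 29687808, the one suggested above for --tree-root.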

Thanks,
Qu
Marc MERLIN Aug. 25, 2015, 6:50 a.m. UTC | #13
On Tue, Aug 25, 2015 at 02:00:32PM +0800, Qu Wenruo wrote:
> Thanks for all your work and patience, Marc.
 
Haha, no problem, you're doing a lot more work than I am :)

> Good to know there is a backup.
> But as there is no root with a higher generation, I'd assume this is not a
> normal transaction ID failure case.

Right.

> Personally, I'd like to try btrfsck with gen 39537 (--tree-root 29687808),
> but that's purely my personal curiosity.
> My curiosity is now driving me from finding a clue about how it got
> damaged toward trying to recover it.

I gave that a shot, same thing:
myth:~# btrfs check --repair --tree-root 29687808 /dev/mapper/crypt_sdd1
enabling repair mode
parent transid verify failed on 29687808 wanted 39538 found 39537
parent transid verify failed on 29687808 wanted 39538 found 39537
parent transid verify failed on 29687808 wanted 39538 found 39537
parent transid verify failed on 29687808 wanted 39538 found 39537
Ignoring transid failure
Checking filesystem on /dev/mapper/crypt_sdd1
UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
checking extents
wtf, parent 575708413952
cmds-check.c:4488: add_data_backref: Assertion `back->bytes != max_size` failed.
btrfs[0x8066a83]
btrfs[0x8066ab4]
btrfs[0x80679d8]
btrfs[0x806b4f2]
btrfs[0x806b9ea]
btrfs[0x806c5f9]
btrfs(cmd_check+0x1088)[0x806ee26]
btrfs(main+0x153)[0x80557d6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75154d3]
btrfs[0x80557fc]

> If you think it's OK, then just wipe it, nobody has the right to
> disturb your sleep.
 
That's not a problem :) 
I signed up for bugs when using btrfs, and am happy to help with reports
and getting the tools improved with your help and others' where
possible.

> At least we got some clue here.
> Some parent nodes got corrupted with a much higher, nonexistent generation.

Right. So should I keep going back in time until it works? The
previous candidate (gen 39535) doesn't work either:

enabling repair mode
parent transid verify failed on 7312195813376 wanted 39538 found 39535
parent transid verify failed on 7312195813376 wanted 39538 found 39535
parent transid verify failed on 7312195813376 wanted 39538 found 39535
parent transid verify failed on 7312195813376 wanted 39538 found 39535
Ignoring transid failure
parent transid verify failed on 29687808 wanted 39524 found 39537
parent transid verify failed on 29687808 wanted 39524 found 39537
parent transid verify failed on 29687808 wanted 39524 found 39537
parent transid verify failed on 29687808 wanted 39524 found 39537
Ignoring transid failure
Checking filesystem on /dev/mapper/crypt_sdd1
UUID: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
checking extents
cmds-check.c:3730: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x8066a83]
btrfs[0x8066ab4]
btrfs[0x806acc4]
btrfs[0x806b9ea]
btrfs[0x806c5f9]
btrfs(cmd_check+0x1088)[0x806ee26]
btrfs(main+0x153)[0x80557d6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75104d3]
btrfs[0x80557fc]
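
To make the repeated "parent transid verify failed" lines above easier to read, here is a small Python sketch that collapses them into one entry per block. It is only an illustration: the function names are hypothetical, and it assumes the usual reading of the message, where "wanted" is the generation expected by whatever references the block and "found" is the generation stored in the block's own header.

```python
import re

# Assumed message format, as printed in the logs in this thread.
TRANSID_LINE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def summarize_transid_failures(log_text):
    """Collapse repeated messages into {(bytenr, wanted, found): count}."""
    counts = {}
    for match in TRANSID_LINE.finditer(log_text):
        key = tuple(int(group) for group in match.groups())
        counts[key] = counts.get(key, 0) + 1
    return counts


def describe(key):
    """One-line summary; a failure line never has found == wanted."""
    bytenr, wanted, found = key
    relation = "older" if found < wanted else "newer"
    return (f"block {bytenr}: header generation {found} is {relation} "
            f"than the expected {wanted}")


# The two distinct failures from the second run above, four copies each.
log = (
    "parent transid verify failed on 7312195813376 wanted 39538 found 39535\n" * 4
    + "parent transid verify failed on 29687808 wanted 39524 found 39537\n" * 4
)

for key, count in summarize_transid_failures(log).items():
    print(f"{describe(key)} (seen {count} times)")
```

In the second run the two failures point in opposite directions: block 7312195813376 is older than expected, while block 29687808 is newer than what its referrer wants.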

Marc

Patch

diff --git a/cmds-check.c b/cmds-check.c
index dd2fce3..8f668d7 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4524,6 +4524,8 @@  static int add_data_backref(struct cache_tree *extent_cache, u64 bytenr,
  	if (found_ref) {
  		BUG_ON(num_refs != 1);
  		if (back->node.found_ref)
+			if (back->bytes != max_size)
+				fprintf(stderr, "wtf, parent %llu\n", (unsigned long long)parent);
  			BUG_ON(back->bytes != max_size);
  		back->node.found_ref = 1;