diff mbox

Btrfs-progs: allow --init-extent-tree to work when extent tree is borked

Message ID 1382724100-5276-1-git-send-email-jbacik@fusionio.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Josef Bacik Oct. 25, 2013, 6:01 p.m. UTC
Unfortunately you can't run --init-extent-tree if you can't actually read the
extent root.  Fix this by allowing partial starts with no extent root and then
have fsck only check to see if the extent root is uptodate _after_ the check to
see if we are init'ing the extent tree.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
 cmds-check.c |  9 ++++++---
 disk-io.c    | 16 ++++++++++++++--
 2 files changed, 20 insertions(+), 5 deletions(-)
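
Condensed from the full patch at the bottom of this page, the heart of the
change looks roughly like this (a sketch with context trimmed, not the
verbatim hunks):

/* disk-io.c, btrfs_setup_all_roots(): on a partial open, tolerate a
 * missing/unreadable extent root by giving it a blank, not-uptodate
 * node instead of failing the whole open with -EIO. */
if (ret) {
	printk("Couldn't setup extent tree\n");
	if (!partial)
		return -EIO;
	fs_info->extent_root->node =
		btrfs_find_create_tree_block(fs_info->extent_root, 0,
					     leafsize);
	if (!fs_info->extent_root->node)
		return -ENOMEM;
	clear_extent_buffer_uptodate(NULL, fs_info->extent_root->node);
}
...
/* Only read block groups if the extent root actually came up. */
if (extent_buffer_uptodate(fs_info->extent_root->node))
	btrfs_read_block_groups(fs_info->tree_root);

/* cmds-check.c, cmd_check(): the extent root is dropped from the initial
 * "critical roots" test and re-checked only after --init-extent-tree has
 * had a chance to rebuild it. */
if (init_extent_tree) {
	printf("Creating a new extent tree\n");
	ret = reinit_extent_tree(info);
	if (ret)
		return ret;
}
if (!extent_buffer_uptodate(info->extent_root->node)) {
	fprintf(stderr, "Critical roots corrupted, unable to fsck the FS\n");
	return -EIO;
}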

Comments

Martin Oct. 25, 2013, 6:27 p.m. UTC | #1
On 25/10/13 19:01, Josef Bacik wrote:
> Unfortunately you can't run --init-extent-tree if you can't actually read the
> extent root.  Fix this by allowing partial starts with no extent root and then
> have fsck only check to see if the extent root is uptodate _after_ the check to
> see if we are init'ing the extent tree.  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> ---
>  cmds-check.c |  9 ++++++---
>  disk-io.c    | 16 ++++++++++++++--
>  2 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/cmds-check.c b/cmds-check.c
> index 69b0327..8ed7baa 100644
> --- a/cmds-check.c
> +++ b/cmds-check.c

Hey! Quick work!...

Is that worth patching locally and trying against my example?

Thanks,
Martin


Josef Bacik Oct. 25, 2013, 6:31 p.m. UTC | #2
On Fri, Oct 25, 2013 at 07:27:24PM +0100, Martin wrote:
> On 25/10/13 19:01, Josef Bacik wrote:
> > [...]
> 
> Hey! Quick work!...
> 
> Is that worth patching locally and trying against my example?
> 

Yes, I'm a little worried about your particular case so I'd like to see if it
works.  If you don't see a lot of output after say 5 minutes let's assume I
didn't fix your problem and let me know so I can make the other change I
considered.  Thanks,

Josef
Martin Oct. 26, 2013, 11:16 p.m. UTC | #3
On 25/10/13 19:31, Josef Bacik wrote:
> On Fri, Oct 25, 2013 at 07:27:24PM +0100, Martin wrote:
>> On 25/10/13 19:01, Josef Bacik wrote:
>>> [...]
>>
>> Hey! Quick work!...
>>
>> Is that worth patching locally and trying against my example?
>>
> 
> Yes, I'm a little worried about your particular case so I'd like to see if it
> works.  If you don't see a lot of output after say 5 minutes let's assume I
> didn't fix your problem and let me know so I can make the other change I
> considered.  Thanks,

Nope... No-go.

parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure

...And nothing more. Looped.


# gdb /sbin/btrfsck 31887
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /sbin/btrfsck...Reading symbols from
/usr/lib64/debug/sbin/btrfsck.debug...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Attaching to program: /sbin/btrfsck, process 31887

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /lib64/libuuid.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libblkid.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libblkid.so.1
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/liblzo2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/liblzo2.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000000042b7a9 in read_extent_buffer ()
(gdb)
(gdb) bt
#0  0x000000000042b7a9 in read_extent_buffer ()
#1  0x000000000041ccfd in btrfs_check_node ()
#2  0x000000000041e0a2 in check_block ()
#3  0x000000000041e69e in btrfs_search_slot ()
#4  0x0000000000425a6e in find_first_block_group ()
#5  0x0000000000425b28 in btrfs_read_block_groups ()
#6  0x0000000000421c40 in btrfs_setup_all_roots ()
#7  0x0000000000421e3f in __open_ctree_fd ()
#8  0x0000000000421f19 in open_ctree_fs_info ()
#9  0x00000000004169b4 in cmd_check ()
#10 0x000000000040443b in main ()
(gdb)


# btrfs version
Btrfs v0.20-rc1-358-g194aa4a-dirty


>>> Emerging (1 of 1) sys-fs/btrfs-progs-9999
>>> Unpacking source...
GIT update -->
   repository:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
   at the commit:            194aa4a1bd6447bb545286d0bcb0b0be8204d79f
   branch:                   master
   storage directory:
"/usr/portage/distfiles/egit-src/btrfs-progs.git"
   checkout type:            bare repository
Cloning into
'/var/tmp/portage/sys-fs/btrfs-progs-9999/work/btrfs-progs-9999'...
done.
Branch branch-master set up to track remote branch master from origin.
Switched to a new branch 'branch-master'
>>> Unpacked to
/var/tmp/portage/sys-fs/btrfs-progs-9999/work/btrfs-progs-9999
>>> Source unpacked in /var/tmp/portage/sys-fs/btrfs-progs-9999/work
>>> Preparing source in
/var/tmp/portage/sys-fs/btrfs-progs-9999/work/btrfs-progs-9999 ...
>>> Source prepared.
 * Applying user patches from
/etc/portage/patches//sys-fs/btrfs-progs-9999 ...
 *   jbpatch2013-10-25-extents-fix.patch ...


[ ok ]
 * Done with patching
>>> Configuring source in
/var/tmp/portage/sys-fs/btrfs-progs-9999/work/btrfs-progs-9999 ...
>>> Source configured.
[...]

Note the compile warnings:


 * QA Notice: Package triggers severe warnings which indicate that it
 *            may exhibit random runtime failures.
 * disk-io.c:91:5: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 * volumes.c:1905:5: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
 * volumes.c:1906:6: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]


 * QA Notice: Package triggers severe warnings which indicate that it
 *            may exhibit random runtime failures.
 * cmds-chunk.c:1343:8: warning: array subscript is above array bounds
[-Warray-bounds]
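
(For reference, the strict-aliasing warnings above are the classic
"dereferencing type-punned pointer" pattern: a buffer pointer cast to a
pointer of another type and then dereferenced. The snippet below is a
generic illustration of the pattern and of the usual memcpy-based fix; it
is not the actual code at disk-io.c:91 or volumes.c:1905.)

#include <stdint.h>
#include <string.h>

uint32_t punned_read(const char *buf)
{
	/* Cast-and-dereference: this is what trips -Wstrict-aliasing. */
	return *(const uint32_t *)buf;
}

uint32_t safe_read(const char *buf)
{
	/* Copying through memcpy() reinterprets the bytes without
	 * violating the aliasing rules; compilers emit a plain load. */
	uint32_t v;
	memcpy(&v, buf, sizeof(v));
	return v;
}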



(The "jbpatch2013-10-25-extents-fix.patch" is the diff txt of your
earlier posting.)

Hope that helps.

Next?

Regards,
Martin


Josef Bacik Oct. 27, 2013, 1:44 a.m. UTC | #4
Yup I have another plan for your situation, I will wire it up Monday and send it out.  Thanks,

Josef

> On Oct 26, 2013, at 7:16 PM, Martin <m_btrfs@ml1.co.uk> wrote:
> 
> [...]
> 
> Nope... No-go.
> 
> [...]
Josef Bacik Oct. 28, 2013, 3:11 p.m. UTC | #5
On Sun, Oct 27, 2013 at 12:16:12AM +0100, Martin wrote:
> On 25/10/13 19:31, Josef Bacik wrote:
> > On Fri, Oct 25, 2013 at 07:27:24PM +0100, Martin wrote:
> >> On 25/10/13 19:01, Josef Bacik wrote:
> >>> [...]
> >>
> >> Hey! Quick work!...
> >>
> >> Is that worth patching locally and trying against my example?
> >>
> > 
> > Yes, I'm a little worried about your particular case so I'd like to see if it
> > works.  If you don't see a lot of output after say 5 minutes let's assume I
> > didn't fix your problem and let me know so I can make the other change I
> > considered.  Thanks,
> 
> Nope... No-go.
> 

Ok I've sent

[PATCH] Btrfs-progs: rework open_ctree to take flags, add a new one

which should address your situation.  Thanks,

Josef
Martin Nov. 7, 2013, 1:25 a.m. UTC | #6
On 28/10/13 15:11, Josef Bacik wrote:
> On Sun, Oct 27, 2013 at 12:16:12AM +0100, Martin wrote:
>> On 25/10/13 19:31, Josef Bacik wrote:
>>> On Fri, Oct 25, 2013 at 07:27:24PM +0100, Martin wrote:
>>>> On 25/10/13 19:01, Josef Bacik wrote:
>>>>> [...]
>>>>
>>>> Hey! Quick work!...
>>>>
>>>> Is that worth patching locally and trying against my example?
>>>>
>>>
>>> Yes, I'm a little worried about your particular case so I'd like to see if it
>>> works.  If you don't see a lot of output after say 5 minutes let's assume I
>>> didn't fix your problem and let me know so I can make the other change I
>>> considered.  Thanks,
>>
>> Nope... No-go.
>>
> 
> Ok I've sent
> 
> [PATCH] Btrfs-progs: rework open_ctree to take flags, add a new one
> 
> which should address your situation.  Thanks,


Josef,

Tried your patch:

####
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

 13 files changed, 75 insertions(+), 113 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 26c7b5f..ae10eed 100644
####

And the patching fails due to mismatching code...

I have the Gentoo source for:

Btrfs v0.20-rc1-358-g194aa4a

(On Gentoo 3.11.5, will be on 3.11.6 later today.)


What are the magic incantations to download your version of source code
to try please? (Patched or unpatched?)


Many thanks,
Martin


Martin Nov. 11, 2013, 10:52 p.m. UTC | #7
On 07/11/13 01:25, Martin wrote:
> On 28/10/13 15:11, Josef Bacik wrote:

>> Ok I've sent
>>
>> [PATCH] Btrfs-progs: rework open_ctree to take flags, add a new one
>>
>> which should address your situation.  Thanks,
> 
> 
> Josef,
> 
> Tried your patch:
> 
> ####
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> 
>  13 files changed, 75 insertions(+), 113 deletions(-)
> 
> diff --git a/btrfs-convert.c b/btrfs-convert.c
> index 26c7b5f..ae10eed 100644
> ####
> 
> And the patching fails due to mismatching code...
> 
> I have the Gentoo source for:
> 
> Btrfs v0.20-rc1-358-g194aa4a
> 
> (On Gentoo 3.11.5, will be on 3.11.6 later today.)
> 
> 
> What are the magic incantations to download your version of source code
> to try please? (Patched or unpatched?)

OK so Chris Mason and the Gentoo sys-fs/btrfs-progs-9999 came to the
rescue to give:


# btrfs version
Btrfs v0.20-rc1-591-gc652e4e


This time:

# btrfsck --repair --init-extent-tree /dev/sdc

quickly gave:

parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
parent transid verify failed on 911904604160 wanted 17448 found 17449
Ignoring transid failure
btrfs unable to find ref byte nr 910293991424 parent 0 root 1  owner 2
offset 0
btrfs unable to find ref byte nr 910293995520 parent 0 root 1  owner 1
offset 1
btrfs unable to find ref byte nr 910293999616 parent 0 root 1  owner 0
offset 1
leaf free space ret -297791851, leaf data size 3995, used 297795846
nritems 2
checking extents
btrfsck: extent_io.c:609: free_extent_buffer: Assertion `!(eb->refs <
0)' failed.
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 38a60270-f9c6-4ed4-8421-4bf1253ae0b3
Creating a new extent tree
Failed to find [910293991424, 168, 4096]
Failed to find [910293995520, 168, 4096]
Failed to find [910293999616, 168, 4096]


From that, I've tried running again:

# btrfsck --repair /dev/sdc

giving thus far:

parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
Ignoring transid failure


... And it is still running a couple of days later.

GDB shows:

(gdb) bt
#0  0x000000000042d576 in read_extent_buffer ()
#1  0x000000000041ee79 in btrfs_check_node ()
#2  0x0000000000420211 in check_block ()
#3  0x0000000000420813 in btrfs_search_slot ()
#4  0x0000000000427bb4 in btrfs_read_block_groups ()
#5  0x0000000000423e40 in btrfs_setup_all_roots ()
#6  0x000000000042406d in __open_ctree_fd ()
#7  0x0000000000424126 in open_ctree_fs_info ()
#8  0x000000000041812e in cmd_check ()
#9  0x0000000000404904 in main ()


So... Has it looped or is it busy? There is no activity on /dev/sdc.


Which comes to a request:

Can the options "-v" (for verbose) and "-s" (to continuously show
status) be added to btrfsck to give some indication of progress and what
is happening? The "-s" option should report progress with whatever
real-time counts are appropriate, as "badblocks -s" does.
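
(Purely as an illustration of the kind of thing I mean, and not anything
that exists in btrfs-progs today, a periodic status line could be as
simple as a rate-limited counter printed from inside the long-running
walk. The helper name below is made up.)

#include <stdio.h>
#include <time.h>

/* Hypothetical helper: print a status line at most once per second. */
void report_progress(const char *stage, unsigned long long items_done)
{
	static time_t last;
	time_t now = time(NULL);

	if (now - last < 1)
		return;
	last = now;
	fprintf(stderr, "\r%s: %llu items processed", stage, items_done);
	fflush(stderr);
}

/* Would be called from the hot loops, e.g.:
 *
 *	report_progress("checking extents", ++nr_items);
 */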


I'll leave it running for a little while longer before trying a mount.

Hope of interest.

Thanks,
Martin





Martin Nov. 13, 2013, 12:08 p.m. UTC | #8
On 11/11/13 22:52, Martin wrote:
> On 07/11/13 01:25, Martin wrote:

> OK so Chris Mason and the Gentoo sys-fs/btrfs-progs-9999 came to the
> rescue to give:
> 
> 
> # btrfs version
> Btrfs v0.20-rc1-591-gc652e4e

> From that, I've tried running again:
> 
> # btrfsck --repair /dev/sdc
> 
> giving thus far:
> 
> parent transid verify failed on 911904604160 wanted 17448 found 17450
> parent transid verify failed on 911904604160 wanted 17448 found 17450
> parent transid verify failed on 911904604160 wanted 17448 found 17450
> parent transid verify failed on 911904604160 wanted 17448 found 17450
> Ignoring transid failure
> 
> 
> ... And it is still running a couple of days later.
> 
> GDB shows:
> 
> (gdb) bt
> #0  0x000000000042d576 in read_extent_buffer ()
> #1  0x000000000041ee79 in btrfs_check_node ()
> #2  0x0000000000420211 in check_block ()
> #3  0x0000000000420813 in btrfs_search_slot ()
> #4  0x0000000000427bb4 in btrfs_read_block_groups ()
> #5  0x0000000000423e40 in btrfs_setup_all_roots ()
> #6  0x000000000042406d in __open_ctree_fd ()
> #7  0x0000000000424126 in open_ctree_fs_info ()
> #8  0x000000000041812e in cmd_check ()
> #9  0x0000000000404904 in main ()


Another two days and:

(gdb) bt
#0  0x000000000042373a in read_tree_block ()
#1  0x0000000000421538 in btrfs_search_slot ()
#2  0x0000000000427bb4 in btrfs_read_block_groups ()
#3  0x0000000000423e40 in btrfs_setup_all_roots ()
#4  0x000000000042406d in __open_ctree_fd ()
#5  0x0000000000424126 in open_ctree_fs_info ()
#6  0x000000000041812e in cmd_check ()
#7  0x0000000000404904 in main ()


> So... Has it looped or is it busy? There is no activity on /dev/sdc.

Same "btrfs_read_block_groups" but different stack above that: So
perhaps something useful is being done?...

No disk activity noticed.


> Which comes to a request:
> 
> Can the options "-v" (for verbose) and "-s" (to continuously show
> status) be added to btrfsck to give some indication of progress and what
> is happening? The "-s" option should report progress with whatever
> real-time counts are appropriate, as "badblocks -s" does.


OK... So I'll leave it running for a little while longer before trying a mount.

Some sort of progress indicator would be rather useful... Is this going
to run for a few hours more or might this need to run for weeks to
complete? Any clues to look for?

(All on a 2TByte single disk btrfs, 4k defaults)

Hope of interest.

Regards,
Martin


Duncan Nov. 13, 2013, 1:46 p.m. UTC | #9
Martin posted on Wed, 13 Nov 2013 12:08:50 +0000 as excerpted:

>> Which comes to a request:
>> 
>> Can the options "-v" (for verbose) and "-s" (to continuously show
>> status) be added to btrfsck to give some indication of progress and
>> what is happening? The "-s" option should report progress with
>> whatever real-time counts are appropriate, as "badblocks -s" does.
> 
> 
> OK... So I'll leave it running for a little while longer before trying
> a mount.
> 
> Some sort of progress indicator would be rather useful... Is this going
> to run for a few hours more or might this need to run for weeks to
> complete? Any clues to look for?

Apropos of that request, I noticed when I updated btrfs-progs yesterday 
(it reports Btrfs v0.20-rc1-591-gc652e4e) that there's now a -v "verbose" 
flag for btrfs balance.  Here's a sample:

btrfs bal stat -v /home

Balance on '/home' is running
5 out of about 17 chunks balanced (6 considered),  71% left
Dumping filters: flags 0x7, state 0x1, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing

So with -v, balance now has progress indication at the chunk level, and 
indicates which chunk classes (data/metadata/system) are being balanced 
as well.  Chunk level isn't terribly granular, but it's better than not 
having /any/ idea.

And btrfs scrub has a new -R "raw" flag.  With the new flag, and without:

btrfs scrub stat -R /home

scrub status for d3ae4dbd-874e-42ec-996b-47995fd0c45a
        scrub started at Wed Nov 13 06:16:57 2013, running for 13 seconds
        data_extents_scrubbed: 302779
        tree_extents_scrubbed: 202692
        data_bytes_scrubbed: 8803897344
        tree_bytes_scrubbed: 830226432
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 286620
        csum_discards: 481041
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 11743133696

btrfs scrub stat /home

scrub status for d3ae4dbd-874e-42ec-996b-47995fd0c45a
        scrub started at Wed Nov 13 06:16:57 2013, running for 22 seconds
        total bytes scrubbed: 15.13GiB with 0 errors

So scrub status now reports progress at the byte level (well, I suppose 
incremented per file or per node or some such).

(FWIW I'm on SSD, thus the 15+ gigs in 22 seconds; I guess it'd be far 
slower on spinning rust.  I also keep multiple relatively small independent 
btrfs partitions; 100 gigs would be a rather big partition for me.  Combined 
with the speed of SSD, that means checks/balances/scrubs finish in a couple 
of minutes or less, not the days people often report on-list for multi-
tebibyte spinning rust, so I can conveniently run them on-demand for 
posting examples as I'm doing here, without tying up the system for days 
as a result! =:^)

As for btrfsck, it's now also available under the primary btrfs command 
as btrfs check.  But there's no btrfs check status comparable to 
btrfs balance status and btrfs scrub status, and no -v flag or similar.  
There are no real status updates either, beyond the fsck "stage" 
(checking ...) it's at:

btrfs check /dev/sda10

Checking filesystem on /dev/sda10
UUID: 7cb58c4f-ebd0-4c13-831b-ad0a964c51c6
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 2446191674 bytes used err is 0
total csum bytes: 11257148
total tree bytes: 524185600
total fs tree bytes: 481792000
total extent tree bytes: 27873280
btree space waste bytes: 152482568
file data blocks allocated: 11530465280
 referenced 12528828416
Btrfs v0.20-rc1-591-gc652e4e


So btrfs check still needs a status command and a -v flag or similar too, 
but there's visible progress, as balance and scrub have better status 
updating now, and btrfsck is at least available (as check) under the main 
btrfs command. =:^)
Martin Nov. 15, 2013, 5:18 p.m. UTC | #10
Another two days and a backtrace shows the hope of progress:

#0  0x000000000041de2f in btrfs_node_key ()
#1  0x000000000041ee79 in btrfs_check_node ()
#2  0x0000000000420211 in check_block ()
#3  0x0000000000420813 in btrfs_search_slot ()
#4  0x0000000000427bb4 in btrfs_read_block_groups ()
#5  0x0000000000423e40 in btrfs_setup_all_roots ()
#6  0x000000000042406d in __open_ctree_fd ()
#7  0x0000000000424126 in open_ctree_fs_info ()
#8  0x000000000041812e in cmd_check ()
#9  0x0000000000404904 in main ()

No other output, 100% CPU, using only a single core, and no apparent
disk activity.

There looks to be a repeating pattern of calls. Is this working through
the same test repeated per btrfs block? Are there any variables that can
be checked with gdb to see how far it has gone, so as to guess how long
it might need to run?


Phew?

Hope of interest,

Regards,
Martin




On 13/11/13 12:08, Martin wrote:
> [...]
Martin Nov. 19, 2013, 6:25 a.m. UTC | #11
On 07/11/13 01:25, Martin wrote:
[...]
> And the patching fails due to mismatching code...
> 
> I have the Gentoo source for:
> 
> Btrfs v0.20-rc1-358-g194aa4a
> 
> (On Gentoo 3.11.5, will be on 3.11.6 later today.)
> 
> 
> What are the magic incantations to download your version of source code
> to try please? (Patched or unpatched?)

As an FYI for anyone stumbling onto this thread:

See:

https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

to get to the code!


Cheers,
Martin


Martin Nov. 19, 2013, 6:34 a.m. UTC | #12
Continuing:

gdb bt now gives:

#0  0x000000000042075a in btrfs_search_slot ()
#1  0x0000000000427bb4 in btrfs_read_block_groups ()
#2  0x0000000000423e40 in btrfs_setup_all_roots ()
#3  0x000000000042406d in __open_ctree_fd ()
#4  0x0000000000424126 in open_ctree_fs_info ()
#5  0x000000000041812e in cmd_check ()
#6  0x0000000000404904 in main ()

#0  0x00000000004208bc in btrfs_search_slot ()
#1  0x0000000000427bb4 in btrfs_read_block_groups ()
#2  0x0000000000423e40 in btrfs_setup_all_roots ()
#3  0x000000000042406d in __open_ctree_fd ()
#4  0x0000000000424126 in open_ctree_fs_info ()
#5  0x000000000041812e in cmd_check ()
#6  0x0000000000404904 in main ()

#0  0x00000000004208d0 in btrfs_search_slot ()
#1  0x0000000000427bb4 in btrfs_read_block_groups ()
#2  0x0000000000423e40 in btrfs_setup_all_roots ()
#3  0x000000000042406d in __open_ctree_fd ()
#4  0x0000000000424126 in open_ctree_fs_info ()
#5  0x000000000041812e in cmd_check ()
#6  0x0000000000404904 in main ()


Still no further output. btrfsck is running at 100% on a single core,
with no apparent disk activity. All for a 2TB hdd.


Should it take this long?...

Regards,
Martin




On 15/11/13 17:18, Martin wrote:
> [...]
Martin Nov. 20, 2013, 6:51 a.m. UTC | #13
It's now gone back to a pattern from a full week ago:

(gdb) bt
#0  0x000000000042d576 in read_extent_buffer ()
#1  0x000000000041ee79 in btrfs_check_node ()
#2  0x0000000000420211 in check_block ()
#3  0x0000000000420813 in btrfs_search_slot ()
#4  0x0000000000427bb4 in btrfs_read_block_groups ()
#5  0x0000000000423e40 in btrfs_setup_all_roots ()
#6  0x000000000042406d in __open_ctree_fd ()
#7  0x0000000000424126 in open_ctree_fs_info ()
#8  0x000000000041812e in cmd_check ()
#9  0x0000000000404904 in main ()


I don't know if that has gone through that pattern during the week but
at a-week-a-time, this is not going to finish in reasonable time.

How come so very slow?

Any hints/tips/fixes or abandon the test?


Regards,
Martin




On 19/11/13 06:34, Martin wrote:
> [...]
Duncan Nov. 20, 2013, 5:08 p.m. UTC | #14
Martin posted on Wed, 20 Nov 2013 06:51:20 +0000 as excerpted:

> It's now gone back to a pattern from a full week ago:
> 
> (gdb) bt
> #0  0x000000000042d576 in read_extent_buffer ()
> #1  0x000000000041ee79 in btrfs_check_node ()
> #2  0x0000000000420211 in check_block ()
> #3  0x0000000000420813 in btrfs_search_slot ()
> #4  0x0000000000427bb4 in btrfs_read_block_groups ()
> #5  0x0000000000423e40 in btrfs_setup_all_roots ()
> #6  0x000000000042406d in __open_ctree_fd ()
> #7  0x0000000000424126 in open_ctree_fs_info ()
> #8  0x000000000041812e in cmd_check ()
> #9  0x0000000000404904 in main ()
> 
> 
> I don't know if that has gone through that pattern during the week but
> at a-week-a-time, this is not going to finish in reasonable time.
> 
> How come so very slow?
> 
> Any hints/tips/fixes or abandon the test?

You're a patient man. =:^)

While I don't have any personal knowledge of btrfs times on spinning rust, 
nor of TiB-sized times even on SSD, there's a comment in the wiki FAQ 
about balance times saying 20 hours is "normal" for 1 TB.

( https://btrfs.wiki.kernel.org/index.php/FAQ , search on "hours". )

OK, so we round that to a day a TB, double for your two TB, and double 
again in case your drive is much slower than the "normal" drive the 
comment might have been considering and because that's for a balance but 
you're doing a btrfsck --repair, which for all we know takes longer.

That's still "only" four days, and you've been going well over a week.  
At this point I think it's reasonably safe to conclude it's in some sort 
of loop and likely will never finish.

Which leads to the question of what to do next.  Obviously, there have 
been a number of update patches since then, some of which might address 
your problem.  You could update your kernel and userspace and try 
again... /if/ you have the patience... or just consider it lost, get 
anything you can off of it if you need to (I've forgotten what you tried 
in terms of that previously, or how desperate you are for that data, but 
if you've been waiting well over a week, I'd guess you have some reason 
to try to save it), do a mkfs (possibly with a wipe first) and try 
again.  You're a patient man, definitely, but at a week a shot, there 
comes a time when it's simply time to declare a loss and move on.
Martin Nov. 20, 2013, 8 p.m. UTC | #15
On 20/11/13 17:08, Duncan wrote:
> Martin posted on Wed, 20 Nov 2013 06:51:20 +0000 as excerpted:
> 
>> It's now gone back to a pattern from a full week ago:
>>
>> (gdb) bt
>> #0  0x000000000042d576 in read_extent_buffer ()
>> #1  0x000000000041ee79 in btrfs_check_node ()
>> #2  0x0000000000420211 in check_block ()
>> #3  0x0000000000420813 in btrfs_search_slot ()
>> #4  0x0000000000427bb4 in btrfs_read_block_groups ()
>> #5  0x0000000000423e40 in btrfs_setup_all_roots ()
>> #6  0x000000000042406d in __open_ctree_fd ()
>> #7  0x0000000000424126 in open_ctree_fs_info ()
>> #8  0x000000000041812e in cmd_check ()
>> #9  0x0000000000404904 in main ()
>>
>>
>> I don't know if that has gone through that pattern during the week but
>> at a-week-a-time, this is not going to finish in reasonable time.
>>
>> How come so very slow?
>>
>> Any hints/tips/fixes or abandon the test?
> 
> You're a patient man. =:^)

Sort of... I can leave it running in the background until I come to need
to do something else with that machine. So... A bit of an experiment.



> ( https://btrfs.wiki.kernel.org/index.php/FAQ , search on "hours". )
> 
> OK, so we round that to a day a TB, double for your two TB, and double 
> again in case your drive is much slower than the "normal" drive the 
> comment might have been considering and because that's for a balance but 
> you're doing a btrfsck --repair, which for all we know takes longer.
> 
> That's still "only" four days, and you've been going well over a week.  
> At this point I think it's reasonably safe to conclude it's in some sort 
> of loop and likely will never finish.


> ... but at a week a shot, there 
> comes a time when it's simply time to declare a loss and move on.

Exactly so...

No idea what btrfsck is so very slowly checking through or if it has
indeed looped. Which is where progress output would be useful.

However, btrfsck is rather too slow to be practical at the moment.

Further development?... Any useful debug to be had from this case before
I move on?


Regards,
Martin


Still at:

parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
Ignoring transid failure

...which is all the output thus far.


And:

(gdb) bt
#0  0x000000000042d574 in read_extent_buffer ()
#1  0x000000000041ee79 in btrfs_check_node ()
#2  0x0000000000420211 in check_block ()
#3  0x0000000000420813 in btrfs_search_slot ()
#4  0x0000000000427bb4 in btrfs_read_block_groups ()
#5  0x0000000000423e40 in btrfs_setup_all_roots ()
#6  0x000000000042406d in __open_ctree_fd ()
#7  0x0000000000424126 in open_ctree_fs_info ()
#8  0x000000000041812e in cmd_check ()
#9  0x0000000000404904 in main ()





Martin Nov. 20, 2013, 8:03 p.m. UTC | #16
On 20/11/13 17:08, Duncan wrote:

> Which leads to the question of what to do next.  Obviously, there have 
> been a number of update patches since then, some of which might address 
> your problem.  You could update your kernel and userspace and try 
> again... /if/ you have the patience...


This is on kernel 3.11.5 and Btrfs v0.20-rc1-591-gc652e4e.

I can easily upgrade to the latest kernel, at the expense of killing the
existing btrfsck run.

Regards,
Martin


Martin Nov. 25, 2013, 11:18 p.m. UTC | #17
On 20/11/13 20:00, Martin wrote:
> On 20/11/13 17:08, Duncan wrote:
>> Martin posted on Wed, 20 Nov 2013 06:51:20 +0000 as excerpted:
>>
>>> It's now gone back to a pattern from a full week ago:
>>>
>>> (gdb) bt
>>> #0  0x000000000042d576 in read_extent_buffer ()
>>> #1  0x000000000041ee79 in btrfs_check_node ()
>>> #2  0x0000000000420211 in check_block ()
>>> #3  0x0000000000420813 in btrfs_search_slot ()
>>> #4  0x0000000000427bb4 in btrfs_read_block_groups ()
>>> #5  0x0000000000423e40 in btrfs_setup_all_roots ()
>>> #6  0x000000000042406d in __open_ctree_fd ()
>>> #7  0x0000000000424126 in open_ctree_fs_info ()
>>> #8  0x000000000041812e in cmd_check ()
>>> #9  0x0000000000404904 in main ()
>>>
>>>
>>> I don't know if that has gone through that pattern during the week but
>>> at a-week-a-time, this is not going to finish in reasonable time.
>>>
>>> How come so very slow?
>>>
>>> Any hints/tips/fixes or abandon the test?
>>
>> You're a patient man. =:^)
> 
> Sort of... I can leave it running in the background until I come to need
> to do something else with that machine. So... A bit of an experiment.


Until... No more... And just as the gdb bt shows something a little
different!

(gdb) bt
#0  0x000000000041ddc4 in btrfs_comp_keys ()
#1  0x00000000004208e9 in btrfs_search_slot ()
#2  0x0000000000427bb4 in btrfs_read_block_groups ()
#3  0x0000000000423e40 in btrfs_setup_all_roots ()
#4  0x000000000042406d in __open_ctree_fd ()
#5  0x0000000000424126 in open_ctree_fs_info ()
#6  0x000000000041812e in cmd_check ()
#7  0x0000000000404904 in main ()


Nearly done or weeks yet more to run?

The poor thing gets killed in the morning for new work.


Regards,
Martin




Martin Nov. 27, 2013, 6:29 a.m. UTC | #18
>>>> I don't know if that has gone through that pattern during the week but
>>>> at a-week-a-time, this is not going to finish in reasonable time.
>>>>
>>>> How come so very slow?
>>>>
>>>> Any hints/tips/fixes or abandon the test?
>>>
>>> You're a patient man. =:^)
>>
>> Sort of... I can leave it running in the background until I come to need
>> to do something else with that machine. So... A bit of an experiment.
> 
> 
> Until... No more... And just as the gdb bt shows something a little
> different!
> 
> (gdb) bt
> #0  0x000000000041ddc4 in btrfs_comp_keys ()
> #1  0x00000000004208e9 in btrfs_search_slot ()
> #2  0x0000000000427bb4 in btrfs_read_block_groups ()
> #3  0x0000000000423e40 in btrfs_setup_all_roots ()
> #4  0x000000000042406d in __open_ctree_fd ()
> #5  0x0000000000424126 in open_ctree_fs_info ()
> #6  0x000000000041812e in cmd_check ()
> #7  0x0000000000404904 in main ()
> 
> 
> Nearly done or weeks yet more to run?
> 
> The poor thing gets killed in the morning for new work.

OK, so that all came to naught and it got killed for a kernel update and
new work.

"Just for a giggle," I tried mounting that disk with the 'recovery'
option and it failed with the usual complaint:

btrfs: disabling disk space caching
btrfs: enabling auto recovery
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
btrfs: open_ctree failed


Trying a wild guess of "btrfs-zero-log /dev/sdc" gives:

parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
parent transid verify failed on 911904604160 wanted 17448 found 17450
Ignoring transid failure

... and it is sat there at 100% CPU usage, no further output, and no
apparent disk activity... Just like btrfsck was...


So... Looks like it's finally time for a reformat.



Any chance of outputting some indication of progress, and of a speedup, 
or of options for partial recovery?... Or of a fast 'slash-and-burn' 
recovery where damaged trees get cleanly amputated rather than 
too-painfully-slowly repaired?...

Just a few wild ideas ;-)


Regards,
Martin




diff mbox

Patch

diff --git a/cmds-check.c b/cmds-check.c
index 69b0327..8ed7baa 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5932,7 +5932,7 @@  static int reinit_extent_tree(struct btrfs_fs_info *fs_info)
 	}
 
 	/* Ok we can allocate now, reinit the extent root */
-	ret = btrfs_fsck_reinit_root(trans, fs_info->extent_root, 1);
+	ret = btrfs_fsck_reinit_root(trans, fs_info->extent_root, 0);
 	if (ret) {
 		fprintf(stderr, "extent root initialization failed\n");
 		/*
@@ -6118,20 +6118,23 @@  int cmd_check(int argc, char **argv)
 
 	if (!extent_buffer_uptodate(info->tree_root->node) ||
 	    !extent_buffer_uptodate(info->dev_root->node) ||
-	    !extent_buffer_uptodate(info->extent_root->node) ||
 	    !extent_buffer_uptodate(info->chunk_root->node)) {
 		fprintf(stderr, "Critical roots corrupted, unable to fsck the FS\n");
 		return -EIO;
 	}
 
 	root = info->fs_root;
-
 	if (init_extent_tree) {
 		printf("Creating a new extent tree\n");
 		ret = reinit_extent_tree(info);
 		if (ret)
 			return ret;
 	}
+	if (!extent_buffer_uptodate(info->extent_root->node)) {
+		fprintf(stderr, "Critical roots corrupted, unable to fsck the FS\n");
+		return -EIO;
+	}
+
 	fprintf(stderr, "checking extents\n");
 	if (init_csum_tree) {
 		struct btrfs_trans_handle *trans;
diff --git a/disk-io.c b/disk-io.c
index 733714d..c5ee33a 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -877,7 +877,17 @@  int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info,
 				  fs_info->extent_root);
 	if (ret) {
 		printk("Couldn't setup extent tree\n");
-		return -EIO;
+		if (!partial)
+			return -EIO;
+		/* Need a blank node here just so we don't screw up in the
+		 * million of places that assume a root has a valid ->node
+		 */
+		fs_info->extent_root->node =
+			btrfs_find_create_tree_block(fs_info->extent_root, 0,
+						     leafsize);
+		if (!fs_info->extent_root->node)
+			return -ENOMEM;
+		clear_extent_buffer_uptodate(NULL, fs_info->extent_root->node);
 	}
 	fs_info->extent_root->track_dirty = 1;
 
@@ -906,7 +916,9 @@  int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info,
 
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
-	btrfs_read_block_groups(fs_info->tree_root);
+
+	if (extent_buffer_uptodate(fs_info->extent_root->node))
+		btrfs_read_block_groups(fs_info->tree_root);
 
 	key.objectid = BTRFS_FS_TREE_OBJECTID;
 	key.type = BTRFS_ROOT_ITEM_KEY;