btrfs recovery

Message ID 0ab48f84-7e37-02aa-1de9-612fac3f02da@mendix.com

Commit Message

Hans van Kranenburg Jan. 29, 2017, 4:44 p.m. UTC
On 01/29/2017 03:02 AM, Oliver Freyermuth wrote:
> Am 28.01.2017 um 23:27 schrieb Hans van Kranenburg:
>> On 01/28/2017 10:04 PM, Oliver Freyermuth wrote:
>>> Am 26.01.2017 um 12:01 schrieb Oliver Freyermuth:
>>>> Am 26.01.2017 um 11:00 schrieb Hugo Mills:
>>>>>    We can probably talk you through fixing this by hand with a decent
>>>>> hex editor. I've done it before...
>>>>>
>>>> That would be nice! Is it fine via the mailing list? 
>>>> Potentially, the instructions could be helpful for future reference, and "real" IRC is not accessible from my current location. 
>>>>
>>>> Do you have suggestions for a decent hexeditor for this job? Until now, I have been mainly using emacs, 
>>>> classic hexedit (http://rigaux.org/hexedit.html), or okteta (beware, it's graphical!), but of course these were made for a few MiB of files and are not so well suited for a block device. 
>>>>
>>>> The first thing to do would then probably just be to jump to the offset where 0xd89500014da12000 is written (can I get that via inspect-internal, or do I have to search for it?), fix that to read 
>>>> 0x00a800014da12000
>>>> (if I understood correctly) and then probably adapt a checksum? 
>>>>
>>> My external backup via btrfs-restore is now done successfully, so I am ready for anything you throw at me. 
>>> Since I was able to pull all data, though, it would mainly be something educational (for me, and likely other list readers). 
>>> If you think that this manual procedure is not worth it, I can also just scratch and recreate the FS. 
>>
>> OK, let's do it. I also want to practice a bit with stuff like this, so
>> this is a nice example.
>>
>> See if you can dump the chunk tree (tree 3) with btrfs inspect-internal
>> dump-tree -t 3 /dev/xxx
>>
> Yes, I can! :-)
> 
>> You should get a list of objects like this one:
>>
>> item 88 key (FIRST_CHUNK_TREE CHUNK_ITEM 1200384638976) itemoff 9067
>> itemsize 80
>>   chunk length 1073741824 owner 2 stripe_len 65536
>>   type DATA num_stripes 1
>>     stripe 0 devid 1 offset 729108447232
>>     dev uuid: edae9198-4ea9-4553-9992-af8e27aa6578
>>
>> Find the one that contains 35028992
>>
>> So, where it says 1200384638976 and length 1073741824 in the example
>> above, which is the btrfs virtual address space from 1200384638976 to
>> 1200384638976 + 1GiB, you need to find the one where 35028992 is between
>> the start and start+length.
>>
> I found:
>         item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15993 itemsize 112
>                 length 1073741824 owner 2 stripe_len 65536 type METADATA|DUP
>                 io_align 65536 io_width 65536 sector_size 4096
>                 num_stripes 2 sub_stripes 0
>                         stripe 0 devid 1 offset 37748736
>                         dev_uuid 76acfc80-aa73-4a21-890b-34d1d2259728
>                         stripe 1 devid 1 offset 1111490560
>                         dev_uuid 76acfc80-aa73-4a21-890b-34d1d2259728
> 
> So I have Metadata DUP (at least I remembered that correctly). 
> Now, for the calculation:
> 37748736+(35028992-29360128)   =   43417600
> 1111490560+(35028992-29360128) = 1117159424
> 
>> Then, look at the stripe line. If you have DUP metadata, it will be a
>> type METADATA (instead of DATA in the example above) and it will list
>> two stripe lines, which point at the two physical locations in the
>> underlying block device.
>>
>> The place where your 16kiB metadata block is stored is at physical start
>> of stripe + (35028992 - start of virtual address block).
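For reference, the address arithmetic described above can be written down as a small Python sketch. The function name and its parameters are made up for illustration; the numbers are the ones from the chunk item found above. It assumes a single or DUP chunk, where every stripe holds a full copy of the data, and would not be correct for striped profiles such as RAID0.

```python
# Sketch only: map a btrfs logical (virtual) address to the physical offset
# of each copy. Assumes a single/DUP chunk; the helper name is hypothetical.

def logical_to_physical(logical, chunk_start, chunk_length, stripe_offsets):
    """Return one physical byte offset per stripe listed in the chunk item."""
    if not chunk_start <= logical < chunk_start + chunk_length:
        raise ValueError("logical address is outside this chunk")
    delta = logical - chunk_start  # distance from the start of the chunk
    return [stripe_offset + delta for stripe_offset in stripe_offsets]


physical = logical_to_physical(
    logical=35028992,            # the broken metadata block
    chunk_start=29360128,        # from the CHUNK_ITEM key
    chunk_length=1073741824,     # "length" in the chunk item
    stripe_offsets=[37748736, 1111490560],  # the two "stripe N ... offset" values
)
print(physical)  # [43417600, 1117159424]
```

The two results are the values to hand to dd for the two mirrored copies.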
>>
>> Then, dump one of the two mirrored 16kiB from disk with something like
>> `dd if=/dev/sdb1 bs=1 skip=<physical location> count=16384 > foo`
> And the dd'ing:
> dd if=/dev/sdb1 bs=1 skip=43417600 count=16384 > mblock_first
> dd if=/dev/sdb1 bs=1 skip=1117159424 count=16384 > mblock_second
> Just as a cross-check, as expected, the md5sum of both files is the same, so they are identical. 
> 
>>
>> File foo of 16kiB size now contains the data that you dumped in the
>> pastebin before.
>>
>> Using hexedit on this can be a quite confusing experience because of the
>> reordering of bytes in the raw data. When you expect to find
>> 0xd89500014da12000 somewhere, it probably doesn't show up as d8 95 00 01
>> 4d a1 20 00, but in a different order.
>>
> Indeed, that's confusing, luckily I'm used to this a bit since I did some close-to-hardware work. 
> In the dump, starting at offset 0x1FB8, I get:
> 00 20 A1 4D  01 00 95 D8
> so the expected bytes in reverse. 
> So my next step would likely be to change that to:
> 00 20 A1 4D  01 00 A8 00
> and then somehow redo the CRC - correct so far? 

Almost, the 95 d8 was garbage, which needs to be 00 00, and the a8 goes
in place of the 4c, which now causes it to be displayed as UNKNOWN.76
instead of EXTENT_ITEM.

I hope the 303104 value is correct, otherwise we have to also fix that.
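To make the byte order concrete, here is a minimal Python sketch that decodes the key by hand. It assumes the usual on-disk layout: a 101-byte node header followed by 33-byte key pointers, each starting with a little-endian key (u64 objectid, u8 type, u64 offset). The constant and function names are only illustrative; the bytes are copied from the hexdump at 0x1fb8.

```python
import struct

HEADER_SIZE = 101   # size of the node header in front of the key pointers
KEY_PTR_SIZE = 33   # key (8 + 1 + 8) + block pointer (8) + generation (8)


def key_ptr_offset(slot):
    """Byte offset of key pointer number `slot` inside a node block."""
    return HEADER_SIZE + slot * KEY_PTR_SIZE


# The 17 key bytes as they appear in the dump, starting at 0x1fb8.
raw = bytes.fromhex("0020a14d010095d8" "4c" "00a0040000000000")
objectid, key_type, offset = struct.unpack("<QBQ", raw)

print(hex(key_ptr_offset(243)))          # 0x1fb8
print(hex(objectid), key_type, offset)   # 0xd89500014da12000 76 303104

# The intended key: upper garbage bytes zeroed, type 0xa8 (EXTENT_ITEM).
fixed = struct.pack("<QBQ", 0x14da12000, 0xa8, 303104)
print(fixed.hex())  # 0020a14d01000000a800a0040000000000
```

The "<" in the format string selects little-endian without padding, which is why the bytes show up reversed compared to the number as printed.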

> And my very last step would be: 
> dd if=mblock_first of=/dev/sdb1 bs=1 seek=43417600 count=16384
> dd if=mblock_first of=/dev/sdb1 bs=1 seek=1117159424 count=16384
> (of which the "count" is then not really needed, but better safe than sorry). 
> 
>> If you end up here, and if you can find the values in the hexdump
>> already, please put the 16kiB file somewhere online (or pipe it through
>> base64 and pastebin it), so we can help a bit more efficiently.
> I've put it online here (ownCloud instance of our University):
> https://uni-bonn.sciebo.de/index.php/s/3Vdr7nmmfqPtHot/download
> and alternatively as base64 in pastebin:
> http://pastebin.com/K1CzCxqi
> 
>> After getting the bytelevel stuff right again, the block needs a new
>> checksum, and then you have to carefully dd it back in both of the
>> places which are listed in the stripe lines.
>>
>> If everything goes right... bam! Mount again and happy btrfsing again.

Yes, or... do some btrfs-assisted 'hexedit'. I just added some missing
structures for a metadata Node into python-btrfs, in a branch where I'm
playing around a bit with the first steps of offline editing.

If you clone https://github.com/knorrie/python-btrfs/ and checkout the
branch 'bigmomma', you can do this:

~/src/git/python-btrfs (bigmomma) 4-$ ipython
Python 2.7.13 (default, Dec 18 2016, 20:19:42)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import array

In [2]: import btrfs

In [3]: buf = array.array('B', open('mblock_first').read())

In [4]: node = btrfs.ctree.Node(buf)

In [5]: len(node.ptrs)
Out[5]: 376

In [6]: ptr = node.ptrs[243]

In [7]: print(ptr)
key (15606380089319694336 76 303104) block 596459520 gen 20441

In [8]: ptr.key.objectid &= 0xffffffff

In [9]: ptr.key.type = btrfs.ctree.EXTENT_ITEM_KEY

In [10]: print(ptr)
key (1302405120 EXTENT_ITEM 303104) block 596459520 gen 20441

In [11]: ptr.write()

In [12]: node.header.write()

In [13]: buf.tofile(open('mblock_first_fixed', 'wb'))

And voila:

-$ hexdump -C mblock_first > mblock_first.hexdump
-$ hexdump -C mblock_first_fixed > mblock_first_fixed.hexdump
-$ diff -u0 mblock_first.hexdump mblock_first_fixed.hexdump
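--- mblock_first.hexdump	2017-01-29 17:31:57.324537433 +0100
+++ mblock_first_fixed.hexdump	2017-01-29 17:33:48.252683710 +0100
@@ -1 +1 @@
-00000000  00 22 16 2b 00 00 00 00  00 00 00 00 00 00 00 00  |.".+............|
+00000000  8f c0 96 b0 00 00 00 00  00 00 00 00 00 00 00 00  |................|
@@ -508,2 +508,2 @@
-00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8  |.O....... .M....|
-00001fc0  4c 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00  |L.........@.#...|
+00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 00 00 00 00  |.O....... .M....|
+00001fc0  a8 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00  |..........@.#...|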

:-)

Writing back the information to the byte buffer (the node header) also
recomputes the checksum.
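For anyone doing the checksum step by hand instead, here is a rough sketch of what recomputing it involves. It assumes the default crc32c checksum type, computed over everything after the 32-byte checksum field and stored little-endian in the first 4 bytes of the block. The function names are invented for this example and are not the python-btrfs API.

```python
import struct

CSUM_SIZE = 32  # checksum field at the start of every metadata block


def _make_table():
    """Build the lookup table for CRC-32C (reflected polynomial 0x82F63B78)."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Plain table-driven CRC-32C over a bytes-like object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def stamp_checksum(block):
    """Recompute the checksum of a metadata block (bytearray) in place."""
    struct.pack_into("<I", block, 0, crc32c(bytes(block[CSUM_SIZE:])))


def checksum_ok(block):
    """True if the stored checksum matches the block contents."""
    stored = struct.unpack_from("<I", block, 0)[0]
    return stored == crc32c(bytes(block[CSUM_SIZE:]))


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

With the 16kiB block loaded into a bytearray, stamp_checksum(block) after editing and checksum_ok(block) before writing would be the whole procedure. A pure Python loop is slow but more than fast enough for a single block.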

If this is the same change that you ended up with while doing it
manually, then try to put it back on disk twice, and see what happens
when mounting.
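Putting it back "twice" can also be scripted. The following is only a sketch with a made-up helper name: it writes the same block at both physical offsets and reads each copy back to verify it. It assumes the filesystem is unmounted and that a backup of the original block exists, since a wrong offset overwrites whatever is there.

```python
import os


def write_block_copies(device_path, block, physical_offsets):
    """Write `block` at every offset, then read each copy back and compare."""
    block = bytes(block)
    fd = os.open(device_path, os.O_RDWR)
    try:
        for offset in physical_offsets:
            written = os.pwrite(fd, block, offset)
            if written != len(block):
                raise IOError("short write at offset %d" % offset)
        os.fsync(fd)  # flush to the device before verifying
        for offset in physical_offsets:
            if os.pread(fd, len(block), offset) != block:
                raise IOError("verification failed at offset %d" % offset)
    finally:
        os.close(fd)


# Hypothetical usage with the values from this thread:
# with open("mblock_first_fixed", "rb") as f:
#     write_block_copies("/dev/sdb1", f.read(), [43417600, 1117159424])
```

This does the same as two dd invocations with seek=, with a read-back check added.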

Comments

Oliver Freyermuth Jan. 29, 2017, 7:09 p.m. UTC | #1
Am 29.01.2017 um 17:44 schrieb Hans van Kranenburg:
> On 01/29/2017 03:02 AM, Oliver Freyermuth wrote:
>> Am 28.01.2017 um 23:27 schrieb Hans van Kranenburg:
>>> On 01/28/2017 10:04 PM, Oliver Freyermuth wrote:
>>>> Am 26.01.2017 um 12:01 schrieb Oliver Freyermuth:
>>>>> Am 26.01.2017 um 11:00 schrieb Hugo Mills:
>>>>>>    We can probably talk you through fixing this by hand with a decent
>>>>>> hex editor. I've done it before...
>>>>>>
>>>>> That would be nice! Is it fine via the mailing list? 
>>>>> Potentially, the instructions could be helpful for future reference, and "real" IRC is not accessible from my current location. 
>>>>>
>>>>> Do you have suggestions for a decent hexeditor for this job? Until now, I have been mainly using emacs, 
>>>>> classic hexedit (http://rigaux.org/hexedit.html), or okteta (beware, it's graphical!), but of course these were made for a few MiB of files and are not so well suited for a block device. 
>>>>>
>>>>> The first thing to do would then probably just be to jump to the offset where 0xd89500014da12000 is written (can I get that via inspect-internal, or do I have to search for it?), fix that to read 
>>>>> 0x00a800014da12000
>>>>> (if I understood correctly) and then probably adapt a checksum? 
>>>>>
>>>> My external backup via btrfs-restore is now done successfully, so I am ready for anything you throw at me. 
>>>> Since I was able to pull all data, though, it would mainly be something educational (for me, and likely other list readers). 
>>>> If you think that this manual procedure is not worth it, I can also just scratch and recreate the FS. 
>>>
>>> OK, let's do it. I also want to practice a bit with stuff like this, so
>>> this is a nice example.
>>>
>>> See if you can dump the chunk tree (tree 3) with btrfs inspect-internal
>>> dump-tree -t 3 /dev/xxx
>>>
>> Yes, I can! :-)
>>
>>> You should get a list of objects like this one:
>>>
>>> item 88 key (FIRST_CHUNK_TREE CHUNK_ITEM 1200384638976) itemoff 9067
>>> itemsize 80
>>>   chunk length 1073741824 owner 2 stripe_len 65536
>>>   type DATA num_stripes 1
>>>     stripe 0 devid 1 offset 729108447232
>>>     dev uuid: edae9198-4ea9-4553-9992-af8e27aa6578
>>>
>>> Find the one that contains 35028992
>>>
>>> So, where it says 1200384638976 and length 1073741824 in the example
>>> above, which is the btrfs virtual address space from 1200384638976 to
>>> 1200384638976 + 1GiB, you need to find the one where 35028992 is between
>>> the start and start+length.
>>>
>> I found:
>>         item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15993 itemsize 112
>>                 length 1073741824 owner 2 stripe_len 65536 type METADATA|DUP
>>                 io_align 65536 io_width 65536 sector_size 4096
>>                 num_stripes 2 sub_stripes 0
>>                         stripe 0 devid 1 offset 37748736
>>                         dev_uuid 76acfc80-aa73-4a21-890b-34d1d2259728
>>                         stripe 1 devid 1 offset 1111490560
>>                         dev_uuid 76acfc80-aa73-4a21-890b-34d1d2259728
>>
>> So I have Metadata DUP (at least I remembered that correctly). 
>> Now, for the calculation:
>> 37748736+(35028992-29360128)   =   43417600
>> 1111490560+(35028992-29360128) = 1117159424
>>
>>> Then, look at the stripe line. If you have DUP metadata, it will be a
>>> type METADATA (instead of DATA in the example above) and it will list
>>> two stripe lines, which point at the two physical locations in the
>>> underlying block device.
>>>
>>> The place where your 16kiB metadata block is stored is at physical start
>>> of stripe + (35028992 - start of virtual address block).
>>>
>>> Then, dump one of the two mirrored 16kiB from disk with something like
>>> `dd if=/dev/sdb1 bs=1 skip=<physical location> count=16384 > foo`
>> And the dd'ing:
>> dd if=/dev/sdb1 bs=1 skip=43417600 count=16384 > mblock_first
>> dd if=/dev/sdb1 bs=1 skip=1117159424 count=16384 > mblock_second
>> Just as a cross-check, as expected, the md5sum of both files is the same, so they are identical. 
>>
>>>
>>> File foo of 16kiB size now contains the data that you dumped in the
>>> pastebin before.
>>>
>>> Using hexedit on this can be a quite confusing experience because of the
>>> reordering of bytes in the raw data. When you expect to find
>>> 0xd89500014da12000 somewhere, it probably doesn't show up as d8 95 00 01
>>> 4d a1 20 00, but in a different order.
>>>
>> Indeed, that's confusing, luckily I'm used to this a bit since I did some close-to-hardware work. 
>> In the dump, starting at offset 0x1FB8, I get:
>> 00 20 A1 4D  01 00 95 D8
>> so the expected bytes in reverse. 
>> So my next step would likely be to change that to:
>> 00 20 A1 4D  01 00 A8 00
>> and then somehow redo the CRC - correct so far? 
> 
> Almost, the 95 d8 was garbage, which needs to be 00 00, and the a8 goes
> in place of the 4c, which now causes it to be displayed as UNKNOWN.76
> instead of EXTENT_ITEM.
> 
> I hope the 303104 value is correct, otherwise we have to also fix that.
> 
>> And my very last step would be: 
>> dd if=mblock_first of=/dev/sdb1 bs=1 seek=43417600 count=16384
>> dd if=mblock_first of=/dev/sdb1 bs=1 seek=1117159424 count=16384
>> (of which the "count" is then not really needed, but better safe than sorry). 
>>
>>> If you end up here, and if you can find the values in the hexdump
>>> already, please put the 16kiB file somewhere online (or pipe it through
>>> base64 and pastebin it), so we can help a bit more efficiently.
>> I've put it online here (ownCloud instance of our University):
>> https://uni-bonn.sciebo.de/index.php/s/3Vdr7nmmfqPtHot/download
>> and alternatively as base64 in pastebin:
>> http://pastebin.com/K1CzCxqi
>>
>>> After getting the bytelevel stuff right again, the block needs a new
>>> checksum, and then you have to carefully dd it back in both of the
>>> places which are listed in the stripe lines.
>>>
>>> If everything goes right... bam! Mount again and happy btrfsing again.
> 
> Yes, or... do some btrfs-assisted 'hexedit'. I just added some missing
> structures for a metadata Node into python-btrfs, in a branch where I'm
> playing around a bit with the first steps of offline editing.
> 
> If you clone https://github.com/knorrie/python-btrfs/ and checkout the
> branch 'bigmomma', you can do this:
> 
> ~/src/git/python-btrfs (bigmomma) 4-$ ipython
> Python 2.7.13 (default, Dec 18 2016, 20:19:42)
> Type "copyright", "credits" or "license" for more information.
> 
> IPython 5.1.0 -- An enhanced Interactive Python.
> ?         -> Introduction and overview of IPython's features.
> %quickref -> Quick reference.
> help      -> Python's own help system.
> object?   -> Details about 'object', use 'object??' for extra details.
> 
> In [1]: import array
> 
> In [2]: import btrfs
> 
> In [3]: buf = array.array('B', open('mblock_first').read())
> 
> In [4]: node = btrfs.ctree.Node(buf)
> 
> In [5]: len(node.ptrs)
> Out[5]: 376
> 
> In [6]: ptr = node.ptrs[243]
> 
> In [7]: print(ptr)
> key (15606380089319694336 76 303104) block 596459520 gen 20441
> 
> In [8]: ptr.key.objectid &= 0xffffffff
> 
> In [9]: ptr.key.type = btrfs.ctree.EXTENT_ITEM_KEY
> 
> In [10]: print(ptr)
> key (1302405120 EXTENT_ITEM 303104) block 596459520 gen 20441
> 
> In [11]: ptr.write()
> 
> In [12]: node.header.write()
> 
> In [13]: buf.tofile(open('mblock_first_fixed', 'wb'))
> 
> And voila:
> 
> -$ hexdump -C mblock_first > mblock_first.hexdump
> -$ hexdump -C mblock_first_fixed > mblock_first_fixed.hexdump
> -$ diff -u0 mblock_first.hexdump mblock_first_fixed.hexdump
> --- mblock_first.hexdump	2017-01-29 17:31:57.324537433 +0100
> +++ mblock_first_fixed.hexdump	2017-01-29 17:33:48.252683710 +0100
> @@ -1 +1 @@
> -00000000  00 22 16 2b 00 00 00 00  00 00 00 00 00 00 00 00  |.".+............|
> +00000000  8f c0 96 b0 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> @@ -508,2 +508,2 @@
> -00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8  |.O....... .M....|
> -00001fc0  4c 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00  |L.........@.#...|
> +00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 00 00 00 00  |.O....... .M....|
> +00001fc0  a8 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00  |..........@.#...|
> 
> :-)
> 
> Writing back the information to the byte buffer (the node header) also
> recomputes the checksum.
> 
> If this is the same change that you ended up with while doing it
> manually, then try to put it back on disk twice, and see what happens
> when mounting.
> 
Wow - this nice python toolset really makes it easy, bigmomma holding your hands ;-) . 

Indeed, I get exactly the same output you did show in your example, which almost matches my manual change, apart from one bit here:
-00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8
+00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 00 00 00 00
I do not understand this change from 01 to 00, is this some parity information which python-btrfs fixed up automatically?

Trusting the output, I did:
dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=43417600 count=16384
dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=1117159424 count=16384
and re-ran "btrfs-debug-tree -b 35028992 /dev/sdb1" to confirm, item 243 is now:
...
        key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
        key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
=>      key (1302405120 EXTENT_ITEM 303104) block 596459520 (36405) gen 20441
        key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
        key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427
...
Sadly, trying to mount, I still get:
[190422.147717] BTRFS info (device sdb1): use lzo compression
[190422.147846] BTRFS info (device sdb1): disk space caching is enabled
[190422.229227] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
[190422.241635] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
[190422.241644] BTRFS error (device sdb1): failed to read block groups: -5
[190422.254824] BTRFS error (device sdb1): open_ctree failed
The notable difference is that previously, the message was:
corrupt node, bad key order: block=35028992, root=1, slot=243
So does this tell me that also item 242 was corrupted?

Cheers and thanks for everything up to now!
	Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hans van Kranenburg Jan. 29, 2017, 7:28 p.m. UTC | #2
On 01/29/2017 08:09 PM, Oliver Freyermuth wrote:
>> [..whaaa.. text.. see previous message..]
> Wow - this nice python toolset really makes it easy, bigmomma holding your hands ;-) . 
> 
> Indeed, I get exactly the same output you did show in your example, which almost matches my manual change, apart from one bit here:
> -00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8
> +00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 00 00 00 00
> I do not understand this change from 01 to 00, is this some parity information which python-btrfs fixed up automatically?
> 
> Trusting the output, I did:
> dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=43417600 count=16384
> dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=1117159424 count=16384
> and re-ran "btrfs-debug-tree -b 35028992 /dev/sdb1" to confirm, item 243 is now:
> ...
>         key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
>         key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
> =>      key (1302405120 EXTENT_ITEM 303104) block 596459520 (36405) gen 20441
>         key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
>         key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427

Ehm, oh yes, that was obviously a mistake in what I showed. The
0xffffffff cuts off too much..

>>> 0xd89500014da12000 & 0xffffffff
1302405120L

This is better...

>>> 0xd89500014da12000 & 0xffffffffff
5597372416L

...which is the value Hugo also mentioned to likely be the value that
has to be there, since it nicely fits in between the surrounding keys.
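The difference between the two masks, and why only the second one fits, can be checked with a few lines of Python. This is just a sketch: keys are modelled as (objectid, type, offset) tuples, which compare in the same order btrfs uses, and the neighbouring keys are taken from the debug-tree output quoted above (168 is the EXTENT_ITEM type, 0xa8).

```python
corrupt = 0xd89500014da12000

too_narrow = corrupt & 0xffffffff    # keeps 32 bits, drops the 01 byte
better = corrupt & 0xffffffffff      # keeps 40 bits

# Neighbours of slot 243 in block 35028992.
prev_key = (5561905152, 168, 184320)   # slot 242
next_key = (5726711808, 168, 524288)   # slot 244


def in_order(candidate):
    """True if candidate sorts strictly between its two neighbours."""
    return prev_key < candidate < next_key


print(too_narrow, in_order((too_narrow, 168, 303104)))  # 1302405120 False
print(better, in_order((better, 168, 303104)))          # 5597372416 True
```

With the too-narrow value, the key in slot 243 sorts before the one in slot 242, which would explain why the kernel message moved from slot=243 to slot=242.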

> ...
> Sadly, trying to mount, I still get:
> [190422.147717] BTRFS info (device sdb1): use lzo compression
> [190422.147846] BTRFS info (device sdb1): disk space caching is enabled
> [190422.229227] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
> [190422.241635] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
> [190422.241644] BTRFS error (device sdb1): failed to read block groups: -5
> [190422.254824] BTRFS error (device sdb1): open_ctree failed
> The notable difference is that previously, the message was:
> corrupt node, bad key order: block=35028992, root=1, slot=243
> So does this tell me that also item 242 was corrupted?

No, I was just going too fast.

A nice extra exercise is to look up the block at 596459520, which this
item points to, and then see which object is the first one in the part
of the tree stored in that page. It should be (5597372416 EXTENT_ITEM
303104) I guess.
Oliver Freyermuth Jan. 29, 2017, 7:52 p.m. UTC | #3
Am 29.01.2017 um 20:28 schrieb Hans van Kranenburg:
> On 01/29/2017 08:09 PM, Oliver Freyermuth wrote:
>>> [..whaaa.. text.. see previous message..]
>> Wow - this nice python toolset really makes it easy, bigmomma holding your hands ;-) . 
>>
>> Indeed, I get exactly the same output you did show in your example, which almost matches my manual change, apart from one bit here:
>> -00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8
>> +00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 00 00 00 00
>> I do not understand this change from 01 to 00, is this some parity information which python-btrfs fixed up automatically?
>>
>> Trusting the output, I did:
>> dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=43417600 count=16384
>> dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=1117159424 count=16384
>> and re-ran "btrfs-debug-tree -b 35028992 /dev/sdb1" to confirm, item 243 is now:
>> ...
>>         key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
>>         key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
>> =>      key (1302405120 EXTENT_ITEM 303104) block 596459520 (36405) gen 20441
>>         key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
>>         key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427
> 
> Ehm, oh yes, that was obviously a mistake in what I showed. The
> 0xffffffff cuts off too much..
> 
>>>> 0xd89500014da12000 & 0xffffffff
> 1302405120L
> 
> This is better...
> 
>>>> 0xd89500014da12000 & 0xffffffffff
> 5597372416L
> 
> ...which is the value Hugo also mentioned to likely be the value that
> has to be there, since it nicely fits in between the surrounding keys.
Understood!
Now the diff matches exactly what I would have done:
-00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8
-00001fc0  4c 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00
+00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 00 00
+00001fc0  a8 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00

It's really nice that python-btrfs takes over all the checksumming stuff. 

Writing things back and re-running "btrfs-debug-tree -b 35028992 /dev/sdb1", I find:

        key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
        key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
=>      key (5597372416 EXTENT_ITEM 303104) block 596459520 (36405) gen 20441
        key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
        key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427

This matches the surroundings much better. 

> 
>> ...
>> Sadly, trying to mount, I still get:
>> [190422.147717] BTRFS info (device sdb1): use lzo compression
>> [190422.147846] BTRFS info (device sdb1): disk space caching is enabled
>> [190422.229227] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
>> [190422.241635] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
>> [190422.241644] BTRFS error (device sdb1): failed to read block groups: -5
>> [190422.254824] BTRFS error (device sdb1): open_ctree failed
>> The notable difference is that previously, the message was:
>> corrupt node, bad key order: block=35028992, root=1, slot=243
>> So does this tell me that also item 242 was corrupted?
> 
> No, I was just going too fast.
> 
> A nice extra exercise is to look up the block at 596459520, which this
> item points to, and then see which object is the first one in the part
> of the tree stored in that page. It should be (5597372416 EXTENT_ITEM
> 303104) I guess.
> 
That indeed matches your expectation, i.e.:
# btrfs-debug-tree -b 596459520 /dev/sdb1
contains:
        item 0 key (5597372416 EXTENT_ITEM 303104) itemoff 16230 itemsize 53

So all looks well! 

And now the final good news:
I can mount, no error messages in the syslog are shown! 


Finally, just to make sure there are no other issues, I ran a btrfs check in readonly mode:
 # btrfs check --readonly /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: cfd16c65-7f3b-4f5e-9029-971f2433d7ab
checking extents
checking free space cache
checking fs roots
invalid location in dir item 120
root 5 inode 177542 errors 2000, link count wrong
        unresolved ref dir 117670 index 29695 namelen 20 name 2016-07-12_10_26.jpg filetype 1 errors 1, no dir item
root 5 inode 18446744073709551361 errors 2001, no inode item, link count wrong
        unresolved ref dir 117670 index 0 namelen 20 name 2016-07-12_10_26.jpg filetype 1 errors 6, no dir index, no inode ref
found 127774183424 bytes used err is 1
total csum bytes: 124401728
total tree bytes: 346046464
total fs tree bytes: 163315712
total extent tree bytes: 35667968
btree space waste bytes: 53986463
file data blocks allocated: 177184325632
 referenced 130490667008

These errors are unrelated and likely caused by an earlier hard poweroff sometime last year. 

Nevertheless, since I'll now try to use this FS (let's see how long it keeps stable), I ran repair:
# btrfs check --repair /dev/sdb1
enabling repair mode
Checking filesystem on /dev/sdb1
UUID: cfd16c65-7f3b-4f5e-9029-971f2433d7ab
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
invalid location in dir item 120
Trying to rebuild inode:18446744073709551361
Failed to reset nlink for inode 18446744073709551361: No such file or directory
        unresolved ref dir 117670 index 0 namelen 20 name 2016-07-12_10_26.jpg filetype 1 errors 6, no dir index, no inode ref
checking csums
checking root refs
found 127774183424 bytes used err is 0
total csum bytes: 124401728
total tree bytes: 346046464
total fs tree bytes: 163315712
total extent tree bytes: 35667968
btree space waste bytes: 53986463
file data blocks allocated: 177184325632
 referenced 130490667008

It still mounts, and now:
[193339.299305] BTRFS info (device sdb1): use lzo compression
[193339.299308] BTRFS info (device sdb1): disk space caching is enabled
[193339.653980] BTRFS info (device sdb1): checking UUID tree

I guess this all is fine :-) . 

So all in all, I have to say a great thanks for all this support - it really was a good educational experience, and I am pretty sure this functionality of python-btrfs will be of help to others, too! 

Cheers and thanks, 
	Oliver
Hans van Kranenburg Jan. 29, 2017, 8:13 p.m. UTC | #4
On 01/29/2017 08:52 PM, Oliver Freyermuth wrote:
> Am 29.01.2017 um 20:28 schrieb Hans van Kranenburg:
>> On 01/29/2017 08:09 PM, Oliver Freyermuth wrote:
>>>> [..whaaa.. text.. see previous message..]
>>> Wow - this nice python toolset really makes it easy, bigmomma holding your hands ;-) . 

Well, bigmomma is a nickname of someone on IRC that I helped with a
similar issue a few weeks ago, also a quite bizarre case of a random
collection of bytes ending up in a leaf metadata page. While doing
that I started this branch, adding some code for extra data structures
and to write changed values back.

So far the python-btrfs project focused on only working with filesystems
that are already online and mounted and correctly working.

So even the simple chunk tree lookup we needed to find the location to
dd is not possible with it yet.

The code hacked together already for putting the metadata page into
objects with nice attributes is waiting in an experimental branch for
later, when I'm going to have a look at working with unmounted
filesystems and how to interface to the C code for doing tree plumbing. :)

>>> Indeed, I get exactly the same output you did show in your example, which almost matches my manual change, apart from one bit here:
>>> -00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8
>>> +00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 00 00 00 00
>>> I do not understand this change from 01 to 00, is this some parity information which python-btrfs fixed up automatically?
>>>
>>> Trusting the output, I did:
>>> dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=43417600 count=16384
>>> dd if=mblock_first_fixed of=/dev/sdb1 bs=1 seek=1117159424 count=16384
>>> and re-ran "btrfs-debug-tree -b 35028992 /dev/sdb1" to confirm, item 243 is now:
>>> ...
>>>         key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
>>>         key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
>>> =>      key (1302405120 EXTENT_ITEM 303104) block 596459520 (36405) gen 20441
>>>         key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
>>>         key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427
>>
>> Ehm, oh yes, that was obviously a mistake in what I showed. The
>> 0xffffffff cuts off too much..
>>
>>>>> 0xd89500014da12000 & 0xffffffff
>> 1302405120L
>>
>> This is better...
>>
>>>>> 0xd89500014da12000 & 0xffffffffff
>> 5597372416L
>>
>> ...which is the value Hugo also mentioned to likely be the value that
>> has to be there, since it nicely fits in between the surrounding keys.
> Understood!
> Now the diff matches exactly what I would have done:
> -00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 95 d8
> -00001fc0  4c 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00
> +00001fb0  d9 4f 00 00 00 00 00 00  00 20 a1 4d 01 00 00 00
> +00001fc0  a8 00 a0 04 00 00 00 00  00 00 40 8d 23 00 00 00
> 
> It's really nice that python-btrfs takes over all the checksumming stuff. 
> 
> Writing things back and re-running "btrfs-debug-tree -b 35028992 /dev/sdb1", I find:
> 
>         key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
>         key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
> =>      key (5597372416 EXTENT_ITEM 303104) block 596459520 (36405) gen 20441
>         key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
>         key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427
> 
> This matches the surroundings much better. 

Yes, good.

>>> ...
>>> Sadly, trying to mount, I still get:
>>> [190422.147717] BTRFS info (device sdb1): use lzo compression
>>> [190422.147846] BTRFS info (device sdb1): disk space caching is enabled
>>> [190422.229227] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
>>> [190422.241635] BTRFS critical (device sdb1): corrupt node, bad key order: block=35028992, root=1, slot=242
>>> [190422.241644] BTRFS error (device sdb1): failed to read block groups: -5
>>> [190422.254824] BTRFS error (device sdb1): open_ctree failed
>>> The notable difference is that previously, the message was:
>>> corrupt node, bad key order: block=35028992, root=1, slot=243
>>> So does this tell me that also item 242 was corrupted?
>>
>> No, I was just going too fast.
>>
>> A nice extra exercise is to look up the block at 596459520, which this
>> item points to, and then see which object is the first one in the part
>> of the tree stored in that page. It should be (5597372416 EXTENT_ITEM
>> 303104) I guess.
>>
> That indeed matches your expectation, i.e.:
> # btrfs-debug-tree -b 596459520 /dev/sdb1
> contains:
>         item 0 key (5597372416 EXTENT_ITEM 303104) itemoff 16230 itemsize 53
> 
> So all looks well! 

Yay.

> And now the final good news:
> I can mount, no error messages in the syslog are shown! 
> 
> 
> Finally, just to make sure there are no other issues, I ran a btrfs check in readonly mode:
>  # btrfs check --readonly /dev/sdb1
> Checking filesystem on /dev/sdb1
> UUID: cfd16c65-7f3b-4f5e-9029-971f2433d7ab
> checking extents
> checking free space cache
> checking fs roots
> invalid location in dir item 120
> root 5 inode 177542 errors 2000, link count wrong
>         unresolved ref dir 117670 index 29695 namelen 20 name 2016-07-12_10_26.jpg filetype 1 errors 1, no dir item
> root 5 inode 18446744073709551361 errors 2001, no inode item, link count wrong
>         unresolved ref dir 117670 index 0 namelen 20 name 2016-07-12_10_26.jpg filetype 1 errors 6, no dir index, no inode ref
> found 127774183424 bytes used err is 1
> total csum bytes: 124401728
> total tree bytes: 346046464
> total fs tree bytes: 163315712
> total extent tree bytes: 35667968
> btree space waste bytes: 53986463
> file data blocks allocated: 177184325632
>  referenced 130490667008
> 
> These errors are unrelated and likely caused by an earlier hard poweroff sometime last year. 
> 
> Nevertheless, since I'll now try to use this FS (let's see how long it keeps stable), I ran repair:
> # btrfs check --repair /dev/sdb1
> enabling repair mode
> Checking filesystem on /dev/sdb1
> UUID: cfd16c65-7f3b-4f5e-9029-971f2433d7ab
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> invalid location in dir item 120
> Trying to rebuild inode:18446744073709551361
> Failed to reset nlink for inode 18446744073709551361: No such file or directory
>         unresolved ref dir 117670 index 0 namelen 20 name 2016-07-12_10_26.jpg filetype 1 errors 6, no dir index, no inode ref
> checking csums
> checking root refs
> found 127774183424 bytes used err is 0
> total csum bytes: 124401728
> total tree bytes: 346046464
> total fs tree bytes: 163315712
> total extent tree bytes: 35667968
> btree space waste bytes: 53986463
> file data blocks allocated: 177184325632
>  referenced 130490667008
> 
> It still mounts, and now:
> [193339.299305] BTRFS info (device sdb1): use lzo compression
> [193339.299308] BTRFS info (device sdb1): disk space caching is enabled
> [193339.653980] BTRFS info (device sdb1): checking UUID tree
> 
> I guess this all is fine :-) . 
> 
> So all in all, I have to say a great thanks for all this support - it really was a good educational experience, and I am pretty sure this functionality of python-btrfs will be of help to others, too! 

Have fun,
