How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

Message ID 20170628144348.abvqowzmeveyzssn@merlins.org (mailing list archive)

Commit Message

Marc MERLIN June 28, 2017, 2:43 p.m. UTC
[cc trimmed]

On Wed, Jun 28, 2017 at 03:10:27PM +0800, Lu Fengqi wrote:
> The output is abnormal: except for the relevant DIR_ITEM and
> DIR_INDEX, I can't find the above-mentioned INODE_ITEM and EXTENT_DATA.
> I wonder if the file system was online when this command was executed? If
> so, please re-execute it offline; if not, could you apply my
> patches and re-check it?

The filesystem was offline and I had those 2 patches applied.

Marc

Comments

Lu Fengqi June 29, 2017, 1:36 p.m. UTC | #1
On Wed, Jun 28, 2017 at 07:43:48AM -0700, Marc MERLIN wrote:
>[cc trimmed]
>
>On Wed, Jun 28, 2017 at 03:10:27PM +0800, Lu Fengqi wrote:
>> The output is abnormal: except for the relevant DIR_ITEM and
>> DIR_INDEX, I can't find the above-mentioned INODE_ITEM and EXTENT_DATA.
>> I wonder if the file system was online when this command was executed? If
>> so, please re-execute it offline; if not, could you apply my
>> patches and re-check it?
>
>The filesystem was offline and I had those 2 patches applied.

I am afraid I don't know why the inode item disappears. Besides, if
btrfs-debug-tree can't find the inode item, btrfs check shouldn't report
this inode item's extent data interrupt. Could you check the disk
again? The error output may have changed.

>
>Marc
>-- 
>"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
>Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
Marc MERLIN June 29, 2017, 3:30 p.m. UTC | #2
On Thu, Jun 29, 2017 at 09:36:15PM +0800, Lu Fengqi wrote:
> On Wed, Jun 28, 2017 at 07:43:48AM -0700, Marc MERLIN wrote:
> >[cc trimmed]
> >
> >On Wed, Jun 28, 2017 at 03:10:27PM +0800, Lu Fengqi wrote:
> >> The output is abnormal: except for the relevant DIR_ITEM and
> >> DIR_INDEX, I can't find the above-mentioned INODE_ITEM and EXTENT_DATA.
> >> I wonder if the file system was online when this command was executed? If
> >> so, please re-execute it offline; if not, could you apply my
> >> patches and re-check it?
> >
> >The filesystem was offline and I had those 2 patches applied.
> 
> I am afraid I don't know why the inode item disappears. Besides, if
> btrfs-debug-tree can't find the inode item, btrfs check shouldn't report
> this inode item's extent data interrupt. Could you check the disk
> again? The error output may have changed.

I just did but it takes 24H. I just have the results now: 
gargamel:~# btrfs check --mode lowmem  /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 85441c59-ad11-4b25-b1fe-974f9e4acede
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
ERROR: root 3862 EXTENT_DATA[18170706 4096] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 16384] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 20480] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt
ERROR: errors found in fs roots
found 5544779124736 bytes used, error(s) found
total csum bytes: 5344523140
total tree bytes: 71323058176
total fs tree bytes: 59288403968
total extent tree bytes: 5378277376
btree space waste bytes: 10912183048
file data blocks allocated: 7830914256896
 referenced 6244104495104
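
For anyone triaging output like this, here is a minimal Python sketch (not part of btrfs-progs) that groups the "interrupt" errors by root and inode. The line layout is taken from the log above; reading the two bracketed numbers as the inode number and file offset of the EXTENT_DATA key is an assumption, and the name `group_interrupts` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ERROR: root 3862 EXTENT_DATA[18170706 4096] interrupt
# Assumption: the bracketed pair is (inode number, file offset).
EXTENT_INTERRUPT = re.compile(
    r"ERROR: root (\d+) EXTENT_DATA\[(\d+) (\d+)\] interrupt"
)


def group_interrupts(log_text):
    """Map (root, inode) -> sorted list of offsets reported as 'interrupt'."""
    grouped = defaultdict(list)
    for match in EXTENT_INTERRUPT.finditer(log_text):
        root, inode, offset = (int(g) for g in match.groups())
        grouped[(root, inode)].append(offset)
    return {key: sorted(offsets) for key, offsets in grouped.items()}


# Error lines copied from the check output above.
SAMPLE = """\
checking fs roots
ERROR: root 3862 EXTENT_DATA[18170706 4096] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 16384] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 20480] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt
ERROR: errors found in fs roots
"""

for (root, inode), offsets in group_interrupts(SAMPLE).items():
    print(f"root {root} inode {inode}: {len(offsets)} offsets {offsets}")
```

Grouping this way makes it easy to see that all five errors belong to a single inode in a single root, which is the inode a follow-up dump would need to target.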


This is looking better, but not 0.
Can I ignore these or should we look into them still?

Marc
Lu Fengqi June 30, 2017, 2:59 p.m. UTC | #3
On Thu, Jun 29, 2017 at 08:30:35AM -0700, Marc MERLIN wrote:
>On Thu, Jun 29, 2017 at 09:36:15PM +0800, Lu Fengqi wrote:
>> On Wed, Jun 28, 2017 at 07:43:48AM -0700, Marc MERLIN wrote:
>> >[cc trimmed]
>> >
>> >On Wed, Jun 28, 2017 at 03:10:27PM +0800, Lu Fengqi wrote:
>> >> The output is abnormal: except for the relevant DIR_ITEM and
>> >> DIR_INDEX, I can't find the above-mentioned INODE_ITEM and EXTENT_DATA.
>> >> I wonder if the file system was online when this command was executed? If
>> >> so, please re-execute it offline; if not, could you apply my
>> >> patches and re-check it?
>> >
>> >The filesystem was offline and I had those 2 patches applied.
>> 
>> I am afraid I don't know why the inode item disappears. Besides, if
>> btrfs-debug-tree can't find the inode item, btrfs check shouldn't report
>> this inode item's extent data interrupt. Could you check the disk
>> again? The error output may have changed.
>
>I just did but it takes 24H. I just have the results now: 
>gargamel:~# btrfs check --mode lowmem  /dev/mapper/dshelf2
>Checking filesystem on /dev/mapper/dshelf2
>UUID: 85441c59-ad11-4b25-b1fe-974f9e4acede
>checking extents
>checking free space cache
>cache and super generation don't match, space cache will be invalidated
>checking fs roots
>ERROR: root 3862 EXTENT_DATA[18170706 4096] interrupt
>ERROR: root 3862 EXTENT_DATA[18170706 16384] interrupt
>ERROR: root 3862 EXTENT_DATA[18170706 20480] interrupt
>ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt
>ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt
>ERROR: errors found in fs roots
>found 5544779124736 bytes used, error(s) found
>total csum bytes: 5344523140
>total tree bytes: 71323058176
>total fs tree bytes: 59288403968
>total extent tree bytes: 5378277376
>btree space waste bytes: 10912183048
>file data blocks allocated: 7830914256896
> referenced 6244104495104
>
>
>This is looking better, but not 0.
>Can I ignore these or should we look into them still?
>
>Marc

Personally, I think that since normal mode didn't report any error
related to this inode, these errors may be caused by a bug in lowmem
mode and btrfs-debug-tree.

At your convenience, would you please give me all items related to this
inode? I think they can provide some clues regarding the disappearance
of the inode and the extent interrupt. They can be dumped with the following
command:

# btrfs-debug-tree /dev/mapper/dshelf2 | grep -C 10 18170706

Please note that this dump may contain filenames; feel free
to mask them.
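
One possible way to do the masking is sketched below in Python. It assumes that names appear in the dump as a `name: <text>` field running to the end of the line; that layout should be checked against the actual output of the btrfs-progs version in use. The helper names `mask_line` and `mask_dump` are hypothetical.

```python
import hashlib
import re

# Assumption: filenames appear as "name: <text>" up to the end of the line.
# "namelen" is not matched because the pattern requires "name: " exactly.
NAME_FIELD = re.compile(r"(\bname: )(.*)$")


def mask_line(line):
    """Replace the text after 'name: ' with a short, stable hash."""
    def replace(match):
        digest = hashlib.sha256(match.group(2).encode("utf-8")).hexdigest()[:12]
        return match.group(1) + "masked-" + digest
    return NAME_FIELD.sub(replace, line)


def mask_dump(lines):
    """Yield masked copies of the given lines, without trailing newlines."""
    for line in lines:
        yield mask_line(line.rstrip("\n"))


# Made-up sample lines, for illustration only.
sample = [
    "\titem 3 key (18170706 INODE_ITEM 0) itemoff 15000 itemsize 160\n",
    "\t\tnamelen 10 datalen 0 name: secret.txt\n",
]
for masked in mask_dump(sample):
    print(masked)
```

Hashing rather than blanking keeps identical names identical, so a DIR_ITEM, DIR_INDEX and INODE_REF that refer to the same file can still be matched up. A short unsalted hash can be guessed for common names, so a secret prefix should be mixed in if that matters.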

Thank you for your assistance.
Marc MERLIN July 7, 2017, 5:37 a.m. UTC | #4
I'm still trying to fix my filesystem.
It seems to work well enough since the damage is apparently localized, but
I'd really like check --repair to actually bring it back to a working
state. Instead, it's now crashing.

This is btrfs tools built from git a few days ago.

Failed to find [4068943577088, 168, 16384]
btrfs unable to find ref byte nr 4068943577088 parent 0 root 4  owner 1 offset 0
Failed to find [5905106075648, 168, 16384]
btrfs unable to find ref byte nr 5906282119168 parent 0 root 4  owner 0 offset 1
Failed to find [21037056, 168, 16384]
btrfs unable to find ref byte nr 21037056 parent 0 root 3  owner 1 offset 0
Failed to find [21053440, 168, 16384]
btrfs unable to find ref byte nr 21053440 parent 0 root 3  owner 0 offset 1
Failed to find [21299200, 168, 16384]
btrfs unable to find ref byte nr 21299200 parent 0 root 3  owner 0 offset 1
Failed to find [5523931971584, 168, 16384]
btrfs unable to find ref byte nr 5524037566464 parent 0 root 3861  owner 3 offset 0
ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5
btrfs(+0x113cf)[0x5651e60443cf]
btrfs(__btrfs_cow_block+0x576)[0x5651e6045848]
btrfs(btrfs_cow_block+0xea)[0x5651e6045dc6]
btrfs(btrfs_search_slot+0x11df)[0x5651e604969d]
btrfs(+0x59184)[0x5651e608c184]
btrfs(cmd_check+0x2bd4)[0x5651e60987b3]
btrfs(main+0x85)[0x5651e60442c3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f34f523d2b1]
btrfs(_start+0x2a)[0x5651e6043e3a]
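
The bracketed triples in the "Failed to find" lines are btrfs keys (objectid, type, offset), and the value -5 in the BUG_ON line is a negative errno. A small Python sketch for decoding both follows. The key type numbers are the ones defined in the btrfs on-disk format; the table is deliberately partial, and the function names are made up for illustration.

```python
import errno
import re

# Partial table of btrfs key type numbers (from the on-disk format).
KEY_TYPES = {
    1: "INODE_ITEM",
    12: "INODE_REF",
    84: "DIR_ITEM",
    96: "DIR_INDEX",
    108: "EXTENT_DATA",
    168: "EXTENT_ITEM",
    169: "METADATA_ITEM",
}

FAILED_TO_FIND = re.compile(r"Failed to find \[(\d+), (\d+), (\d+)\]")


def decode_failed_key(line):
    """Return (objectid, key type name, offset), or None if the line differs."""
    match = FAILED_TO_FIND.search(line)
    if match is None:
        return None
    objectid, key_type, offset = (int(g) for g in match.groups())
    return objectid, KEY_TYPES.get(key_type, f"UNKNOWN({key_type})"), offset


def errno_name(value):
    """Translate a negative errno, as printed by BUG_ON, to its symbol."""
    return errno.errorcode.get(abs(value), "unknown")


print(decode_failed_key("Failed to find [4068943577088, 168, 16384]"))
print(errno_name(-5))
```

For an EXTENT_ITEM key the objectid is the extent's start address and the offset is its length, so each of these lines refers to a 16384-byte extent. Error 5 is EIO, which would fit the I/O problems described later in the thread.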


Full log:
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 85441c59-ad11-4b25-b1fe-974f9e4acede
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checksum verify failed on 3037243965440 found 179689AF wanted 82B97043
checksum verify failed on 3037243965440 found 179689AF wanted 82B97043
checksum verify failed on 3037243998208 found 60EA5C5B wanted 0CF5948F
checksum verify failed on 3037243998208 found 60EA5C5B wanted 0CF5948F
checksum verify failed on 3037244293120 found 38382803 wanted 39E4F85E
checksum verify failed on 3037244293120 found 38382803 wanted 39E4F85E
checksum verify failed on 3037244342272 found E84F1D8F wanted 472DA98C
checksum verify failed on 3037244342272 found E84F1D8F wanted 472DA98C
checksum verify failed on 3037244669952 found 2F6E4C0E wanted E00BBF09
checksum verify failed on 3037244669952 found 2F6E4C0E wanted E00BBF09
checksum verify failed on 3037248913408 found CE2E4AEE wanted EF22F9CA
checksum verify failed on 3037248913408 found CE2E4AEE wanted EF22F9CA
checksum verify failed on 3037248929792 found C989CB0E wanted E27527BC
checksum verify failed on 3037248929792 found C989CB0E wanted E27527BC
checksum verify failed on 3037247569920 found 05848C79 wanted EF3D5598
checksum verify failed on 3037247569920 found 05848C79 wanted EF3D5598
checksum verify failed on 3037247586304 found 9D1E4E39 wanted F1EC8135
checksum verify failed on 3037247586304 found 9D1E4E39 wanted F1EC8135
checksum verify failed on 3037247619072 found BFE40520 wanted 627DB20D
checksum verify failed on 3037247619072 found BFE40520 wanted 627DB20D
checksum verify failed on 3037249208320 found A6B5775F wanted B1E6C0FC
checksum verify failed on 3037249208320 found A6B5775F wanted B1E6C0FC
checksum verify failed on 3037252534272 found 207AD7DF wanted DE72BDF7
checksum verify failed on 3037252534272 found 207AD7DF wanted DE72BDF7
checksum verify failed on 3111569391616 found 3C623707 wanted D955D668
checksum verify failed on 3111569391616 found 3C623707 wanted D955D668
checksum verify failed on 3111569768448 found 0C129F3C wanted C509003A
checksum verify failed on 3111569768448 found 0C129F3C wanted C509003A
checksum verify failed on 3111569735680 found E94C9D41 wanted 55836DD2
checksum verify failed on 3111569735680 found E94C9D41 wanted 55836DD2
checksum verify failed on 3037253435392 found 8E124EB5 wanted A3291C35
checksum verify failed on 3037253435392 found 8E124EB5 wanted A3291C35
checksum verify failed on 3037253746688 found 2B6A4DCD wanted 4323B339
checksum verify failed on 3037253746688 found 2B6A4DCD wanted 4323B339
checksum verify failed on 3111569702912 found 1048610C wanted 9856BB43
checksum verify failed on 3111569702912 found 1048610C wanted 9856BB43
checksum verify failed on 3111569801216 found CD7AAF82 wanted C1DA44DF
checksum verify failed on 3111569801216 found CD7AAF82 wanted C1DA44DF
checksum verify failed on 3037251878912 found 86FB02F3 wanted 728772CE
checksum verify failed on 3037251878912 found 86FB02F3 wanted 728772CE
checksum verify failed on 3037252861952 found CFD54426 wanted E91774C0
checksum verify failed on 3037252861952 found CFD54426 wanted E91774C0
checksum verify failed on 3037255974912 found E3655B7C wanted 8163FDDE
checksum verify failed on 3037255974912 found E3655B7C wanted 8163FDDE
checksum verify failed on 3037252927488 found E7AD88A3 wanted F6BA5B10
checksum verify failed on 3037252927488 found E7AD88A3 wanted F6BA5B10
checksum verify failed on 3037253500928 found 514A55B2 wanted 3611CD81
checksum verify failed on 3037253500928 found 514A55B2 wanted 3611CD81
checksum verify failed on 3037256105984 found 41ADA274 wanted 8F7F0A0B
checksum verify failed on 3037256105984 found 41ADA274 wanted 8F7F0A0B
Csum didn't match
The following tree block(s) is corrupted in tree 3861:
	tree block bytenr: 1710573748224, level: 1, node key: (1073956, 12, 959325)
Try to repair the btree for root 3861
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Csum didn't match
Failed to find [4068943577088, 168, 16384]
btrfs unable to find ref byte nr 4068943577088 parent 0 root 4  owner 1 offset 0
Failed to find [5905106075648, 168, 16384]
btrfs unable to find ref byte nr 5906282119168 parent 0 root 4  owner 0 offset 1
Failed to find [21037056, 168, 16384]
btrfs unable to find ref byte nr 21037056 parent 0 root 3  owner 1 offset 0
Failed to find [21053440, 168, 16384]
btrfs unable to find ref byte nr 21053440 parent 0 root 3  owner 0 offset 1
Failed to find [21299200, 168, 16384]
btrfs unable to find ref byte nr 21299200 parent 0 root 3  owner 0 offset 1
Failed to find [5523931971584, 168, 16384]
btrfs unable to find ref byte nr 5524037566464 parent 0 root 3861  owner 3 offset 0
ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5
btrfs(+0x113cf)[0x5651e60443cf]
btrfs(__btrfs_cow_block+0x576)[0x5651e6045848]
btrfs(btrfs_cow_block+0xea)[0x5651e6045dc6]
btrfs(btrfs_search_slot+0x11df)[0x5651e604969d]
btrfs(+0x59184)[0x5651e608c184]
btrfs(cmd_check+0x2bd4)[0x5651e60987b3]
btrfs(main+0x85)[0x5651e60442c3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f34f523d2b1]
btrfs(_start+0x2a)[0x5651e6043e3a]
Aborted
gargamel:~#
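
A log like this is easier to read once the repeated checksum lines are collapsed. Below is a minimal Python sketch, assuming the "checksum verify failed" line format shown above; `summarize_csum_failures` is a hypothetical helper, not a btrfs-progs tool.

```python
import re

# Matches lines such as:
#   checksum verify failed on 3037243965440 found 179689AF wanted 82B97043
CSUM_FAIL = re.compile(
    r"checksum verify failed on (\d+) found ([0-9A-F]{8}) wanted ([0-9A-F]{8})"
)


def summarize_csum_failures(log_text):
    """Return {bytenr: (found, wanted, times_reported)} in first-seen order."""
    summary = {}
    for match in CSUM_FAIL.finditer(log_text):
        bytenr = int(match.group(1))
        found, wanted = match.group(2), match.group(3)
        count = summary.get(bytenr, (found, wanted, 0))[2]
        summary[bytenr] = (found, wanted, count + 1)
    return summary


# First four checksum lines from the log above.
SAMPLE = """\
checksum verify failed on 3037243965440 found 179689AF wanted 82B97043
checksum verify failed on 3037243965440 found 179689AF wanted 82B97043
checksum verify failed on 3037243998208 found 60EA5C5B wanted 0CF5948F
checksum verify failed on 3037243998208 found 60EA5C5B wanted 0CF5948F
"""

for bytenr, (found, wanted, count) in summarize_csum_failures(SAMPLE).items():
    print(f"{bytenr}: found {found}, wanted {wanted}, reported {count}x")
```

Sorting the resulting addresses can show whether the failures cluster in a few regions of the device, which is one hint toward a problem in the underlying storage rather than in the filesystem itself.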
Marc MERLIN July 7, 2017, 5:39 a.m. UTC | #5
On Thu, Jul 06, 2017 at 10:37:18PM -0700, Marc MERLIN wrote:
> I'm still trying to fix my filesystem.
> It seems to work well enough since the damage is apparently localized, but
> I'd really want check --repair to actually bring it back to a working
> state, but now it's crashing
> 
> This is btrfs tools from git from a few days ago
> 
> Failed to find [4068943577088, 168, 16384]
> btrfs unable to find ref byte nr 4068943577088 parent 0 root 4  owner 1 offset 0
> Failed to find [5905106075648, 168, 16384]
> btrfs unable to find ref byte nr 5906282119168 parent 0 root 4  owner 0 offset 1
> Failed to find [21037056, 168, 16384]
> btrfs unable to find ref byte nr 21037056 parent 0 root 3  owner 1 offset 0
> Failed to find [21053440, 168, 16384]
> btrfs unable to find ref byte nr 21053440 parent 0 root 3  owner 0 offset 1
> Failed to find [21299200, 168, 16384]
> btrfs unable to find ref byte nr 21299200 parent 0 root 3  owner 0 offset 1
> Failed to find [5523931971584, 168, 16384]
> btrfs unable to find ref byte nr 5524037566464 parent 0 root 3861  owner 3 offset 0
> ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5
> btrfs(+0x113cf)[0x5651e60443cf]
> btrfs(__btrfs_cow_block+0x576)[0x5651e6045848]
> btrfs(btrfs_cow_block+0xea)[0x5651e6045dc6]
> btrfs(btrfs_search_slot+0x11df)[0x5651e604969d]
> btrfs(+0x59184)[0x5651e608c184]
> btrfs(cmd_check+0x2bd4)[0x5651e60987b3]
> btrfs(main+0x85)[0x5651e60442c3]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f34f523d2b1]
> btrfs(_start+0x2a)[0x5651e6043e3a]

Mmmh, never mind, it seems that the software raid suffered yet another
double disk failure due to some undetermined flakiness in the underlying block
device cabling :-/
That would likely explain the failures here.

 
Lu Fengqi July 7, 2017, 9:33 a.m. UTC | #6
On Thu, Jul 06, 2017 at 10:39:53PM -0700, Marc MERLIN wrote:
>On Thu, Jul 06, 2017 at 10:37:18PM -0700, Marc MERLIN wrote:
>> I'm still trying to fix my filesystem.
>> It seems to work well enough since the damage is apparently localized, but
>> I'd really want check --repair to actually bring it back to a working
>> state, but now it's crashing

I apologise for my late reply. A colleague left recently, so I have had to
take over his work.

>> 
>> This is btrfs tools from git from a few days ago
>> 
>> Failed to find [4068943577088, 168, 16384]
>> btrfs unable to find ref byte nr 4068943577088 parent 0 root 4  owner 1 offset 0
>> Failed to find [5905106075648, 168, 16384]
>> btrfs unable to find ref byte nr 5906282119168 parent 0 root 4  owner 0 offset 1
>> Failed to find [21037056, 168, 16384]
>> btrfs unable to find ref byte nr 21037056 parent 0 root 3  owner 1 offset 0
>> Failed to find [21053440, 168, 16384]
>> btrfs unable to find ref byte nr 21053440 parent 0 root 3  owner 0 offset 1
>> Failed to find [21299200, 168, 16384]
>> btrfs unable to find ref byte nr 21299200 parent 0 root 3  owner 0 offset 1
>> Failed to find [5523931971584, 168, 16384]
>> btrfs unable to find ref byte nr 5524037566464 parent 0 root 3861  owner 3 offset 0
>> ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5
>> btrfs(+0x113cf)[0x5651e60443cf]
>> btrfs(__btrfs_cow_block+0x576)[0x5651e6045848]
>> btrfs(btrfs_cow_block+0xea)[0x5651e6045dc6]
>> btrfs(btrfs_search_slot+0x11df)[0x5651e604969d]
>> btrfs(+0x59184)[0x5651e608c184]
>> btrfs(cmd_check+0x2bd4)[0x5651e60987b3]
>> btrfs(main+0x85)[0x5651e60442c3]
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f34f523d2b1]
>> btrfs(_start+0x2a)[0x5651e6043e3a]
>
>Mmmh, never mind, it seems that the software raid suffered yet another
>double disk failure due to some undetermined flakiness in the underlying block
>device cabling :-/
>That would likely explain the failures here.

I'm sorry to hear this. Which raid level are you using? And could you recover
from this double disk failure?
Marc MERLIN July 7, 2017, 4:38 p.m. UTC | #7
On Fri, Jul 07, 2017 at 05:33:20PM +0800, Lu Fengqi wrote:
> I apologise for my late reply. A colleague left recently, so I have had to
> take over his work.
 
no worries.

> >Mmmh, never mind, it seems that the software raid suffered yet another
> >double disk failure due to some undetermined flakiness in the underlying block
> >device cabling :-/
> >That would likely explain the failures here.
> 
> I'm sorry to hear this. Which raid level are you using? And could you recover
> from this double disk failure?

The disks aren't failed, and the array wasn't being written to.
It's just a matter of putting the disks back in the md raid5 array in the
right order.

Marc
Marc MERLIN July 9, 2017, 4:34 a.m. UTC | #8
Sigh,

This is now the 3rd filesystem I have (on 3 different machines) that is
getting corruption of some kind (on 4.11.6).
This is starting to look suspicious :-/

Can I fix this filesystem in some other way?
gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2 
enabling repair mode
Checking filesystem on /dev/mapper/crypt_bcache2
UUID: c4e6f9ca-e9a2-43d7-befa-763fc2cd5a57
checking extents
ref mismatch on [14655689654272 16384] extent item 0, found 1
Backref 14655689654272 parent 15455 root 15455 not found in extent tree
backpointer mismatch on [14655689654272 16384]
owner ref check failed [14655689654272 16384]
repair deleting extent record: key 14655689654272 169 1
adding new tree backref on start 14655689654272 len 16384 parent 0 root 15455
Repaired extent references for 14655689654272
root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
ERROR: failed to repair root items: Invalid argument

Recreating the filesystem is going to take me a week of work, a lot of it
manual, and I'm not feeling very good about doing this since the backup
server this is a backup of is also seeing some (hopefully minor) problems
too.

I really hope there isn't a new corruption problem in 4.11, because when
I'm getting corruption on my laptop, my backup server, and the backup of my
backup server, I'm starting to run out of redundant backups :(
(and I'm not mentioning all the time this is costing me)

Marc
Marc MERLIN July 9, 2017, 5:05 a.m. UTC | #9
+Chris

On Sat, Jul 08, 2017 at 09:34:17PM -0700, Marc MERLIN wrote:
> gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2 
> enabling repair mode
> Checking filesystem on /dev/mapper/crypt_bcache2
> UUID: c4e6f9ca-e9a2-43d7-befa-763fc2cd5a57
> checking extents
> ref mismatch on [14655689654272 16384] extent item 0, found 1
> Backref 14655689654272 parent 15455 root 15455 not found in extent tree
> backpointer mismatch on [14655689654272 16384]
> owner ref check failed [14655689654272 16384]
> repair deleting extent record: key 14655689654272 169 1
> adding new tree backref on start 14655689654272 len 16384 parent 0 root 15455
> Repaired extent references for 14655689654272
> root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
> ERROR: failed to repair root items: Invalid argument

On this note, getting hit 3 times on 3 different filesystems that are not
badly damaged, when in none of those cases btrfs check --repair can put them
in a working state, is really bringing home the problem with the lack of a
proper fsck.

I understand that some errors are hard to fix without unknown data loss, but
btrfs check --repair should just do what it takes to put the filesystem back
into a consistent state, never mind what data is lost.
Restoring 10 to 20TB of data is getting old and is not really an acceptable
answer as the only way out.
I should not have to recreate a filesystem as the only way to bring it back
to a working state. 

Before Duncan tells me my filesystem is too big, and I should keep to very
small filesystems so that it's less work for each time btrfs gets corrupted
again, and fails again to bring back the filesystem to a usable state after
discarding some data, that's just not an acceptable answer long term, and by
long term honestly I mean now.
I just have data that doesn't segment well and the more small filesystems I
make the more time I'm going to waste managing them all and dealing with
which one gets full first :(

So, whether 4.11 has a corruption problem, or not, please put some resources
behind btrfs check --repair, be it the lowmem mode, or not.

Thank you
Marc
Marc MERLIN July 9, 2017, 6:34 a.m. UTC | #10
On Sat, Jul 08, 2017 at 09:34:17PM -0700, Marc MERLIN wrote:
> Sigh,
> 
> This is now the 3rd filesystem I have (on 3 different machines) that is
> getting corruption of some kind (on 4.11.6).
> This is starting to look suspicious :-/
> 
> Can I fix this filesystem in some other way?
> gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2 
> enabling repair mode
> Checking filesystem on /dev/mapper/crypt_bcache2
> UUID: c4e6f9ca-e9a2-43d7-befa-763fc2cd5a57
> checking extents
> ref mismatch on [14655689654272 16384] extent item 0, found 1
> Backref 14655689654272 parent 15455 root 15455 not found in extent tree
> backpointer mismatch on [14655689654272 16384]
> owner ref check failed [14655689654272 16384]
> repair deleting extent record: key 14655689654272 169 1
> adding new tree backref on start 14655689654272 len 16384 parent 0 root 15455
> Repaired extent references for 14655689654272
> root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
> ERROR: failed to repair root items: Invalid argument

Mmmh, actually to be fair, this was the 2nd run, I didn't scroll back
enough and missed the first run (doing too many recoveries at once,
I'm getting mixed up).
This first run looks like a lot more things happened:
http://marc.merlins.org/tmp/btrfs_check_ds5.txt

The number of things that went wrong here is very worrisome, given that
there were no issues with those drives and that array has been working
for over a year without problems, until I recently upgraded to 4.11 :(

Now mind you, despite the 21MB of things that got fixed, I still kind of
have the expectation that btrfs check --repair continues and fixes
everything until the filesystem is clean again, just like e2fsck -f
would, but I understand that this filesystem somehow got corrupted to a
point that it's maybe not that simple to do so.

Marc
Martin Steigerwald July 9, 2017, 7:57 a.m. UTC | #11
Hello Marc.

Marc MERLIN - 08.07.17, 21:34:
> Sigh,
> 
> This is now the 3rd filesystem I have (on 3 different machines) that is
> getting corruption of some kind (on 4.11.6).

Anyone else getting corruptions with 4.11?

I will happily switch back to 4.10.17 or even 4.9 if that is the case. I may even
do so just from your reports. Well, yes, I will do exactly that. I will just switch
back to 4.10 for now. Better safe than sorry.

I know how you feel, Marc. I posted about a corruption on one of my backup
harddisks here some time ago that btrfs check --repair wasn't able to handle.
I redid that disk from scratch and it took a long, long time.

I agree with you that this has to stop. Before that I will never *ever*
recommend this to a customer. Ideally there would be no corruptions in stable
kernels, especially when it's a .6 at the end of the version number. But if so…
then fixable. Other filesystems like Ext4 and XFS can do it… so this should be
possible with BTRFS as well.

Thanks,
Paul Jones July 9, 2017, 9:16 a.m. UTC | #12
> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Martin Steigerwald
> Sent: Sunday, 9 July 2017 5:58 PM
> To: Marc MERLIN <marc@merlins.org>
> Cc: Lu Fengqi <lufq.fnst@cn.fujitsu.com>; Btrfs BTRFS <linux-btrfs@vger.kernel.org>; David Sterba <dsterba@suse.cz>
> Subject: Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
>
> Hello Marc.
>
> Marc MERLIN - 08.07.17, 21:34:
> > Sigh,
> >
> > This is now the 3rd filesystem I have (on 3 different machines) that
> > is getting corruption of some kind (on 4.11.6).
>
> Anyone else getting corruptions with 4.11?
>
> I will happily switch back to 4.10.17 or even 4.9 if that is the case. I may even do
> so just from your reports. Well, yes, I will do exactly that. I will just switch back
> to 4.10 for now. Better safe than sorry.


No corruption for me - I've been on 4.11 since about .2 and everything seems fine. Currently on 4.11.8

Paul.
Duncan July 9, 2017, 11:17 a.m. UTC | #13
Paul Jones posted on Sun, 09 Jul 2017 09:16:36 +0000 as excerpted:

>> Marc MERLIN - 08.07.17, 21:34:
>> >
>> > This is now the 3rd filesystem I have (on 3 different machines) that
>> > is getting corruption of some kind (on 4.11.6).
>> 
>> Anyone else getting corruptions with 4.11?
>> 
>> I happily switch back to 4.10.17 or even 4.9 if that is the case. I may
>> even do so just from your reports. Well, yes, I will do exactly that. I
>> just switch back for 4.10 for now. Better be safe, than sorry.
> 
> No corruption for me - I've been on 4.11 since about .2 and everything
> seems fine. Currently on 4.11.8

No corruptions here either. 4.12.0 now, previously 4.12-rc5(ish, git), 
before that 4.11.0.

I have however just upgraded to new ssds then wiped and setup the old 
ones as another backup set, so everything is on brand new filesystems on 
fast ssds, no possibility of old undetected corruption suddenly 
triggering problems.

Also, all my btrfs are raid1 or dup for checksummed redundancy, and 
relatively small, the largest now 80 GiB per device, after the upgrade.  
And my use-case doesn't involve snapshots or subvolumes.  

So any bug that is most likely on older filesystems, say those without 
the no-holes feature, for instance, or that doesn't tend to hit raid1 or 
dup mode, or that is less likely on small filesystems on fast ssds, or 
that triggers most often with reflinks and thus on filesystems with 
snapshots, is unlikely to hit me.
Martin Steigerwald July 9, 2017, 1 p.m. UTC | #14
Hello Duncan.

Duncan - 09.07.17, 11:17:
> Paul Jones posted on Sun, 09 Jul 2017 09:16:36 +0000 as excerpted:
> >> Marc MERLIN - 08.07.17, 21:34:
> >> > This is now the 3rd filesystem I have (on 3 different machines) that
> >> > is getting corruption of some kind (on 4.11.6).
> >> 
> >> Anyone else getting corruptions with 4.11?
> >> 
> >> I happily switch back to 4.10.17 or even 4.9 if that is the case. I may
> >> even do so just from your reports. Well, yes, I will do exactly that. I
> >> just switch back for 4.10 for now. Better be safe, than sorry.
> > 
> > No corruption for me - I've been on 4.11 since about .2 and everything
> > seems fine. Currently on 4.11.8
> 
> No corruptions here either. 4.12.0 now, previously 4.12-rc5(ish, git),
> before that 4.11.0.
> 
> I have however just upgraded to new ssds then wiped and setup the old
[…]
> Also, all my btrfs are raid1 or dup for checksummed redundancy, and
> relatively small, the largest now 80 GiB per device, after the upgrade.
> And my use-case doesn't involve snapshots or subvolumes.
> 
> So any bug that is most likely on older filesystems, say those without
> the no-holes feature, for instance, or that doesn't tend to hit raid1 or
> dup mode, or that is less likely on small filesystems on fast ssds, or
> that triggers most often with reflinks and thus on filesystems with
> snapshots, is unlikely to hit me.

Hmmm, the BTRFS filesystems on my laptop are 3 to 5 or even more years old. I'll 
stick with 4.10 for now, I think.

The older ones are RAID 1 across two SSDs, the newer one is single device, on 
one SSD.

These filesystems haven't failed me in years, and since 4.5 or 4.6 even the "I 
search for free space" kernel hang (hung tasks and all that) is gone as well.

Thanks,
Imran Geriskovan July 29, 2017, 7:29 p.m. UTC | #15
On 7/9/17, Duncan <1i5t5.duncan@cox.net> wrote:
> I have however just upgraded to new ssds then wiped and setup the old
> ones as another backup set, so everything is on brand new filesystems on
> fast ssds, no possibility of old undetected corruption suddenly
> triggering problems.
>
> Also, all my btrfs are raid1 or dup for checksummed redundancy

Do you have any experience/advice/comment regarding
dup data on ssds?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duncan July 29, 2017, 11:38 p.m. UTC | #16
Imran Geriskovan posted on Sat, 29 Jul 2017 21:29:46 +0200 as excerpted:

> On 7/9/17, Duncan <1i5t5.duncan@cox.net> wrote:
>> I have however just upgraded to new ssds then wiped and setup the old
>> ones as another backup set, so everything is on brand new filesystems on
>> fast ssds, no possibility of old undetected corruption suddenly
>> triggering problems.
>>
>> Also, all my btrfs are raid1 or dup for checksummed redundancy
> 
> Do you have any experience/advice/comment regarding
> dup data on ssds?

Very good question. =:^)

Limited.  Most of my btrfs are raid1, with dup only used on the device-
respective /boot btrfs (of which there are four, one on each of the two 
ssds that otherwise form the btrfs raid1 pairs, for each of the working 
and backup copy pairs -- I can use BIOS to select any of the four to 
boot), and those are all sub-GiB mixed-bg mode.

So all my dup experience is sub-GiB mixed-blockgroup mode.

Within that limitation, my only btrfs problem has been that at my 
initially chosen size of 256 MiB, mkfs.btrfs at least used to create an 
initial data/metadata chunk of 64 MiB.  Remember, this is dup mode, so 
there's two of them = 128 MiB.  Because there's also a system chunk, that 
means the initial chunk cannot be balanced even with an entirely empty 
filesystem, because there's not enough space to write a second 64 MiB 
chunk duped to 128 MiB.

Between that and the 256 MiB in dup mode size meaning under 128 MiB 
usable, and the fact that I routinely run and sometimes need to bisect 
pre-release kernels, I was routinely running out of space, then cleaning 
up, but not being able to do a full cleanup without a blow-away and new 
mkfs.btrfs, because I couldn't balance.

When I recently purchased the second pair of (now larger) ssds in order 
to put everything, including the media and backups that were previously 
still on spinning rust, on ssd, I redid the layout and made the /boots 
512 MiB, still mixed-bg dup mode.  That seems to have solved the problem, 
and I can now rebalance the first mkfs.btrfs-created mixed-bg chunk, as 
it's now small enough that it's less than half the filesystem even when 
duped.

Because it's now 512 MiB, however, I can't say for sure whether the 
previous problem is fixed or not.  That problem was mkfs.btrfs creating 
an initial mixed-bg chunk of a quarter the 256 MiB filesystem size.  In 
dup mode that chunk takes half the total filesystem size, and with the 
system chunk as well, the other half is partially used, so there's no 
space to write the balance destination chunks and the chunk can't be 
balanced.  What I can say is that the problem doesn't affect the new 
512 MiB size, at least with btrfs-progs 4.11.x, which is what I used to 
mkfs.btrfs the new layout.
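
The arithmetic behind the balance failure described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name `can_balance_dup` is hypothetical, the 8 MiB figure for the raw space taken by the system chunk is an assumed placeholder (the message only says a system chunk exists), and the check is a simplification, not the real btrfs allocator logic.

```sh
#!/bin/sh
# Simplified model: balance has to write a new copy of a chunk before the
# old one is freed, and in dup mode every chunk occupies twice its size.
# All sizes are in MiB. This is NOT how the btrfs allocator really decides.

# can_balance_dup FS_SIZE CHUNK_SIZE SYSTEM_RAW
# Prints "yes" if a second duped chunk fits beside the existing one.
can_balance_dup() {
    fs_size=$1
    chunk_size=$2
    system_raw=$3                       # raw space used by the system chunk (assumed value)
    chunk_raw=$((chunk_size * 2))       # dup mode: two copies of the chunk
    free_raw=$((fs_size - chunk_raw - system_raw))
    if [ "$free_raw" -ge "$chunk_raw" ]; then
        echo yes
    else
        echo no
    fi
}

# 256 MiB filesystem with a 64 MiB initial mixed-bg chunk
echo "256 MiB: $(can_balance_dup 256 64 8)"
# 512 MiB filesystem with the same 64 MiB initial chunk
echo "512 MiB: $(can_balance_dup 512 64 8)"
```

With these assumed numbers the 256 MiB case has 120 MiB of raw space left against the 128 MiB a duped 64 MiB chunk needs, while the 512 MiB case has plenty of room, which matches the behaviour reported in the message.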
Imran Geriskovan July 30, 2017, 2:54 p.m. UTC | #17
On 7/30/17, Duncan <1i5t5.duncan@cox.net> wrote:
>>> Also, all my btrfs are raid1 or dup for checksummed redundancy

>> Do you have any experience/advice/comment regarding
>> dup data on ssds?

> Very good question. =:^)

> Limited.  Most of my btrfs are raid1, with dup only used on the device-
> respective /boot btrfs (of which there are four, one on each of the two
> ssds that otherwise form the btrfs raid1 pairs, for each of the working
> and backup copy pairs -- I can use BIOS to select any of the four to
> boot), and those are all sub-GiB mixed-bg mode.

Is this a military or deep space device? ;)

> So all my dup experience is sub-GiB mixed-blockgroup mode.
>
> Within that limitation, my only btrfs problem has been that at my
> initially chosen size of 256 MiB, mkfs.btrfs at least used to create an
> initial data/metadata chunk of 64 MiB.  Remember, this is dup mode, so
> there's two of them = 128 MiB.  Because there's also a system chunk, that
> means the initial chunk cannot be balanced even with an entirely empty
> filesystem, because there's not enough space to write a second 64 MiB
> chunk duped to 128 MiB.

For /boot, I've also tried dup data.

But because of combinations of constraints you've mentioned,
I totally gave up trying to have a bullet proof /boot
as my poor laptop is not as mission critical as your device and
as I do always have bootable backups and always carry
some bootable sdcards.

Perhaps that has something to do with me kicking
out all systemd, inits, initramfs, mkinitcpio, dracut, etc, etc.

Now the init on /boot is a "19 lines" shell script, including lines
for keymap, hdparm, cryptsetup. And let's not forget this is
possible by a custom kernel and its reliable buddy syslinux.

Interestingly my search for reliability started with
"dup data" and ended up here. :)
Duncan July 31, 2017, 4:53 a.m. UTC | #18
Imran Geriskovan posted on Sun, 30 Jul 2017 16:54:25 +0200 as excerpted:

> On 7/30/17, Duncan <1i5t5.duncan@cox.net> wrote:
>>>> Also, all my btrfs are raid1 or dup for checksummed redundancy
> 
>>> Do you have any experience/advice/comment regarding dup data on ssds?
> 
>> Very good question. =:^)
> 
>> Limited.  Most of my btrfs are raid1, with dup only used on the device-
>> respective /boot btrfs (of which there are four, one on each of the two
>> ssds that otherwise form the btrfs raid1 pairs, for each of the working
>> and backup copy pairs -- I can use BIOS to select any of the four to
>> boot), and those are all sub-GiB mixed-bg mode.
> 
> Is this a military or deep space device? ;)

Just happens to have four physical ssds, two pairs, with everything but 
/boot being paired btrfs raid1.  Because I wanted similar partition 
layout for ease of management, that's a /boot on each one, and because 
bios can only point to one at a time, that's four separate grub installs
[1], each of which is configured to load its own /boot.

While four is a bit much, three can certainly be very useful, because it 
allows a bad grub upgrade to be core-installed to one BIOS-boot 
partition, while allowing me to fat-finger point it to the wrong /boot on 
a second device destroying my ability to boot to it as well, and still 
have a third untouched to boot from.  The fourth is simply bonus insurance 
on that, more by accident due to having two pair than because I really 
needed it.

A minimum of three /boots is also quite convenient for my kernel update 
routine, given I routinely test and sometimes bisect pre-release 
kernels.  The default/working /boot gets the prereleases with a release 
and stable fallback, the first backup the releases and a stable fallback, 
and the secondary backups get updated less frequently, generally when I'm 
doing a / backup cycle as well and there has been either a kernel config 
or system change substantial enough that I'm no longer confident the 
older kernels will work correctly with the updated system.

Of course the same general testing/release/stable /boot system works well 
for other related updates, say to the grub menu (I use grub2's bash-like 
scripting language directly, not the high level stuff which I find too 
difficult to tweak to my liking) or the initrd, which I attach to the 
individual kernels at build-time, so a tested kernel selection is a 
tested initramfs selection as well.

> For /boot, I've also tried dup data.
> 
> But because of combinations of constraints you've mentioned,
> I totally gave up trying to have a bullet proof /boot as my poor laptop
> is not as mission critical as your device and as I do always have bootable
> backups and always carry some bootable sdcards.

When I complained about the 64-MiB default mixed-bg mode chunk size on a 
256 MiB filesystem being too big to allow balance in dup mode, a dev 
answered that in theory chunk sizes are supposed to be limited to 1/8 
filesystem size (down to something like a 16 MiB minimum chunk size I 
think, but might be 8 or 32), but something about my setup, likely the 
mixed-bg mode as it's less tested, was short-circuiting that, thus the 
quarter-fs-size 64 MiB chunk sizes, which he agreed didn't make much 
sense on a 256 MiB filesystem in dup mode.

He was able to duplicate the problem, and there seemed no disagreement it 
was a bug, but I'm not sure if mkfs.btrfs was ever patched to fix it, and 
of course now with the bigger half-gig filesystem the same 64-MiB initial 
chunk size is fine.
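
The rule of thumb mentioned above (chunk size limited to 1/8 of the filesystem size, with some minimum) can be written out as a small shell sketch. The function name `expected_chunk_mib` is hypothetical, and the 16 MiB default minimum is only the guess given in the message (which says it might also be 8 or 32), so it is a parameter here. This is not taken from the mkfs.btrfs source.

```sh
#!/bin/sh
# expected_chunk_mib FS_SIZE [MIN_CHUNK]
# Prints the chunk size in MiB that the "1/8 of the filesystem" rule
# would give. MIN_CHUNK defaults to 16, an unconfirmed guess.
expected_chunk_mib() {
    fs_size=$1
    min_chunk=${2:-16}
    chunk=$((fs_size / 8))
    if [ "$chunk" -lt "$min_chunk" ]; then
        chunk=$min_chunk
    fi
    echo "$chunk"
}

for fs in 64 256 512; do
    echo "fs=${fs} MiB -> chunk=$(expected_chunk_mib "$fs") MiB"
done
```

Under this rule a 256 MiB filesystem would get 32 MiB chunks rather than the 64 MiB that was observed, while on a 512 MiB filesystem the rule itself yields 64 MiB.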

And my other quarter-gig btrfs, log, is raid1, quarter-gig per device, so 
I'd not see the problem there, mixed-mode or not.  (As mentioned in the 
footnote below, at least in this go-round it's not... more by accident 
than intent.)

Meanwhile, such bugs come with the territory when you're running what 
might be roughly compared at the commercial software level to late beta 
or rc level software, or even initial release, pre-service-release-1, 
level, which I'd argue is a more accurate btrfs comparison at this 
point.  As long as you stay within the known stable areas the danger of 
it eating your data is relatively small now, but the full feature set 
isn't there yet, and some of the features that are there are 
significantly less mature and stable than others.

> Perhaps that has something to do with me kicking out all systemd, inits,
> initramfs, mkinitcpio, dracut, etc, etc.
> 
> Now the init on /boot is a "19 lines" shell script, including lines for
> keymap, hdparm, cryptsetup. And let's not forget this is possible by a
> custom kernel and its reliable buddy syslinux.

FWIW...

I really like grub2, especially its quite flexible bash-like scripting 
language (the higher level stuff intended for normal users just isn't 
flexible enough for me, so I need the scripting language anyway, and once 
I knew that, the higher level stuff only got in the way) and command line 
that allow all sorts of stuff like browsing for kernel commandline 
documentation at the boot prompt that I never imagined possible in a boot 
manager.

And after holding off for awhile, I'm now a cautious adopter and 
supporter of systemd in general, tho I don't use its solutions for 
/everything/ and don't like its extremely aggressive feature expansion.

And after resisting an initr* for years as unnecessary, I've been a 
reluctant adopter since a btrfs raid1 root effectively requires it 
(rootflags=device= doesn't seem to work, for whatever reason, or at least 
didn't when I initially converted to btrfs, so at least a limited initr* 
seems the only viable solution for a btrfs raid1 root).

And I'm using dracut for that, tho quite cut down from its default, with 
a monolithic kernel and only installing necessary dracut modules.

But particularly after the last dracut update pulled in kmod as a 
mandatory dep as it now links against its libs, despite my monolithic 
kernel built without module support, I've been considering similar initr* 
alternatives, including hand-rolling my own initr* build scripts.

Because I'm still not happy having to run an initr* at all, especially 
since there's more "magic" there than I'm particularly comfortable with 
since I like to grok the boot and thus potential recovery process better 
than I do this, and dracut was just the most convenient option at the 
time.

But kmod isn't a /huge/ dep, particularly with the executables and docs 
install-masked so it's only the library, headers and *.pc config file 
installed, and the current dracut solution works /reasonably/ well, so 
finding/creating an alternative isn't particularly high on my priority 
list, and I'll probably never do it unless dracut suddenly decides some 
of its other modules are going to need mandatory deps, or something else 
radically changes the current fragile balance and I really do need that 
currently lacking initr* grok.

> Interestingly my search for reliability started with "dup data" and ended
> up here. :)

=:^)

---
[1] Grub and partition layout:  I install grub-core (i386-pc) to a raw 
GPT legacy BIOS boot partition.  While this only requires a partition 
size of about a third of a MiB, I use gdisk's default 1 MiB alignment and 
the first MiB is the GPT and the alignment gap, so this first BIOS boot 
partition starts at 1 MiB and must be a whole MiB unit in size.  Because 
I wanted plenty of room, however, and wanted additional partitions a 
minimum of 4 MiB aligned, I configured a 3 MiB BIOS boot partition for 
grub to use, thus accomplishing that 4 MiB alignment for further 
partitions.

The second partition is a currently unused GPT EFI partition for forward 
compatibility, 252 MiB in size so further partitions are quarter-GiB 
aligned.

The third partition is the /boot partition we've been discussing, a half 
GiB in size, thus ending at 3/4 GiB.  It's my only btrfs mixed-mode dup 
in the layout, so a half gig in size but a quarter gig usable.  As 
mentioned, with four physical ssds that's a total of four /boots, each 
pointed at by the grub-core installation in the first partition on the 
corresponding ssd.

Partition 4 is the log partition, a quarter GiB in size as log rotation 
keeps typical usage under 50 MiB, but the quarter gig size means it ends 
on the 1 GiB boundary and further partitions are GiB aligned.  In the 
last layout generation this was a half gig and /boot a quarter gig, but I 
decided /boot could use the extra quarter gig more than log so I traded 
sizes.  This, like all further partitions, is btrfs raid1.  I intended to 
make it mixed-bg mode, as it was in the previous generation layout, but 
forgot the mkfs.btrfs switch for that and it no longer defaults to mixed 
at under a gig, so I got standard mode.  Nevertheless, with raid1 
instead of dup, and low normal usage, the chunk size is small enough that 
balance shouldn't be an issue, and if it is I can always blow it away and 
recreate in mixed mode.

All further partitions are gig-aligned btrfs raid1 pair-device, three 
copies, working/0 and backups 1 and 2, on two separate pairs of ssds.  
The older pair is 256GB/238GiB with the backup/1 copy, the newer pair is 
1TB/931GiB with working/0 and backup/2.  The partition size and layout is 
identical on all four thru the sub-GiB and first copy, with the second 
copy on the larger pair being a same-sequence same-size repeat of the 
first, beyond the non-duplicated sub-GiB, of course.  So as long as the 
GPT on one of the four remains intact and bootable, I can easily recreate 
the other three.
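
The alignment arithmetic in the footnote above can be checked with a short shell sketch. The sizes come from the footnote; the function name `layout_offsets` and the partition labels (`bios_boot`, `efi`, `boot`, `log`) are made up for illustration, and the sketch only does the addition, it does not touch any partition table.

```sh
#!/bin/sh
# Walk the four sub-GiB partitions and print where each starts and ends.
# Sizes are in MiB; the first MiB holds the GPT and the alignment gap.
layout_offsets() {
    offset=1
    for entry in bios_boot:3 efi:252 boot:512 log:256; do
        name=${entry%%:*}       # label before the colon
        size=${entry##*:}       # size in MiB after the colon
        start=$offset
        offset=$((offset + size))
        echo "$name start=${start}MiB end=${offset}MiB"
    done
}

layout_offsets
```

The end offsets come out at 4, 256, 768 and 1024 MiB, so the BIOS boot partition ends on the 4 MiB boundary, the EFI partition on the quarter-GiB boundary, /boot at 3/4 GiB, and the log partition on the 1 GiB boundary, as described.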
Imran Geriskovan July 31, 2017, 8:32 p.m. UTC | #19
>>>> Do you have any experience/advice/comment regarding dup data on ssds?

>>> Very good question. =:^)

>> Now the init on /boot is a "19 lines" shell script, including lines for
>> keymap, hdparm, cryptsetup. And let's not forget this is possible by a
>> custom kernel and its reliable buddy syslinux.
>
> FWIW...
> And I'm using dracut for that, tho quite cut down from its default, with
> a monolithic kernel and only installing necessary dracut modules.

Just create a minimal bootable /boot for running the init below.
(Your initramfs/rd is a bloated and packaged version of
this anyway.) Kick the rest. Since you have your own
kernel you are not far away from it.


#!/bin/sh
# This is actually busybox ash or hush. Can't remember now.
# You may compile/customize your busybox as well. Easy.

mount proc /proc -t proc
mount sys  /sys  -t sysfs
mount run  /run  -t tmpfs
mkdir /dev/pts /dev/shm /run/lock
mount devpts /dev/pts -t devpts &
mount shm    /dev/shm -t tmpfs &
mount -o remount,rw,noatime / &

# '&' is for backgrounding/parallel_execution.
# Use responsibly double checking its side effects
# depending on your setup.

hdparm -B 254 /dev/sda &
loadkmap < /boot/trq.bkmap

cryptsetup -T 10 luksOpen /dev/sdXX sdXX
mount /dev/mapper/sdXX /mnt/new_root -t btrfs -o noatime,compress=lzo

cd /mnt/new_root
mount --move /dev  ./dev
mount --move /proc ./proc
mount --move /sys  ./sys
mount --move /run  ./run
pivot_root . boot

exec chroot . busybox init
# Jump to your real roots init. Whatever it may be.


> But particularly after the last dracut update pulled in kmod as a
> mandatory dep as it now links against its libs, despite my monolithic
> kernel built without module support, I've been considering similar initr*
> alternatives, including hand-rolling my own initr* build scripts.
>
> Because I'm still not happy having to run an initr* at all, especially
> since there's more "magic" there than I'm particularly comfortable with
> since I like to grok the boot and thus potential recovery process better
> than I do this, and dracut was just the most convenient option at the
> time.

>> Interestingly my search for reliability started with "dup data" and ended
>> up here. :)
> =:^)
Ivan Sizov July 31, 2017, 9:07 p.m. UTC | #20
2017-07-09 10:57 GMT+03:00 Martin Steigerwald <martin@lichtvoll.de>:
> Hello Marc.
>
> Marc MERLIN - 08.07.17, 21:34:
>> Sigh,
>>
>> This is now the 3rd filesystem I have (on 3 different machines) that is
>> getting corruption of some kind (on 4.11.6).
>
> Anyone else getting corruptions with 4.11?
Yes, a lot. There are at least 3 cases, probably I've missed something.
https://www.spinics.net/lists/linux-btrfs/msg67177.html
https://www.spinics.net/lists/linux-btrfs/msg67681.html
https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing/369275

If any additional debug info is needed, I'm ready to provide it.
Marc MERLIN July 31, 2017, 9:17 p.m. UTC | #21
On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote:
> 2017-07-09 10:57 GMT+03:00 Martin Steigerwald <martin@lichtvoll.de>:
> > Hello Marc.
> >
> > Marc MERLIN - 08.07.17, 21:34:
> >> Sigh,
> >>
> >> This is now the 3rd filesystem I have (on 3 different machines) that is
> >> getting corruption of some kind (on 4.11.6).
> >
> > Anyone else getting corruptions with 4.11?
> Yes, a lot. There are at least 3 cases, probably I've missed something.
> https://www.spinics.net/lists/linux-btrfs/msg67177.html
> https://www.spinics.net/lists/linux-btrfs/msg67681.html
> https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing/369275

Indeed. My main server is happy back on 4.9.36 and while my laptop is
stuck on 4.11 due to other kernel issues that prevent me from going back
to 4.9, it only corrupted a single filesystem so far, and no other ones
that I've noticed yet.
Hopefully that will hold :-/

Marc
Ivan Sizov July 31, 2017, 9:39 p.m. UTC | #22
2017-08-01 0:17 GMT+03:00 Marc MERLIN <marc@merlins.org>:
> On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote:
>> 2017-07-09 10:57 GMT+03:00 Martin Steigerwald <martin@lichtvoll.de>:
>> > Hello Marc.
>> >
>> > Marc MERLIN - 08.07.17, 21:34:
>> >> Sigh,
>> >>
>> >> This is now the 3rd filesystem I have (on 3 different machines) that is
>> >> getting corruption of some kind (on 4.11.6).
>> >
>> > Anyone else getting corruptions with 4.11?
>> Yes, a lot. There are at least 3 cases, probably I've missed something.
>> https://www.spinics.net/lists/linux-btrfs/msg67177.html
>> https://www.spinics.net/lists/linux-btrfs/msg67681.html
>> https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing/369275
>
> Indeed. My main server is happy back on 4.9.36 and while my laptop is
> stuck on 4.11 due to other kernel issues that prevent me from going back
> to 4.9, it only corrupted a single filesystem so far, and no other ones
> that I've noticed yet.
> Hopefully that will hold :-/
>
> Marc

I want to try mounting and checking the FS under live images with
different kernels tomorrow. Today's Fedora Rawhide image seems to be
built incorrectly. Can you advise me where to get a fresh live image
with a 4.12 kernel (it's not important which distro that will be)?
Justin Maggard July 31, 2017, 10 p.m. UTC | #23
On Mon, Jul 31, 2017 at 2:17 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote:
>> 2017-07-09 10:57 GMT+03:00 Martin Steigerwald <martin@lichtvoll.de>:
>> > Hello Marc.
>> >
>> > Marc MERLIN - 08.07.17, 21:34:
>> >> Sigh,
>> >>
>> >> This is now the 3rd filesystem I have (on 3 different machines) that is
>> >> getting corruption of some kind (on 4.11.6).
>> >
>> > Anyone else getting corruptions with 4.11?
>> Yes, a lot. There are at least 3 cases, probably I've missed something.
>> https://www.spinics.net/lists/linux-btrfs/msg67177.html
>> https://www.spinics.net/lists/linux-btrfs/msg67681.html
>> https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing/369275
>
> Indeed. My main server is happy back on 4.9.36 and while my laptop is
> stuck on 4.11 due to other kernel issues that prevent me from going back
> to 4.9, it only corrupted a single filesystem so far, and no other ones
> that I've noticed yet.
> Hopefully that will hold :-/
>

Marc, do you have quotas enabled?  IIRC, you're a send/receive user.
The combination of quotas and btrfs receive can corrupt your
filesystem, as shown by the xfstest I sent to the list a little while
ago.

-Justin
Duncan Aug. 1, 2017, 1:36 a.m. UTC | #24
Imran Geriskovan posted on Mon, 31 Jul 2017 22:32:39 +0200 as excerpted:

>>> Now the init on /boot is a "19 lines" shell script, including lines
>>> for keymap, hdparm, cryptsetup. And let's not forget this is possible
>>> by a custom kernel and its reliable buddy syslinux.
>>
>> FWIW...
>> And I'm using dracut for that, tho quite cut down from its default,
>> with a monolithic kernel and only installing necessary dracut modules.
> 
> Just create minimal bootable /boot for running below init.
> (Your initramfs/rd is a bloated and packaged version of this anyway.)
> Kick the rest. Since you have your own kernel you are not far away
> from it.

Thanks.  You just solved my primary problem of needing to take the time 
to actually research all the steps and in what order I needed to do them, 
for a hand-rolled script. =:^)

Unfortunately, while I've been laid-up the last ~5 days due to a twisted 
knee and have been spending more time on the lists, etc, and would have 
loved to spend a day or so testing and setting this up, I'm back to work 
tomorrow, so I've no idea when I'll actually get to play with this.

But meanwhile, I'm saving your message for reference when the time 
comes.  It should be /very/ useful!  =:^)
Marc MERLIN Aug. 1, 2017, 6:38 a.m. UTC | #25
On Mon, Jul 31, 2017 at 03:00:53PM -0700, Justin Maggard wrote:
> Marc, do you have quotas enabled?  IIRC, you're a send/receive user.
> The combination of quotas and btrfs receive can corrupt your
> filesystem, as shown by the xfstest I sent to the list a little while
> ago.

Thanks for checking. I do not use quota given the problems I had with
them early on over 2y ago.

Marc
Imran Geriskovan Aug. 1, 2017, 3:18 p.m. UTC | #26
On 8/1/17, Duncan <1i5t5.duncan@cox.net> wrote:
> Imran Geriskovan posted on Mon, 31 Jul 2017 22:32:39 +0200 as excerpted:
>>>> Now the init on /boot is a "19 lines" shell script, including lines
>>>> for keymap, hdparm, cryptsetup. And let's not forget this is possible
>>>> by a custom kernel and its reliable buddy syslinux.

>>> And I'm using dracut for that, tho quite cut down from its default,
>>> with a monolithic kernel and only installing necessary dracut modules.

>> Just create minimal bootable /boot for running below init.
>> (Your initramfs/rd is a bloated and packaged version of this anyway.)
>> Kick the rest. Since you have your own kernel you are not far away
>> from it.

> Thanks.  You just solved my primary problem of needing to take the time
> to actually research all the steps and in what order I needed to do them,
> for a hand-rolled script. =:^)

It's just a minimal one. But it is a good start. For possible extensions
extract your initramfs and explore it. Dracut is bloated. Try mkinitcpio.

Once you have your self-hosting boot manager, kernel, modules, /boot, init, etc.
chain, you'll be shocked to realize how much time you have been spending on
that bullshit while trying to keep it all up.

Get to this point in the shortest possible time. Save your precious
time. And reclaim your system's reliability.

For X, you'll still need udev or eudev.
Ivan Sizov Aug. 1, 2017, 4:41 p.m. UTC | #27
2017-08-01 0:39 GMT+03:00 Ivan Sizov <sivan606@gmail.com>:
> 2017-08-01 0:17 GMT+03:00 Marc MERLIN <marc@merlins.org>:
>> On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote:
>>> 2017-07-09 10:57 GMT+03:00 Martin Steigerwald <martin@lichtvoll.de>:
>>> > Hello Marc.
>>> >
>>> > Marc MERLIN - 08.07.17, 21:34:
>>> >> Sigh,
>>> >>
>>> >> This is now the 3rd filesystem I have (on 3 different machines) that is
>>> >> getting corruption of some kind (on 4.11.6).
>>> >
>>> > Anyone else getting corruptions with 4.11?
>>> Yes, a lot. There are at least 3 cases, probably I've missed something.
>>> https://www.spinics.net/lists/linux-btrfs/msg67177.html
>>> https://www.spinics.net/lists/linux-btrfs/msg67681.html
>>> https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing/369275
>>
>> Indeed. My main server is happy back on 4.9.36 and while my laptop is
>> stuck on 4.11 due to other kernel issues that prevent me from going back
>> to 4.9, it only corrupted a single filesystem so far, and no other ones
>> that I've noticed yet.
>> Hopefully that will hold :-/
>>
>> Marc
>
> I want to try mounting and checking FS under Live images with
> different kernels tomorrow. Today's Fedora Rawhide image seems to be
> built incorrectly. Can you advise me where to get a fresh live image
> with 4.12 kernel (it's not important which distro that will be)?
>
> --
> Ivan Sizov
Mounting problem persists:
on 4.13.0 with btrfs-progs v4.11.1 (latest Fedora Rawhide Live)
on 4.10.0 with btrfs-progs v4.9.1 (Ubuntu 17.04 Live)
on 4.9.0 with btrfs-progs v4.7.3 (Debian 9 Stretch Live)
"btrfs check --readonly" also gives the same output on 4.11, 4.10 and 4.9.

Marc, how did you roll back and fix those errors?
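
For reports like the one above, it helps to capture the kernel version, the btrfs-progs version and the read-only check output together on each live image. A minimal sketch, assuming a hypothetical helper name `check_report` and the placeholder device `/dev/sdXX`; it only uses `uname -r`, `btrfs --version` and `btrfs check --readonly`, and by default it just prints what it would run.

```sh
#!/bin/sh
# check_report DEVICE
# With DRY_RUN=1 (the default here) the commands are only printed, so
# nothing touches a real device until you opt in with DRY_RUN=0.
# The filesystem should be unmounted before the check is really run.
check_report() {
    dev=$1
    for cmd in "uname -r" "btrfs --version" "btrfs check --readonly $dev"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $cmd"
        else
            echo "### $cmd"
            $cmd 2>&1           # unquoted on purpose: split into command and arguments
        fi
    done
}

# /dev/sdXX is a placeholder, replace it with the real device.
check_report /dev/sdXX
```

Running the same function with `DRY_RUN=0` on each kernel and saving the output to a file gives directly comparable reports.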

Patch

From lufq.fnst@cn.fujitsu.com Mon Jun 26 03:37:41 2017
From: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
CC: <marc@merlins.org>
Date: Mon, 26 Jun 2017 18:37:25 +0800
Message-ID: <20170626103727.8945-2-lufq.fnst@cn.fujitsu.com>
X-Mailer: git-send-email 2.13.1
In-Reply-To: <20170626103727.8945-1-lufq.fnst@cn.fujitsu.com>
References: <20170626103727.8945-1-lufq.fnst@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain
Subject: [PATCH v3 2/4] btrfs-progs: lowmem check: Fix false alert about referencer count mismatch

Normal back reference counting ignores extents referenced by extent
data in a shared leaf. The check_extent_data_backref function therefore
needs to skip any leaf whose owner does not match root_id.

Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
---
 cmds-check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 70d2b7f2..f42968cd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10692,7 +10692,8 @@  static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
 		leaf = path.nodes[0];
 		slot = path.slots[0];
 
-		if (slot >= btrfs_header_nritems(leaf))
+		if (slot >= btrfs_header_nritems(leaf) ||
+		    btrfs_header_owner(leaf) != root_id)
 			goto next;
 		btrfs_item_key_to_cpu(leaf, &key, slot);
 		if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY)
-- 
2.13.1
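
For readers who want to see the skip condition from the patch above in isolation, here is a minimal sketch in plain C. It is not btrfs-progs code: struct toy_leaf and toy_should_skip_leaf() are hypothetical names made up for illustration, and the real check reads the owner with btrfs_header_owner() and the item count with btrfs_header_nritems() from an extent buffer. The sketch only assumes that a leaf records its owning root id and its item count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a leaf header; NOT the btrfs-progs structure. */
struct toy_leaf {
	uint64_t owner;   /* id of the root that owns this leaf */
	uint32_t nritems; /* number of items stored in the leaf */
};

/*
 * Return true when the backref lookup should move on to the next leaf:
 * either the slot is past the last item, or the leaf belongs to another
 * root (a shared leaf), so its extent data must not be counted for root_id.
 */
static bool toy_should_skip_leaf(const struct toy_leaf *leaf, uint32_t slot,
				 uint64_t root_id)
{
	assert(leaf != NULL);
	return slot >= leaf->nritems || leaf->owner != root_id;
}
```

With a leaf owned by root 5 that holds 3 items, slot 0 is examined when root_id is 5, while slot 3 (out of range) or any slot looked up for a different root_id is skipped. That mirrors the two halves of the condition the patch adds before the "goto next".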