xfs_repair: junk leaf attribute if count == 0

Message ID 725190d9-6db0-4f6c-628b-76f2dca3071f@redhat.com
State Accepted

Commit Message

Eric Sandeen Dec. 8, 2016, 6:06 p.m. UTC
We have recently seen a case where, during log replay, the
attr3 leaf verifier reported corruption when encountering a
leaf attribute with a count of 0 in the header.

We chalked this up to a transient state when a shortform leaf
was created, the attribute didn't fit, and we promoted the
(empty) attribute to the larger leaf form.

I've recently been given a metadump of unknown provenance which actually
contains a leaf attribute with count 0 on disk.  This causes the
verifier to fire every time xfs_repair is run:

 Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000

If this 0-count state is detected, we should just junk the leaf, same
as we would do if the count was too high.  With this change, we now
remedy the problem:

 Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
 bad attribute count 0 in attr block 0, inode 12587828
 problem with attribute contents in inode 12587828
 clearing inode 12587828 attributes
 correcting nblocks for inode 12587828, was 2 - counted 1

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Brian Foster Dec. 12, 2016, 6:36 p.m. UTC | #1
On Thu, Dec 08, 2016 at 12:06:03PM -0600, Eric Sandeen wrote:
> We have recently seen a case where, during log replay, the
> attr3 leaf verifier reported corruption when encountering a
> leaf attribute with a count of 0 in the header.
> 
> We chalked this up to a transient state when a shortform leaf
> was created, the attribute didn't fit, and we promoted the
> (empty) attribute to the larger leaf form.
> 
> I've recently been given a metadump of unknown provenance which actually
> contains a leaf attribute with count 0 on disk.  This causes the
> verifier to fire every time xfs_repair is run:
> 
>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
> 
> If this 0-count state is detected, we should just junk the leaf, same
> as we would do if the count was too high.  With this change, we now
> remedy the problem:
> 
>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
>  bad attribute count 0 in attr block 0, inode 12587828
>  problem with attribute contents in inode 12587828
>  clearing inode 12587828 attributes
>  correcting nblocks for inode 12587828, was 2 - counted 1
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

> 
> diff --git a/repair/attr_repair.c b/repair/attr_repair.c
> index 40cb5f7..b855a10 100644
> --- a/repair/attr_repair.c
> +++ b/repair/attr_repair.c
> @@ -593,7 +593,8 @@ process_leaf_attr_block(
>  	stop = xfs_attr3_leaf_hdr_size(leaf);
>  
>  	/* does the count look sorta valid? */
> -	if (leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
> +	if (!leafhdr.count ||
> +	    leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
>  						mp->m_sb.sb_blocksize) {
>  		do_warn(
>  	_("bad attribute count %d in attr block %u, inode %" PRIu64 "\n"),
Libor Klepáč Dec. 13, 2016, 10:52 a.m. UTC | #2
Hello,
should this patch possibly fix the errors I reported in this thread?
https://www.spinics.net/lists/linux-xfs/msg01728.html

Is it safe to test it? (I do have backups.)

Thanks,
Libor

On Thursday, December 8, 2016, 12:06:03 CET Eric Sandeen wrote:
> We have recently seen a case where, during log replay, the
> attr3 leaf verifier reported corruption when encountering a
> leaf attribute with a count of 0 in the header.
> 
> We chalked this up to a transient state when a shortform leaf
> was created, the attribute didn't fit, and we promoted the
> (empty) attribute to the larger leaf form.
> 
> I've recently been given a metadump of unknown provenance which actually
> contains a leaf attribute with count 0 on disk.  This causes the
> verifier to fire every time xfs_repair is run:
> 
>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
> 
> If this 0-count state is detected, we should just junk the leaf, same
> as we would do if the count was too high.  With this change, we now
> remedy the problem:
> 
>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
>  bad attribute count 0 in attr block 0, inode 12587828
>  problem with attribute contents in inode 12587828
>  clearing inode 12587828 attributes
>  correcting nblocks for inode 12587828, was 2 - counted 1
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/repair/attr_repair.c b/repair/attr_repair.c
> index 40cb5f7..b855a10 100644
> --- a/repair/attr_repair.c
> +++ b/repair/attr_repair.c
> @@ -593,7 +593,8 @@ process_leaf_attr_block(
>  	stop = xfs_attr3_leaf_hdr_size(leaf);
>  
>  	/* does the count look sorta valid? */
> -	if (leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
> +	if (!leafhdr.count ||
> +	    leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
>  						mp->m_sb.sb_blocksize) {
>  		do_warn(
>  	_("bad attribute count %d in attr block %u, inode %" PRIu64 "\n"),
Eric Sandeen Dec. 13, 2016, 4:04 p.m. UTC | #3
On 12/13/16 4:52 AM, Libor Klepáč wrote:
> Hello,
> should this patch possibly fix errors i reported in this thread?
> https://www.spinics.net/lists/linux-xfs/msg01728.html
> 
> Is it safe to test it? (i do have backups)

It should be safe, yes.

(you could always run xfs_repair -n first to be extra careful).

Were those errors during mount/replay, though, or was it when the
filesystem was up and running?

I ask because I also sent a patch to ignore these empty attributes
in the verifier during log replay.

FWIW, only some of your reported buffers look "empty", though; the one
at 514098.682726 may have had something else wrong.

Anyway, yes, probably worth checking with xfs_repair (-n) with this
patch added.  Let us know what you find! :)

-Eric

 
> Thanks,
> Libor
> 
> On Thursday, December 8, 2016, 12:06:03 CET Eric Sandeen wrote:
>> We have recently seen a case where, during log replay, the
>> attr3 leaf verifier reported corruption when encountering a
>> leaf attribute with a count of 0 in the header.
>>
>> We chalked this up to a transient state when a shortform leaf
>> was created, the attribute didn't fit, and we promoted the
>> (empty) attribute to the larger leaf form.
>>
>> I've recently been given a metadump of unknown provenance which actually
>> contains a leaf attribute with count 0 on disk.  This causes the
>> verifier to fire every time xfs_repair is run:
>>
>>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
>>
>> If this 0-count state is detected, we should just junk the leaf, same
>> as we would do if the count was too high.  With this change, we now
>> remedy the problem:
>>
>>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
>>  bad attribute count 0 in attr block 0, inode 12587828
>>  problem with attribute contents in inode 12587828
>>  clearing inode 12587828 attributes
>>  correcting nblocks for inode 12587828, was 2 - counted 1
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/repair/attr_repair.c b/repair/attr_repair.c
>> index 40cb5f7..b855a10 100644
>> --- a/repair/attr_repair.c
>> +++ b/repair/attr_repair.c
>> @@ -593,7 +593,8 @@ process_leaf_attr_block(
>>  	stop = xfs_attr3_leaf_hdr_size(leaf);
>>  
>>  	/* does the count look sorta valid? */
>> -	if (leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
>> +	if (!leafhdr.count ||
>> +	    leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
>>  						mp->m_sb.sb_blocksize) {
>>  		do_warn(
>>  	_("bad attribute count %d in attr block %u, inode %" PRIu64 "\n"),
Libor Klepáč Dec. 15, 2016, 8:48 p.m. UTC | #4
Hello,

On Tuesday, December 13, 2016, 10:04:27 CET Eric Sandeen wrote:
> On 12/13/16 4:52 AM, Libor Klepáč wrote:
> > Hello,
> > should this patch possibly fix errors i reported in this thread?
> > https://www.spinics.net/lists/linux-xfs/msg01728.html
> > 
> > Is it safe to test it? (i do have backups)
> 
> It should be safe, yes.
>
OK, I will try it soon(ish) on the machines named vps1 and vps3 in my mails. vps2
is problematic to shut down and test.
 
> (you could always run xfs_repair -n first to be extra careful).
> 
> Were those errors during mount/replay, though, or was it when the
> filesystem was up and running?
> 
It was always when running; some errors were just logged, some caused fs
shutdown.

> I ask because I also sent a patch to ignore these empty attributes
> in the verifier during log replay.
> 

> FWIW, Only some of your reported buffers look "empty" though, the one
> at 514098.682726 may have had something else wrong.
> 
OK, I see, it has some "data" in it.

> Anyway, yes, probably worth checking with xfs_repair (-n) with this
> patch added.  Let us know what you find! :)
> 
> -Eric
> 

Libor

Eric Sandeen Dec. 24, 2016, 5:50 p.m. UTC | #5
On 12/21/16 2:25 AM, Libor Klepáč wrote:
> Hello,
> 
> On Tuesday, December 13, 2016, 10:04:27 CET Eric Sandeen wrote:
>> On 12/13/16 4:52 AM, Libor Klepáč wrote:
>>> Hello,
>>> should this patch possibly fix errors i reported in this thread?
>>> https://www.spinics.net/lists/linux-xfs/msg01728.html
>>>
>>> Is it safe to test it? (i do have backups)
>>
>> It should be safe, yes.
>>
>> (you could always run xfs_repair -n first to be extra careful).
>>
>> Were those errors during mount/replay, though, or was it when the
>> filesystem was up and running?
>>
>> I ask because I also sent a patch to ignore these empty attributes
>> in the verifier during log replay.
> 
> Is that patch in the 4.8.11 kernel?

nope

> I ask because I rebooted the machine from this email
> https://www.spinics.net/lists/linux-xfs/msg02672.html
> 
> with kernel 4.8.11-1~bpo8+1 (from debian)
> mount was clean 
> [    3.220692] SGI XFS with ACLs, security attributes, realtime, no debug enabled
> [    3.222135] XFS (dm-2): Mounting V4 Filesystem
> [    3.284697] XFS (dm-2): Ending clean mount
> 
> 
> and I ran xfs_repair -n -v (xfsprogs 4.8.0, without your patch) and it came out clean, I think
> ---------------------------------

older repair did not detect the problem, that was the point
of the patch.

-Eric

> #xfs_repair -n -v /dev/dm-2
> Phase 1 - find and verify superblock...
>         - block cache size set to 758064 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 5035 tail block 5035
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> 
>         XFS_REPAIR Summary    Tue Dec 20 06:01:14 2016
> 
> Phase           Start           End             Duration
> Phase 1:        12/20 05:49:02  12/20 05:49:02
> Phase 2:        12/20 05:49:02  12/20 05:49:02
> Phase 3:        12/20 05:49:02  12/20 05:59:24  10 minutes, 22 seconds
> Phase 4:        12/20 05:59:24  12/20 06:00:17  53 seconds
> Phase 5:        Skipped
> Phase 6:        12/20 06:00:17  12/20 06:01:13  56 seconds
> Phase 7:        12/20 06:01:13  12/20 06:01:14  1 second
> 
> Total run time: 12 minutes, 12 seconds
> ---------------------------------
> 
> then I ran metadump, and then I ran out of time in my repair window ;)
> 
> 
> 
> Libor
> 
> 
>>
>> FWIW, Only some of your reported buffers look "empty" though, the one
>> at 514098.682726 may have had something else wrong.
>>
>> Anyway, yes, probably worth checking with xfs_repair (-n) with this
>> patch added.  Let us know what you find! :)
>>
>> -Eric
>>
>>  
>>> Thanks,
>>> Libor
>>>
>>> On Thursday, December 8, 2016, 12:06:03 CET Eric Sandeen wrote:
>>>> We have recently seen a case where, during log replay, the
>>>> attr3 leaf verifier reported corruption when encountering a
>>>> leaf attribute with a count of 0 in the header.
>>>>
>>>> We chalked this up to a transient state when a shortform leaf
>>>> was created, the attribute didn't fit, and we promoted the
>>>> (empty) attribute to the larger leaf form.
>>>>
>>>> I've recently been given a metadump of unknown provenance which actually
>>>> contains a leaf attribute with count 0 on disk.  This causes the
>>>> verifier to fire every time xfs_repair is run:
>>>>
>>>>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
>>>>
>>>> If this 0-count state is detected, we should just junk the leaf, same
>>>> as we would do if the count was too high.  With this change, we now
>>>> remedy the problem:
>>>>
>>>>  Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
>>>>  bad attribute count 0 in attr block 0, inode 12587828
>>>>  problem with attribute contents in inode 12587828
>>>>  clearing inode 12587828 attributes
>>>>  correcting nblocks for inode 12587828, was 2 - counted 1
>>>>
>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>>> ---
>>>>
>>>> diff --git a/repair/attr_repair.c b/repair/attr_repair.c
>>>> index 40cb5f7..b855a10 100644
>>>> --- a/repair/attr_repair.c
>>>> +++ b/repair/attr_repair.c
>>>> @@ -593,7 +593,8 @@ process_leaf_attr_block(
>>>>  	stop = xfs_attr3_leaf_hdr_size(leaf);
>>>>  
>>>>  	/* does the count look sorta valid? */
>>>> -	if (leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
>>>> +	if (!leafhdr.count ||
>>>> +	    leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
>>>>  						mp->m_sb.sb_blocksize) {
>>>>  		do_warn(
>>>>  	_("bad attribute count %d in attr block %u, inode %" PRIu64 "\n"),
Libor Klepáč Jan. 31, 2017, 8:03 a.m. UTC | #6
Hello,
sorry for the late reply. It hadn't crashed since then, and I forgot and moved on to other tasks.

Yesterday it crashed on one of the machines (running 4.8.11).
-------------------------
Jan 30 07:18:13 vps2 kernel: [5881831.379547] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
Jan 30 07:18:13 vps2 kernel: [5881831.381721] XFS (dm-2): Unmount and run xfs_repair
Jan 30 07:18:13 vps2 kernel: [5881831.382750] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Jan 30 07:18:13 vps2 kernel: [5881831.387810] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
Jan 30 07:26:02 vps2 kernel: [5882300.524528] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
Jan 30 07:26:02 vps2 kernel: [5882300.525993] XFS (dm-2): Unmount and run xfs_repair
Jan 30 07:26:02 vps2 kernel: [5882300.526539] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Jan 30 07:26:02 vps2 kernel: [5882300.529224] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8
Jan 30 10:00:27 vps2 kernel: [5891564.682483] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x127b5578
Jan 30 10:00:27 vps2 kernel: [5891564.683962] XFS (dm-2): Unmount and run xfs_repair
Jan 30 10:00:27 vps2 kernel: [5891564.684536] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Jan 30 10:00:27 vps2 kernel: [5891564.687223] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1250 of file /build/linux-lVEVrl/linux-4.7.8/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06747f2
Jan 30 10:00:27 vps2 kernel: [5891564.687230] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Jan 30 10:00:27 vps2 kernel: [5891564.687778] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

and later 
Jan 30 21:10:31 vps2 kernel: [39747.917831] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24c17ba8
Jan 30 21:10:31 vps2 kernel: [39747.918130] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
-------------------------

I had a repair scheduled for today; all these blocks were repaired using xfsprogs 4.9.0.
The kernel is now 4.8.15.

-------------------------
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
bad attribute count 0 in attr block 0, inode 1074268922
problem with attribute contents in inode 1074268922
clearing inode 1074268922 attributes
correcting nblocks for inode 1074268922, was 1 - counted 0
Metadata corruption detected at xfs_attr3_leaf block 0x127b5578/0x1000
bad attribute count 0 in attr block 0, inode 1077334032
problem with attribute contents in inode 1077334032
clearing inode 1077334032 attributes
correcting nblocks for inode 1077334032, was 1 - counted 0
Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
bad attribute count 0 in attr block 0, inode 1093437859
problem with attribute contents in inode 1093437859
clearing inode 1093437859 attributes
correcting nblocks for inode 1093437859, was 1 - counted 0
        - agno = 2
Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
bad attribute count 0 in attr block 0, inode 2147673775
problem with attribute contents in inode 2147673775
clearing inode 2147673775 attributes
correcting nblocks for inode 2147673775, was 1 - counted 0
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
bad attribute format 1 in inode 1074268922, resetting value
bad attribute format 1 in inode 1077334032, resetting value
bad attribute format 1 in inode 1093437859, resetting value
        - agno = 2
bad attribute format 1 in inode 2147673775, resetting value
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
-------------------------

Thank you very much for the patch, it has done its work.

Libor

Libor Klepáč Feb. 1, 2017, 12:48 p.m. UTC | #7
Hello,
we also tried it on vps1, reported near the bottom of this email
https://www.spinics.net/lists/linux-xfs/msg01728.html
and vps3 from this email
https://www.spinics.net/lists/linux-xfs/msg02672.html

Both came out clean. Does that mean the corruption was really only in memory
and never made it to disk?
Both machines are on 4.8.15 and xfsprogs 4.9.0

#root@vps3 # xfs_repair /dev/mapper/vg2Disk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - quota info will be regenerated on next quota mount.
done

---------------------------------
#root@vps1:~# xfs_repair /dev/mapper/vgVPS1Disk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
-------------------------

Thanks, 
Libor

Eric Sandeen Feb. 1, 2017, 10:49 p.m. UTC | #8
On 2/1/17 6:48 AM, Libor Klepáč wrote:
> Hello,
> we tried also on vps1 reported here in bottom of email
> https://www.spinics.net/lists/linux-xfs/msg01728.html
> and vps3 from this email
> https://www.spinics.net/lists/linux-xfs/msg02672.html
> 
> Both came clean. Does it mean, that corruption was really only in memory 
> and did not made it to disks?

The point of the xfs_repair patch was to fix the problem
on-disk.

I'm not sure what steps your machines have been through, so I
can't really speak to their situation, but assuming the repairs
below were with the patch in place, it appears that they are
clean now.

-Eric

> Both machines are on 4.8.15 and xfsprogs 4.9.0
> 
> #root@vps3 # xfs_repair /dev/mapper/vg2Disk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Note - quota info will be regenerated on next quota mount.
> done
> 
> ---------------------------------
> #root@vps1:~# xfs_repair /dev/mapper/vgVPS1Disk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> -------------------------
> 
> Thanks, 
> Libor
> 
Libor Klepáč Feb. 2, 2017, 8:35 a.m. UTC | #9
Hi,

On Wednesday, February 1, 2017, 16:49:49 CET Eric Sandeen wrote:
> On 2/1/17 6:48 AM, Libor Klepáč wrote:
> > Hello,
> > we tried also on vps1 reported here in bottom of email
> > https://www.spinics.net/lists/linux-xfs/msg01728.html
> > and vps3 from this email
> > https://www.spinics.net/lists/linux-xfs/msg02672.html
> > 
> > Both came clean. Does it mean, that corruption was really only in memory 
> > and did not made it to disks?
> 
> The point of the xfs_repair patch was to fix the problem
> on-disk.
Yes, I understand that.

> 
> I'm not sure what steps your machines have been through, so I
> can't really speak to their situation, but assuming the repairs
> below were with the patch in place, it appears that they are
> clean now.
> 
Ok, seems fair ;)

> -Eric

Thanks,
Libor

Libor Klepáč Feb. 22, 2017, 11:42 a.m. UTC | #10
Hi,
it happened again on one machine, vps3 from the last mail, which had a clean xfs_repair run.
It has been running kernel 4.9.0-0.bpo.1-amd64 (i.e. 4.9.2) since Feb 6; it was upgraded from 4.8.15.

The error was:
Feb 22 11:04:21 vps3 kernel: [1316281.466922] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0xa000718
Feb 22 11:04:21 vps3 kernel: [1316281.468665] XFS (dm-2): Unmount and run xfs_repair
Feb 22 11:04:21 vps3 kernel: [1316281.469440] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Feb 22 11:04:21 vps3 kernel: [1316281.470212] ffffa06e686ac000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Feb 22 11:04:21 vps3 kernel: [1316281.470964] ffffa06e686ac010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Feb 22 11:04:21 vps3 kernel: [1316281.471691] ffffa06e686ac020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 22 11:04:21 vps3 kernel: [1316281.472431] ffffa06e686ac030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 22 11:04:21 vps3 kernel: [1316281.473129] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1322 of file /home/zumbi/linux-4.9.2/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc05e0dc4
Feb 22 11:04:21 vps3 kernel: [1316281.473685] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Feb 22 11:04:21 vps3 kernel: [1316281.474402] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

After a reboot, this appeared once:
Feb 22 11:46:41 vps3 kernel: [ 2440.571092] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0xa000718
Feb 22 11:46:41 vps3 kernel: [ 2440.571160] XFS (dm-2): Unmount and run xfs_repair
Feb 22 11:46:41 vps3 kernel: [ 2440.571177] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Feb 22 11:46:41 vps3 kernel: [ 2440.571198] ffff8c46fdbe5000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Feb 22 11:46:41 vps3 kernel: [ 2440.571225] ffff8c46fdbe5010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Feb 22 11:46:41 vps3 kernel: [ 2440.571252] ffff8c46fdbe5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 22 11:46:41 vps3 kernel: [ 2440.571278] ffff8c46fdbe5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 22 11:46:41 vps3 kernel: [ 2440.571313] XFS (dm-2): metadata I/O error: block 0xa000718 ("xfs_trans_read_buf_map") error 117 numblks 8

We will run repair tomorrow. Is it worth upgrading xfsprogs from 4.9.0 to 4.10.0-rc1 before the repair?

Thanks,
Libor




On Wednesday, February 1, 2017, 13:48:57 CET, Libor Klepáč wrote:
> 
> Hello,
> we tried also on vps1 reported here in bottom of email
> https://www.spinics.net/lists/linux-xfs/msg01728.html
> and vps3 from this email
> https://www.spinics.net/lists/linux-xfs/msg02672.html
> 
> Both came back clean. Does that mean the corruption was really only in memory
> and never made it to disk?
> Both machines are on 4.8.15 and xfsprogs 4.9.0
> 
> #root@vps3 # xfs_repair /dev/mapper/vg2Disk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Note - quota info will be regenerated on next quota mount.
> done
> 
> ---------------------------------
> #root@vps1:~# xfs_repair /dev/mapper/vgVPS1Disk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> -------------------------
> 
> Thanks,
> Libor
> 



Eric Sandeen Feb. 22, 2017, 1:45 p.m. UTC | #11
On 2/22/17 5:42 AM, Libor Klepáč wrote:
> Hi,
> it happened again on one machine, vps3 from the last mail, which had a clean xfs_repair run.
> It has been running kernel 4.9.0-0.bpo.1-amd64 (i.e. 4.9.2) since Feb 6; it was upgraded from 4.8.15.
> 
> Error was
> Feb 22 11:04:21 vps3 kernel: [1316281.466922] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0xa000718
> Feb 22 11:04:21 vps3 kernel: [1316281.468665] XFS (dm-2): Unmount and run xfs_repair
> Feb 22 11:04:21 vps3 kernel: [1316281.469440] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Feb 22 11:04:21 vps3 kernel: [1316281.470212] ffffa06e686ac000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Feb 22 11:04:21 vps3 kernel: [1316281.470964] ffffa06e686ac010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Feb 22 11:04:21 vps3 kernel: [1316281.471691] ffffa06e686ac020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Feb 22 11:04:21 vps3 kernel: [1316281.472431] ffffa06e686ac030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Feb 22 11:04:21 vps3 kernel: [1316281.473129] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1322 of file /home/zumbi/linux-4.9.2/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc05e0dc4
> Feb 22 11:04:21 vps3 kernel: [1316281.473685] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Feb 22 11:04:21 vps3 kernel: [1316281.474402] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

Ok, and what happened to this machine in the meantime?
I don't understand why this has been showing up for you; it'd be
nice to know if anything "interesting" happened prior to this -
any other shutdown and log replay, for example?  Or what
is the workload that's leading to this, if you can tell?

If you run repair and it tells you which inode it is, go find
that inode and see if there is anything "interesting" about its
lifetime or attribute use, perhaps?

> After reboot, there was once this:
> Feb 22 11:46:41 vps3 kernel: [ 2440.571092] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0xa000718
> Feb 22 11:46:41 vps3 kernel: [ 2440.571160] XFS (dm-2): Unmount and run xfs_repair
> Feb 22 11:46:41 vps3 kernel: [ 2440.571177] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Feb 22 11:46:41 vps3 kernel: [ 2440.571198] ffff8c46fdbe5000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Feb 22 11:46:41 vps3 kernel: [ 2440.571225] ffff8c46fdbe5010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Feb 22 11:46:41 vps3 kernel: [ 2440.571252] ffff8c46fdbe5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Feb 22 11:46:41 vps3 kernel: [ 2440.571278] ffff8c46fdbe5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Feb 22 11:46:41 vps3 kernel: [ 2440.571313] XFS (dm-2): metadata I/O error: block 0xa000718 ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> We will run repair tomorrow. Is it worth upgrading xfsprogs from 4.9.0 to 4.10.0-rc1 before repair?

Should be no need, though always happy to have testing.  :)

> Thanks,
> Libor
> 
Libor Klepáč Feb. 22, 2017, 2:19 p.m. UTC | #12
On Wednesday, February 22, 2017, 7:45:38 CET, Eric Sandeen wrote:
> On 2/22/17 5:42 AM, Libor Klepáč wrote:
> > Hi,
> > it happened again on one machine vps3 from last mail, which had clean xfs_repair run
> > It's running kernel 4.9.0-0.bpo.1-amd64 (so it's 4.9.2) since 6. Feb. It was upgraded from 4.8.15.
> > 
> > Error was
> > Feb 22 11:04:21 vps3 kernel: [1316281.466922] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0xa000718
> > Feb 22 11:04:21 vps3 kernel: [1316281.468665] XFS (dm-2): Unmount and run xfs_repair
> > Feb 22 11:04:21 vps3 kernel: [1316281.469440] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Feb 22 11:04:21 vps3 kernel: [1316281.470212] ffffa06e686ac000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Feb 22 11:04:21 vps3 kernel: [1316281.470964] ffffa06e686ac010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Feb 22 11:04:21 vps3 kernel: [1316281.471691] ffffa06e686ac020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Feb 22 11:04:21 vps3 kernel: [1316281.472431] ffffa06e686ac030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Feb 22 11:04:21 vps3 kernel: [1316281.473129] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1322 of file /home/zumbi/linux-4.9.2/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc05e0dc4
> > Feb 22 11:04:21 vps3 kernel: [1316281.473685] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > Feb 22 11:04:21 vps3 kernel: [1316281.474402] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> 
> Ok, and what happened to this machine in the meantime?
> I don't understand why this has been showing up for you; it'd be
> nice to know if anything "interesting" happened prior to this -
> any other shutdown and log replay, for example?  Or what
> is the workload that's leading to this, if you can tell?

The machine was running the whole time; it was only rebooted (cleanly) for the kernel upgrade to 4.9.2 on Feb 6.

It's a web-hosting (Apache + PHP + MySQL) VPS. All data are on the XFS partition: web site data, MySQL databases, and Apache access/error logs.
Group quota is enabled; it's used for the web site data, not for MySQL or the Apache logs.

The server is not overloaded; it serves just a few sites.
But shortly before this problem it received around ten thousand hits on one website, a webshop built with Prestashop.
The load spiked to 100,
yet no error message other than the one reported appeared in the kernel logs.

> 
> If you run repair and it tells you which inode it is, go find
> that inode and see if there is anything "interesting" about its
> lifetime or attribute use, perhaps?

Ok, will do

> 
> > After reboot, there was once this:
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571092] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0xa000718
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571160] XFS (dm-2): Unmount and run xfs_repair
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571177] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571198] ffff8c46fdbe5000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571225] ffff8c46fdbe5010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571252] ffff8c46fdbe5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571278] ffff8c46fdbe5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Feb 22 11:46:41 vps3 kernel: [ 2440.571313] XFS (dm-2): metadata I/O error: block 0xa000718 ("xfs_trans_read_buf_map") error 117 numblks 8
> > 
> > We will run repair tomorrow. Is it worth upgrading xfsprogs from 4.9.0 to 4.10.0-rc1 before repair?
> 
> Should be no need, though always happy to have testing.  :)
> 

Ok, I will prepare the package for the server.

Libor

Libor Klepáč Feb. 23, 2017, 9:05 a.m. UTC | #13
Hello,
so repair did something. In the log it says "attr block 0"; is that OK?

Inode 335629253 is now in lost+found; it is an empty directory belonging to user cust1.
Looking at
u.sfdir2.hdr.parent.i4 = 319041478
from xfs_db before the repair, it seemed to be in a directory owned by cust2,
totally unrelated to cust1. Is that possible?
From our point of view it shouldn't be, because of filesystem permissions (each user's home directory has mode 0750).

The second inode, 1992635, fixed up by xfs_repair, is the lost+found directory.

Libor

Command output follows:

Kernel 4.9.2, xfsprogs 4.10.0-rc1
-----------------------------------------------------------------------------
---- check
# xfs_repair -n /dev/mapper/vgDisk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
agi unlinked bucket 5 is 84933 in ag 20 (inode=335629253)
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
bad attribute count 0 in attr block 0, inode 335629253
problem with attribute contents in inode 335629253
would clear attr fork
bad nblocks 1 for inode 335629253, would reset to 0
bad anextents 1 for inode 335629253, would reset to 0
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 335629253, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 335629253 nlinks from 0 to 2
No modify flag set, skipping filesystem flush and exiting.
-----------------------------------------------------------------------------
---- repair
# xfs_repair  /dev/mapper/vgDisk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
agi unlinked bucket 5 is 84933 in ag 20 (inode=335629253)
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
bad attribute count 0 in attr block 0, inode 335629253
problem with attribute contents in inode 335629253
clearing inode 335629253 attributes
correcting nblocks for inode 335629253, was 1 - counted 0
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
bad attribute format 1 in inode 335629253, resetting value
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 335629253, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 1992635 nlinks from 2 to 3
resetting inode 335629253 nlinks from 0 to 2
Note - quota info will be regenerated on next quota mount.
done
-----------------------------------------------------------------------------
---- xfs_db before repair
# xfs_db -r /dev/vgDisk2/lvData
xfs_db> inode 335629253
xfs_db> print
core.magic = 0x494e
core.mode = 040775
core.version = 2
core.format = 1 (local)
core.nlinkv2 = 0
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 10106
core.gid = 10106
core.flushiter = 2
core.atime.sec = Wed Feb 22 11:04:21 2017
core.atime.nsec = 464104444
core.mtime.sec = Wed Feb 22 11:46:41 2017
core.mtime.nsec = 548670485
core.ctime.sec = Wed Feb 22 11:46:41 2017
core.ctime.nsec = 548670485
core.size = 6
core.nblocks = 1
core.extsize = 0
core.nextents = 0
core.naextents = 1
core.forkoff = 15
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 1322976790
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 319041478
a.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,20976867,1,0]

-----------------------------------------------------------------------------
---- xfs_db after repair
# xfs_db -r /dev/vgDisk2/lvData
xfs_db> inode 335629253
xfs_db> print
core.magic = 0x494e
core.mode = 040775
core.version = 2
core.format = 1 (local)
core.nlinkv2 = 2
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 10106
core.gid = 10106
core.flushiter = 2
core.atime.sec = Wed Feb 22 11:04:21 2017
core.atime.nsec = 464104444
core.mtime.sec = Wed Feb 22 11:46:41 2017
core.mtime.nsec = 548670485
core.ctime.sec = Wed Feb 22 11:46:41 2017
core.ctime.nsec = 548670485
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 1322976790
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 1992635
xfs_db> quit

Libor Klepáč March 13, 2017, 1:48 p.m. UTC | #14
Hello,
there is a problem with this host again, after it ran uninterrupted on kernel 4.8.15 since the last email/repair (i.e. since January 31).

Today, metadata corruption occurred again:
Mar 13 11:16:31 vps2 kernel: [3563991.623260] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
Mar 13 11:16:31 vps2 kernel: [3563991.624321] XFS (dm-2): Unmount and run xfs_repair
Mar 13 11:16:31 vps2 kernel: [3563991.624696] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Mar 13 11:16:31 vps2 kernel: [3563991.625085] ffff994543410000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Mar 13 11:16:31 vps2 kernel: [3563991.625511] ffff994543410010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Mar 13 11:16:31 vps2 kernel: [3563991.625983] ffff994543410020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 13 11:16:31 vps2 kernel: [3563991.626398] ffff994543410030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 13 11:16:31 vps2 kernel: [3563991.626829] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1322 of file /build/linux-aPrr8L/linux-4.8.15/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc08295c4
Mar 13 11:16:31 vps2 kernel: [3563991.627210] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 13 11:16:31 vps2 kernel: [3563991.627212] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Mar 13 11:16:31 vps2 kernel: [3563991.627215] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Mar 13 11:16:31 vps2 kernel: [3563991.628752] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3420 of file /build/linux-aPrr8L/linux-4.8.15/fs/xfs/xfs_inode.c.  Return address = 0xffffffffc083fc1e
Mar 13 11:16:48 vps2 kernel: [3564008.557340] XFS (dm-2): xfs_log_force: error -5 returned.

After a reboot, it sometimes logs:
Mar 13 12:51:10 vps2 kernel: [ 5283.025665] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
Mar 13 12:51:10 vps2 kernel: [ 5283.026879] XFS (dm-2): Unmount and run xfs_repair
Mar 13 12:51:10 vps2 kernel: [ 5283.027471] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Mar 13 12:51:10 vps2 kernel: [ 5283.028074] ffff933f16f8c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Mar 13 12:51:10 vps2 kernel: [ 5283.028669] ffff933f16f8c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Mar 13 12:51:10 vps2 kernel: [ 5283.029240] ffff933f16f8c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 13 12:51:10 vps2 kernel: [ 5283.029814] ffff933f16f8c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 13 12:51:10 vps2 kernel: [ 5283.030428] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8
Mar 13 12:51:10 vps2 kernel: [ 5283.036222] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
Mar 13 12:51:10 vps2 kernel: [ 5283.037443] XFS (dm-2): Unmount and run xfs_repair
Mar 13 12:51:10 vps2 kernel: [ 5283.038049] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Mar 13 12:51:10 vps2 kernel: [ 5283.038644] ffff933f16f8c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Mar 13 12:51:10 vps2 kernel: [ 5283.039257] ffff933f16f8c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Mar 13 12:51:10 vps2 kernel: [ 5283.039838] ffff933f16f8c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 13 12:51:10 vps2 kernel: [ 5283.040397] ffff933f16f8c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 13 12:51:10 vps2 kernel: [ 5283.041482] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8

I have installed kernel 4.9.13 from backports and xfsprogs 4.10.0.
A colleague will reboot the machine and run the repair.

It seems to be the same pattern again. Do you have any clue where it comes from? How can we prevent it from happening?

Thanks,
Libor

On Tuesday, January 31, 2017 9:03:02 CET Libor Klepáč wrote:
> 
> Hello,
> sorry for late reply. It didn't crash since than and i forgot and moved on to another tasks.
> 
> Yesterday it crashed on one of machines (running 4.8.11)
> -------------------------
> Jan 30 07:18:13 vps2 kernel: [5881831.379547] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
> Jan 30 07:18:13 vps2 kernel: [5881831.381721] XFS (dm-2): Unmount and run xfs_repair
> Jan 30 07:18:13 vps2 kernel: [5881831.382750] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Jan 30 07:18:13 vps2 kernel: [5881831.387810] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
> Jan 30 07:26:02 vps2 kernel: [5882300.524528] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
> Jan 30 07:26:02 vps2 kernel: [5882300.525993] XFS (dm-2): Unmount and run xfs_repair
> Jan 30 07:26:02 vps2 kernel: [5882300.526539] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Jan 30 07:26:02 vps2 kernel: [5882300.529224] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8
> Jan 30 10:00:27 vps2 kernel: [5891564.682483] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x127b5578
> Jan 30 10:00:27 vps2 kernel: [5891564.683962] XFS (dm-2): Unmount and run xfs_repair
> Jan 30 10:00:27 vps2 kernel: [5891564.684536] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Jan 30 10:00:27 vps2 kernel: [5891564.687223] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1250 of file /build/linux-lVEVrl/linux-4.7.8/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06747f2
> Jan 30 10:00:27 vps2 kernel: [5891564.687230] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Jan 30 10:00:27 vps2 kernel: [5891564.687778] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> 
> and later
> Jan 30 21:10:31 vps2 kernel: [39747.917831] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24c17ba8
> Jan 30 21:10:31 vps2 kernel: [39747.918130] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
> -------------------------
> 
> I have scheduled repair on today, all these blocks were repaired using xfsprogs 4.9.0
> Kernel is now 4.8.15
> 
> -------------------------
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
> Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> bad attribute count 0 in attr block 0, inode 1074268922
> problem with attribute contents in inode 1074268922
> clearing inode 1074268922 attributes
> correcting nblocks for inode 1074268922, was 1 - counted 0
> Metadata corruption detected at xfs_attr3_leaf block 0x127b5578/0x1000
> bad attribute count 0 in attr block 0, inode 1077334032
> problem with attribute contents in inode 1077334032
> clearing inode 1077334032 attributes
> correcting nblocks for inode 1077334032, was 1 - counted 0
> Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
> bad attribute count 0 in attr block 0, inode 1093437859
> problem with attribute contents in inode 1093437859
> clearing inode 1093437859 attributes
> correcting nblocks for inode 1093437859, was 1 - counted 0
>         - agno = 2
> Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
> bad attribute count 0 in attr block 0, inode 2147673775
> problem with attribute contents in inode 2147673775
> clearing inode 2147673775 attributes
> correcting nblocks for inode 2147673775, was 1 - counted 0
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
> bad attribute format 1 in inode 1074268922, resetting value
> bad attribute format 1 in inode 1077334032, resetting value
> bad attribute format 1 in inode 1093437859, resetting value
>         - agno = 2
> bad attribute format 1 in inode 2147673775, resetting value
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> -------------------------
> 
> Thank you very much for patch, it has done it's work
> 
> Libor
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


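The kernel lines above keep reporting `error 117` from `xfs_trans_read_buf_map`. On Linux that errno is EUCLEAN ("Structure needs cleaning"), which XFS reuses internally as EFSCORRUPTED. A quick stdlib sketch to decode such numbers from a log (Linux errno values assumed):

```python
import errno
import os

def decode_errno(err):
    """Map a numeric errno from a kernel log to its symbolic name and message.
    Assumes Linux errno values; 117 is EUCLEAN ("Structure needs cleaning"),
    which XFS aliases to EFSCORRUPTED."""
    name = errno.errorcode.get(err, "UNKNOWN")
    return name, os.strerror(err)

print(decode_errno(117))  # on Linux: ('EUCLEAN', 'Structure needs cleaning')
```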
Eric Sandeen March 13, 2017, 2:14 p.m. UTC | #15
On 3/13/17 8:48 AM, Libor Klepáč wrote:
> Hello,
> problem with this host again, after running uninterrupted from last email/repair on kernel 4.8.15. (so since 31. January)
> 
> Today, metadata corruption occured again.
> Mar 13 11:16:31 vps2 kernel: [3563991.623260] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> Mar 13 11:16:31 vps2 kernel: [3563991.624321] XFS (dm-2): Unmount and run xfs_repair

Ok, interesting that you hit this when writing an attr.

Can you turn the logging level way up:
# echo 11 > /proc/sys/fs/xfs/error_level

and then things like the forced shutdown and the metadata corruption reports will give you a backtrace, which might be useful (noisy, but useful)...

-Eric

> Mar 13 11:16:31 vps2 kernel: [3563991.624696] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Mar 13 11:16:31 vps2 kernel: [3563991.625085] ffff994543410000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Mar 13 11:16:31 vps2 kernel: [3563991.625511] ffff994543410010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Mar 13 11:16:31 vps2 kernel: [3563991.625983] ffff994543410020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 13 11:16:31 vps2 kernel: [3563991.626398] ffff994543410030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 13 11:16:31 vps2 kernel: [3563991.626829] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1322 of file /build/linux-aPrr8L/linux-4.8.15/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc08295c4
> Mar 13 11:16:31 vps2 kernel: [3563991.627210] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> Mar 13 11:16:31 vps2 kernel: [3563991.627212] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Mar 13 11:16:31 vps2 kernel: [3563991.627215] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> Mar 13 11:16:31 vps2 kernel: [3563991.628752] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3420 of file /build/linux-aPrr8L/linux-4.8.15/fs/xfs/xfs_inode.c.  Return address = 0xffffffffc083fc1e
> Mar 13 11:16:48 vps2 kernel: [3564008.557340] XFS (dm-2): xfs_log_force: error -5 returned.
> 
> After reboot, sometimes it logs
> Mar 13 12:51:10 vps2 kernel: [ 5283.025665] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> Mar 13 12:51:10 vps2 kernel: [ 5283.026879] XFS (dm-2): Unmount and run xfs_repair
> Mar 13 12:51:10 vps2 kernel: [ 5283.027471] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Mar 13 12:51:10 vps2 kernel: [ 5283.028074] ffff933f16f8c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Mar 13 12:51:10 vps2 kernel: [ 5283.028669] ffff933f16f8c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Mar 13 12:51:10 vps2 kernel: [ 5283.029240] ffff933f16f8c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 13 12:51:10 vps2 kernel: [ 5283.029814] ffff933f16f8c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 13 12:51:10 vps2 kernel: [ 5283.030428] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8
> Mar 13 12:51:10 vps2 kernel: [ 5283.036222] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> Mar 13 12:51:10 vps2 kernel: [ 5283.037443] XFS (dm-2): Unmount and run xfs_repair
> Mar 13 12:51:10 vps2 kernel: [ 5283.038049] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Mar 13 12:51:10 vps2 kernel: [ 5283.038644] ffff933f16f8c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Mar 13 12:51:10 vps2 kernel: [ 5283.039257] ffff933f16f8c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Mar 13 12:51:10 vps2 kernel: [ 5283.039838] ffff933f16f8c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 13 12:51:10 vps2 kernel: [ 5283.040397] ffff933f16f8c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 13 12:51:10 vps2 kernel: [ 5283.041482] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> I have installed kernel 4.9.13 from backports and installed tools 4.10.0.
> Collegue will reboot yesterday and do the repair.
> 
> It seems to be the same pattern again. Do you have any clue where it comes from? How can we prevent it from happening?
> 
> Thanks,
> Libor
> 
> On Tuesday, January 31, 2017 9:03:02 CET Libor Klepáč wrote:
>>
>> Hello,
>> sorry for late reply. It didn't crash since than and i forgot and moved on to another tasks.
>>
>> Yesterday it crashed on one of machines (running 4.8.11)
>> -------------------------
>> Jan 30 07:18:13 vps2 kernel: [5881831.379547] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
>> Jan 30 07:18:13 vps2 kernel: [5881831.381721] XFS (dm-2): Unmount and run xfs_repair
>> Jan 30 07:18:13 vps2 kernel: [5881831.382750] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
>> Jan 30 07:18:13 vps2 kernel: [5881831.387810] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
>> Jan 30 07:26:02 vps2 kernel: [5882300.524528] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
>> Jan 30 07:26:02 vps2 kernel: [5882300.525993] XFS (dm-2): Unmount and run xfs_repair
>> Jan 30 07:26:02 vps2 kernel: [5882300.526539] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
>> Jan 30 07:26:02 vps2 kernel: [5882300.529224] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8
>> Jan 30 10:00:27 vps2 kernel: [5891564.682483] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x127b5578
>> Jan 30 10:00:27 vps2 kernel: [5891564.683962] XFS (dm-2): Unmount and run xfs_repair
>> Jan 30 10:00:27 vps2 kernel: [5891564.684536] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
>> Jan 30 10:00:27 vps2 kernel: [5891564.687223] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1250 of file /build/linux-lVEVrl/linux-4.7.8/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06747f2
>> Jan 30 10:00:27 vps2 kernel: [5891564.687230] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
>> Jan 30 10:00:27 vps2 kernel: [5891564.687778] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
>>
>> and later
>> Jan 30 21:10:31 vps2 kernel: [39747.917831] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24c17ba8
>> Jan 30 21:10:31 vps2 kernel: [39747.918130] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
>> -------------------------
>>
>> I have scheduled repair on today, all these blocks were repaired using xfsprogs 4.9.0
>> Kernel is now 4.8.15
>>
>> -------------------------
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - zero log...
>>         - scan filesystem freespace and inode maps...
>>         - found root inode chunk
>> Phase 3 - for each AG...
>>         - scan and clear agi unlinked lists...
>>         - process known inodes and perform inode discovery...
>>         - agno = 0
>>         - agno = 1
>> Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
>> bad attribute count 0 in attr block 0, inode 1074268922
>> problem with attribute contents in inode 1074268922
>> clearing inode 1074268922 attributes
>> correcting nblocks for inode 1074268922, was 1 - counted 0
>> Metadata corruption detected at xfs_attr3_leaf block 0x127b5578/0x1000
>> bad attribute count 0 in attr block 0, inode 1077334032
>> problem with attribute contents in inode 1077334032
>> clearing inode 1077334032 attributes
>> correcting nblocks for inode 1077334032, was 1 - counted 0
>> Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
>> bad attribute count 0 in attr block 0, inode 1093437859
>> problem with attribute contents in inode 1093437859
>> clearing inode 1093437859 attributes
>> correcting nblocks for inode 1093437859, was 1 - counted 0
>>         - agno = 2
>> Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
>> bad attribute count 0 in attr block 0, inode 2147673775
>> problem with attribute contents in inode 2147673775
>> clearing inode 2147673775 attributes
>> correcting nblocks for inode 2147673775, was 1 - counted 0
>>         - process newly discovered inodes...
>> Phase 4 - check for duplicate blocks...
>>         - setting up duplicate extent list...
>>         - check for inodes claiming duplicate blocks...
>>         - agno = 0
>>         - agno = 1
>> bad attribute format 1 in inode 1074268922, resetting value
>> bad attribute format 1 in inode 1077334032, resetting value
>> bad attribute format 1 in inode 1093437859, resetting value
>>         - agno = 2
>> bad attribute format 1 in inode 2147673775, resetting value
>> Phase 5 - rebuild AG headers and trees...
>>         - reset superblock...
>> Phase 6 - check inode connectivity...
>>         - resetting contents of realtime bitmap and summary inodes
>>         - traversing filesystem ...
>>         - traversal finished ...
>>         - moving disconnected inodes to lost+found ...
>> Phase 7 - verify and correct link counts...
>> done
>> -------------------------
>>
>> Thank you very much for patch, it has done it's work
>>
>> Libor
>>
Libor Klepáč March 14, 2017, 8:15 a.m. UTC | #16
Hello,
I got this during the night with error_level = 11, no forced shutdown
(kernel 4.8.15)
Mar 14 02:36:29 vps2 kernel: [54799.061956] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
Mar 14 02:36:29 vps2 kernel: [54799.063194] XFS (dm-2): Unmount and run xfs_repair
Mar 14 02:36:29 vps2 kernel: [54799.063786] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Mar 14 02:36:29 vps2 kernel: [54799.064377] ffff933db1988000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Mar 14 02:36:29 vps2 kernel: [54799.064972] ffff933db1988010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Mar 14 02:36:29 vps2 kernel: [54799.065569] ffff933db1988020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 14 02:36:29 vps2 kernel: [54799.066141] ffff933db1988030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 14 02:36:29 vps2 kernel: [54799.066683] CPU: 1 PID: 22609 Comm: kworker/1:30 Tainted: G            E   4.8.0-0.bpo.2-amd64 #1 Debian 4.8.15-2~bpo8+2
Mar 14 02:36:29 vps2 kernel: [54799.066684] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
Mar 14 02:36:29 vps2 kernel: [54799.066710] Workqueue: xfs-buf/dm-2 xfs_buf_ioend_work [xfs]
Mar 14 02:36:29 vps2 kernel: [54799.066712]  0000000000000286 00000000940e3248 ffffffff92322b05 ffff933daff0e300
Mar 14 02:36:29 vps2 kernel: [54799.066714]  ffff934016e70480 ffffffffc073895a ffff9341bfc98700 00000000940e3248
Mar 14 02:36:29 vps2 kernel: [54799.066715]  ffff933daff0e300 ffff934016e70480 0000000000000000 ffffffffc0779a34
Mar 14 02:36:29 vps2 kernel: [54799.066716] Call Trace:
Mar 14 02:36:29 vps2 kernel: [54799.066736]  [<ffffffff92322b05>] ? dump_stack+0x5c/0x77
Mar 14 02:36:29 vps2 kernel: [54799.066751]  [<ffffffffc073895a>] ? xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs]
Mar 14 02:36:29 vps2 kernel: [54799.066768]  [<ffffffffc0779a34>] ? xfs_buf_ioend+0x54/0x1d0 [xfs]
Mar 14 02:36:29 vps2 kernel: [54799.066776]  [<ffffffff9209000b>] ? process_one_work+0x14b/0x410
Mar 14 02:36:29 vps2 kernel: [54799.066777]  [<ffffffff92090ac5>] ? worker_thread+0x65/0x4a0
Mar 14 02:36:29 vps2 kernel: [54799.066778]  [<ffffffff92090a60>] ? rescuer_thread+0x340/0x340
Mar 14 02:36:29 vps2 kernel: [54799.066780]  [<ffffffff92095d5f>] ? kthread+0xdf/0x100
Mar 14 02:36:29 vps2 kernel: [54799.066805]  [<ffffffff9202478b>] ? __switch_to+0x2bb/0x710
Mar 14 02:36:29 vps2 kernel: [54799.066809]  [<ffffffff925ecf2f>] ? ret_from_fork+0x1f/0x40
Mar 14 02:36:29 vps2 kernel: [54799.066810]  [<ffffffff92095c80>] ? kthread_park+0x50/0x50
Mar 14 02:36:29 vps2 kernel: [54799.067562] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8

This one is from just now, when I tried to find inode 2152616264, which corresponds to block 0x24e70268
(kernel 4.9.13)
Mar 14 09:01:31 vps2 kernel: [10644.526839] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268                                                                                                                   
Mar 14 09:01:31 vps2 kernel: [10644.526896] XFS (dm-2): Unmount and run xfs_repair                                                                                                                                                                                                     
Mar 14 09:01:31 vps2 kernel: [10644.526917] XFS (dm-2): First 64 bytes of corrupted metadata buffer:                                                                                                                                                                                   
Mar 14 09:01:31 vps2 kernel: [10644.526944] ffff9f0b5b5d2000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................                                                                                                                                                        
Mar 14 09:01:31 vps2 kernel: [10644.526980] ffff9f0b5b5d2010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........                                                                                                                                                        
Mar 14 09:01:31 vps2 kernel: [10644.527015] ffff9f0b5b5d2020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................                                                                                                                                                        
Mar 14 09:01:31 vps2 kernel: [10644.527049] ffff9f0b5b5d2030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................                                                                                                                                                        
Mar 14 09:01:31 vps2 kernel: [10644.527086] CPU: 2 PID: 30689 Comm: kworker/2:3 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1                                                                                                                                
Mar 14 09:01:31 vps2 kernel: [10644.527087] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
Mar 14 09:01:31 vps2 kernel: [10644.527119] Workqueue: xfs-buf/dm-2 xfs_buf_ioend_work [xfs]
Mar 14 09:01:31 vps2 kernel: [10644.527121]  0000000000000000 ffffffff9cf29cd5 ffff9f0b91105b00 ffff9f0b6acd2900
Mar 14 09:01:31 vps2 kernel: [10644.527123]  ffffffffc0749aca ffff9f0e7fd18700 000000006076a0c0 ffff9f0b91105b00
Mar 14 09:01:31 vps2 kernel: [10644.527124]  ffff9f0b6acd2900 0000000000000000 ffffffffc0794414 ffff9f0b91105ba8
Mar 14 09:01:31 vps2 kernel: [10644.527126] Call Trace:
Mar 14 09:01:31 vps2 kernel: [10644.527132]  [<ffffffff9cf29cd5>] ? dump_stack+0x5c/0x77
Mar 14 09:01:31 vps2 kernel: [10644.527155]  [<ffffffffc0749aca>] ? xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs]
Mar 14 09:01:31 vps2 kernel: [10644.527184]  [<ffffffffc0794414>] ? xfs_buf_ioend+0x54/0x1d0 [xfs]
Mar 14 09:01:31 vps2 kernel: [10644.527187]  [<ffffffff9cc9171b>] ? process_one_work+0x14b/0x410
Mar 14 09:01:31 vps2 kernel: [10644.527189]  [<ffffffff9cc921d5>] ? worker_thread+0x65/0x4a0
Mar 14 09:01:31 vps2 kernel: [10644.527190]  [<ffffffff9cc92170>] ? rescuer_thread+0x340/0x340
Mar 14 09:01:31 vps2 kernel: [10644.527191]  [<ffffffff9cc92170>] ? rescuer_thread+0x340/0x340
Mar 14 09:01:31 vps2 kernel: [10644.527194]  [<ffffffff9cc03b81>] ? do_syscall_64+0x81/0x190
Mar 14 09:01:31 vps2 kernel: [10644.527197]  [<ffffffff9cc7c730>] ? SyS_exit_group+0x10/0x10
Mar 14 09:01:31 vps2 kernel: [10644.527198]  [<ffffffff9cc974c0>] ? kthread+0xe0/0x100
Mar 14 09:01:31 vps2 kernel: [10644.527200]  [<ffffffff9cc2476b>] ? __switch_to+0x2bb/0x700
Mar 14 09:01:31 vps2 kernel: [10644.527202]  [<ffffffff9cc973e0>] ? kthread_park+0x60/0x60
Mar 14 09:01:31 vps2 kernel: [10644.527205]  [<ffffffff9d1fb675>] ? ret_from_fork+0x25/0x30
Mar 14 09:01:31 vps2 kernel: [10644.527213] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8


---------------------------------------------------------
Here are logs from my colleague. He wasn't able to run the repair: the device was busy, probably something forgotten to stop (a container with private mounts?)

# xfs_repair -n /dev/mapper/vgDisk-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_fdblocks 2245059, counted 2253251
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
bad attribute count 0 in attr block 0, inode 2152616264
problem with attribute contents in inode 2152616264
would clear attr fork
bad nblocks 1 for inode 2152616264, would reset to 0
bad anextents 1 for inode 2152616264, would reset to 0
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
 
# xfs_db -r /dev/mapper/vgDisk2-lvData
xfs_db> inode 2152616264
xfs_db> print
core.magic = 0x494e
core.mode = 0100664
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 20045
core.gid = 20045
core.flushiter = 0
core.atime.sec = Mon Mar 13 11:16:31 2017
core.atime.nsec = 815645008
core.mtime.sec = Mon Mar 13 11:16:31 2017
core.mtime.nsec = 815645008
core.ctime.sec = Mon Mar 13 11:16:31 2017
core.ctime.nsec = 815645008
core.size = 0
core.nblocks = 1
core.extsize = 0
core.nextents = 0
core.naextents = 1
core.forkoff = 15
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 3279391285
next_unlinked = null
u = (empty)
a.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,134538317,1,0]
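For reference, xfs_db reports extents as filesystem block numbers (the attr fork's startblock 134538317 above), while the kernel logs report 512-byte disk addresses (0x24e70268). The conversion depends on superblock geometry that isn't shown in the thread; the sketch below assumes a 4096-byte block size, sb_agblklog = 26 and sb_agblocks = 38534656 (hypothetical values chosen so the numbers line up), purely to illustrate the agno/agbno split XFS uses:

```python
BLOCKSIZE = 4096          # assumed sb_blocksize
AGBLKLOG = 26             # assumed sb_agblklog (bits of agbno inside an fsbno)
AGBLOCKS = 38534656       # assumed sb_agblocks (blocks per allocation group)
SECTORS_PER_BLOCK = BLOCKSIZE // 512

def fsb_to_daddr(fsbno):
    """Split a filesystem block number into (AG number, block-in-AG)
    and map it to a 512-byte disk address, the way XFS addresses buffers."""
    agno = fsbno >> AGBLKLOG
    agbno = fsbno & ((1 << AGBLKLOG) - 1)
    return (agno * AGBLOCKS + agbno) * SECTORS_PER_BLOCK

# startblock from a.bmx[0] above -> the daddr seen in the kernel logs
print(hex(fsb_to_daddr(134538317)))  # → 0x24e70268 under these assumed values
```

With these (assumed) values the extent lands in agno 2, matching the AG that xfs_repair flags for inode 2152616264.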
 
# xfs_repair /dev/mapper/vgDisk2-lvData
xfs_repair: cannot open /dev/mapper/vgDisk2-lvData: Device or resource busy
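The fix under discussion boils down to one extra verifier condition: a leaf attribute block whose header count is 0 gets junked, just like one whose count is too large. A simplified Python model of that check (the 0xFBEE magic matches the `fb ee` bytes at offset 8 in the hex dumps above; the count offset used here is illustrative, not the real xfs_attr3_leaf_hdr layout):

```python
import struct

XFS_ATTR3_LEAF_MAGIC = 0xFBEE  # the "fb ee" visible at offset 8 in the dumps

def leaf_is_sane(buf, max_count=1024):
    """Mirror of the repair check: reject a leaf whose entry count is 0
    or implausibly high. Field offsets here are illustrative only."""
    magic, = struct.unpack_from(">H", buf, 8)     # big-endian on-disk magic
    count, = struct.unpack_from(">H", buf, 0x40)  # assumed count field offset
    if magic != XFS_ATTR3_LEAF_MAGIC:
        return False
    return 0 < count <= max_count  # count == 0 -> junk the leaf

# A block like the ones in this thread: valid magic, count still 0.
bad = bytearray(0x50)
struct.pack_into(">H", bad, 8, XFS_ATTR3_LEAF_MAGIC)
print(leaf_is_sane(bytes(bad)))  # → False
```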



Libor



On Monday, March 13, 2017 9:14:53 CET Eric Sandeen wrote:
> On 3/13/17 8:48 AM, Libor Klepáč wrote:
> > Hello,
> > problem with this host again, after running uninterrupted from last email/repair on kernel 4.8.15. (so since 31. January)
> >
> > Today, metadata corruption occured again.
> > Mar 13 11:16:31 vps2 kernel: [3563991.623260] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> > Mar 13 11:16:31 vps2 kernel: [3563991.624321] XFS (dm-2): Unmount and run xfs_repair
>
> Ok, interesting that you hit this when writing an attr.
>
> Can you turn the logging level way up:
> # echo 11 > /proc/sys/fs/xfs/error_level
>
> and then things like the force shutdown and the metadata will give you a backtrace, which might be useful (noisy, but useful)...
>
> -Eric
>
> > Mar 13 11:16:31 vps2 kernel: [3563991.624696] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Mar 13 11:16:31 vps2 kernel: [3563991.625085] ffff994543410000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Mar 13 11:16:31 vps2 kernel: [3563991.625511] ffff994543410010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Mar 13 11:16:31 vps2 kernel: [3563991.625983] ffff994543410020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Mar 13 11:16:31 vps2 kernel: [3563991.626398] ffff994543410030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Mar 13 11:16:31 vps2 kernel: [3563991.626829] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1322 of file /build/linux-aPrr8L/linux-4.8.15/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc08295c4
> > Mar 13 11:16:31 vps2 kernel: [3563991.627210] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > Mar 13 11:16:31 vps2 kernel: [3563991.627212] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > Mar 13 11:16:31 vps2 kernel: [3563991.627215] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> > Mar 13 11:16:31 vps2 kernel: [3563991.628752] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3420 of file /build/linux-aPrr8L/linux-4.8.15/fs/xfs/xfs_inode.c.  Return address = 0xffffffffc083fc1e
> > Mar 13 11:16:48 vps2 kernel: [3564008.557340] XFS (dm-2): xfs_log_force: error -5 returned.
> >
> > After reboot, sometimes it logs
> > Mar 13 12:51:10 vps2 kernel: [ 5283.025665] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> > Mar 13 12:51:10 vps2 kernel: [ 5283.026879] XFS (dm-2): Unmount and run xfs_repair
> > Mar 13 12:51:10 vps2 kernel: [ 5283.027471] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Mar 13 12:51:10 vps2 kernel: [ 5283.028074] ffff933f16f8c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Mar 13 12:51:10 vps2 kernel: [ 5283.028669] ffff933f16f8c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Mar 13 12:51:10 vps2 kernel: [ 5283.029240] ffff933f16f8c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Mar 13 12:51:10 vps2 kernel: [ 5283.029814] ffff933f16f8c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Mar 13 12:51:10 vps2 kernel: [ 5283.030428] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8
> > Mar 13 12:51:10 vps2 kernel: [ 5283.036222] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> > Mar 13 12:51:10 vps2 kernel: [ 5283.037443] XFS (dm-2): Unmount and run xfs_repair
> > Mar 13 12:51:10 vps2 kernel: [ 5283.038049] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Mar 13 12:51:10 vps2 kernel: [ 5283.038644] ffff933f16f8c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Mar 13 12:51:10 vps2 kernel: [ 5283.039257] ffff933f16f8c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Mar 13 12:51:10 vps2 kernel: [ 5283.039838] ffff933f16f8c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Mar 13 12:51:10 vps2 kernel: [ 5283.040397] ffff933f16f8c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Mar 13 12:51:10 vps2 kernel: [ 5283.041482] XFS (dm-2): metadata I/O error: block 0x24e70268 ("xfs_trans_read_buf_map") error 117 numblks 8
> >
> > I have installed kernel 4.9.13 from backports and installed tools 4.10.0.
> > Collegue will reboot yesterday and do the repair.
> >
> > It seems to be the same pattern again. Do you have any clue where it comes from? How can we prevent it from happening?
> >
> > Thanks,
> > Libor
> >
> > On Tuesday, January 31, 2017 9:03:02 CET Libor Klepáč wrote:
> >>
> >> Hello,
> >> sorry for late reply. It didn't crash since than and i forgot and moved on to another tasks.
> >>
> >> Yesterday it crashed on one of machines (running 4.8.11)
> >> -------------------------
> >> Jan 30 07:18:13 vps2 kernel: [5881831.379547] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
> >> Jan 30 07:18:13 vps2 kernel: [5881831.381721] XFS (dm-2): Unmount and run xfs_repair
> >> Jan 30 07:18:13 vps2 kernel: [5881831.382750] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> >> Jan 30 07:18:13 vps2 kernel: [5881831.387810] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
> >> Jan 30 07:26:02 vps2 kernel: [5882300.524528] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
> >> Jan 30 07:26:02 vps2 kernel: [5882300.525993] XFS (dm-2): Unmount and run xfs_repair
> >> Jan 30 07:26:02 vps2 kernel: [5882300.526539] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> >> Jan 30 07:26:02 vps2 kernel: [5882300.529224] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8
> >> Jan 30 10:00:27 vps2 kernel: [5891564.682483] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x127b5578
> >> Jan 30 10:00:27 vps2 kernel: [5891564.683962] XFS (dm-2): Unmount and run xfs_repair
> >> Jan 30 10:00:27 vps2 kernel: [5891564.684536] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> >> Jan 30 10:00:27 vps2 kernel: [5891564.687223] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1250 of file /build/linux-lVEVrl/linux-4.7.8/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06747f2
> >> Jan 30 10:00:27 vps2 kernel: [5891564.687230] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> >> Jan 30 10:00:27 vps2 kernel: [5891564.687778] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> >>
> >> and later
> >> Jan 30 21:10:31 vps2 kernel: [39747.917831] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24c17ba8
> >> Jan 30 21:10:31 vps2 kernel: [39747.918130] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
> >> -------------------------
> >>
> >> I have scheduled repair on today, all these blocks were repaired using xfsprogs 4.9.0
> >> Kernel is now 4.8.15
> >>
> >> -------------------------
> >> Phase 1 - find and verify superblock...
> >> Phase 2 - using internal log
> >>         - zero log...
> >>         - scan filesystem freespace and inode maps...
> >>         - found root inode chunk
> >> Phase 3 - for each AG...
> >>         - scan and clear agi unlinked lists...
> >>         - process known inodes and perform inode discovery...
> >>         - agno = 0
> >>         - agno = 1
> >> Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> >> bad attribute count 0 in attr block 0, inode 1074268922
> >> problem with attribute contents in inode 1074268922
> >> clearing inode 1074268922 attributes
> >> correcting nblocks for inode 1074268922, was 1 - counted 0
> >> Metadata corruption detected at xfs_attr3_leaf block 0x127b5578/0x1000
> >> bad attribute count 0 in attr block 0, inode 1077334032
> >> problem with attribute contents in inode 1077334032

> >> clearing inode 1077334032 attributes

> >> correcting nblocks for inode 1077334032, was 1 - counted 0

> >> Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000

> >> bad attribute count 0 in attr block 0, inode 1093437859

> >> problem with attribute contents in inode 1093437859

> >> clearing inode 1093437859 attributes

> >> correcting nblocks for inode 1093437859, was 1 - counted 0

> >>         - agno = 2

> >> Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000

> >> bad attribute count 0 in attr block 0, inode 2147673775

> >> problem with attribute contents in inode 2147673775

> >> clearing inode 2147673775 attributes

> >> correcting nblocks for inode 2147673775, was 1 - counted 0

> >>         - process newly discovered inodes...

> >> Phase 4 - check for duplicate blocks...

> >>         - setting up duplicate extent list...

> >>         - check for inodes claiming duplicate blocks...

> >>         - agno = 0

> >>         - agno = 1

> >> bad attribute format 1 in inode 1074268922, resetting value

> >> bad attribute format 1 in inode 1077334032, resetting value

> >> bad attribute format 1 in inode 1093437859, resetting value

> >>         - agno = 2

> >> bad attribute format 1 in inode 2147673775, resetting value

> >> Phase 5 - rebuild AG headers and trees...

> >>         - reset superblock...

> >> Phase 6 - check inode connectivity...

> >>         - resetting contents of realtime bitmap and summary inodes

> >>         - traversing filesystem ...

> >>         - traversal finished ...

> >>         - moving disconnected inodes to lost+found ...

> >> Phase 7 - verify and correct link counts...

> >> done

> >> -------------------------

> >>

> >> Thank you very much for patch, it has done it's work

> >>

> >> Libor

> >>

> >> --

> >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in

> >> the body of a message to majordomo@vger.kernel.org

> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

> >>

> > 

> > 

> --

> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in

> the body of a message to majordomo@vger.kernel.org

> More majordomo info at  http://vger.kernel.org/majordomo-info.html

> 



--------
[1] mailto:libor.klepac@bcom.cz
[2] tel:+420377457676
[3] http://www.bcom.cz
Eric Sandeen March 14, 2017, 4:54 p.m. UTC | #17
On 3/14/17 3:15 AM, Libor Klepáč wrote:
> Hello,
> I got this during the night with error_level = 11, no forced shutdown

Just to double check - are these all getting freshly created?  I.e.
you repair and fix the filesystem each time you see this, but they
keep re-appearing?

> (kernel 4.8.15)
> Mar 14 02:36:29 vps2 kernel: [54799.061956] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268

Unfortunately the read path is a bit less interesting.  We found something
on disk, but we're not sure how it got there.

If we could catch a write verifier failing that /might/ be a little more
useful.

I'm at a loss as to why you seem to be uniquely able to hit this problem.  :(

-Eric
Eric Sandeen March 14, 2017, 6:51 p.m. UTC | #18
On 3/14/17 11:54 AM, Eric Sandeen wrote:
> On 3/14/17 3:15 AM, Libor Klepáč wrote:
>> Hello,
>> I got this during the night with error_level = 11, no forced shutdown
> 
> Just to double check - are these all getting freshly created?  I.e.
> you repair and fix the filesystem each time you see this, but they
> keep re-appearing?

And is anything else happening in between - unclean shutdowns,
crashes, storage problems etc or are you just riding along, and
they show up out of nowhere on an otherwise normal box?

(sorry if that's elsewhere in this long thread, but a summary
of the problem at this point might help).

-Eric

>> (kernel 4.8.15)
>> Mar 14 02:36:29 vps2 kernel: [54799.061956] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> 
> Unfortunately the read path is a bit less interesting.  We found something
> on disk, but we're not sure how it got there.
> 
> If we could catch a write verifier failing that /might/ be a little more
> useful.
> 
> I'm at a loss as to why you seem to be uniquely able to hit this problem.  :(
> 
> -Eric
Libor Klepáč March 15, 2017, 10:07 a.m. UTC | #19
Hello,

all VMs are running smoothly between these XFS problems: no HW problems, no crashes, no forced reboots.

On Tuesday, 14 March 2017 at 11:54:37 CET Eric Sandeen wrote:
> On 3/14/17 3:15 AM, Libor Klepáč wrote:
> > Hello,
> > I got this during the night with error_level = 11, no forced shutdown
> 
> Just to double check - are these all getting freshly created?  I.e.
> you repair and fix the filesystem each time you see this, but they
> keep re-appearing?

I have looked at the old emails: every xfs_repair fixed the problem, and the next time a different block/inode popped up.
I'm trying to look up the files using the inode numbers from the old emails.

If I remember correctly, the files/dirs are usually created in some website cache directory (i.e. generated from a template by PHP and saved to disk for later use).
So my lookup will probably come to nothing. The last reported forced shutdown was on a cache file.

> 
> > (kernel 4.8.15)
> > Mar 14 02:36:29 vps2 kernel: [54799.061956] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x24e70268
> 
> Unfortunately the read path is a bit less interesting.  We found something
> on disk, but we're not sure how it got there.
> If we could catch a write verifier failing that /might/ be a little more
> useful.

I'm prepared to run all affected hosts with error_level=11, if it doesn't mean a performance hit.

> 
> I'm at a loss as to why you seem to be uniquely able to hit this problem.  :(

And I was questioning my googling skills ;)

> 
> -Eric
> 

Libor

Eric Sandeen March 15, 2017, 3:22 p.m. UTC | #20
On 3/15/17 5:07 AM, Libor Klepáč wrote:
> Hello,

...

>> Unfortunately the read path is a bit less interesting.  We found something
>> on disk, but we're not sure how it got there.
>> If we could catch a write verifier failing that /might/ be a little more
>> useful.
> 
> I'm prepared to run all affected hosts with error_level=11, if it doesn't mean a performance hit.

It shouldn't.  It only changes logging behavior on error.  The printks to the
console probably take a bit of extra time but at that point you've already lost,
right?

-Eric
Libor Klepáč March 16, 2017, 8:58 a.m. UTC | #21
Hi,
On Wednesday, 15 March 2017 at 10:22:05 CET Eric Sandeen wrote:
> On 3/15/17 5:07 AM, Libor Klepáč wrote:
> > Hello,
> 
> ...
> 
> >> Unfortunately the read path is a bit less interesting.  We found something
> >> on disk, but we're not sure how it got there.
> >> If we could catch a write verifier failing that /might/ be a little more
> >> useful.
> > 
> > I'm prepared to run all affected hosts with error_level=11, if it doesn't mean a performance hit.
> 
> It shouldn't.  It only changes logging behavior on error.  The printks to the
> console probably take a bit of extra time but at that point you've already lost,
> right?

OK, thanks for the clarification; I was wondering whether a higher error_level triggers some extra code paths during normal operation.
I will leave it at 11 on all the machines with problems.

Btw. I naively created
/etc/sysctl.d/fs_xfs_error_level.conf
with
fs.xfs.error_level=11
inside.
But it's not set after reboot. Could it be because the XFS module (XFS is not the root filesystem) is loaded after sysctl is applied?
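If module-load ordering is indeed the cause, one common workaround is to force xfs.ko to load early at boot. This is a sketch under the assumption of a systemd-based Debian where systemd-sysctl.service runs after systemd-modules-load.service; verify that ordering on your system (e.g. `systemctl show systemd-sysctl.service -p After`) before relying on it:

```shell
# Assumption: XFS is not the root filesystem, so xfs.ko is normally loaded
# only at mount time, after sysctl.d has already been applied.

# Load the module early at boot, before systemd-sysctl runs:
echo xfs > /etc/modules-load.d/xfs.conf

# The sysctl.d entry should then take effect on the next boot:
echo 'fs.xfs.error_level = 11' > /etc/sysctl.d/fs_xfs_error_level.conf

# Apply it right away on the running system, without a reboot:
sysctl fs.xfs.error_level=11
```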

Below is the result of the repair.
I was expecting 
bad attribute count 0 in attr block 0, inode 2152616264
to be
bad attribute count 0 in attr block 0x24e70268, inode 2152616264

The function process_leaf_attr_block should be called with a non-zero da_bno from process_leaf_attr_level, right?
But it's called with da_bno = 0 from process_longform_attr.

Of course I don't know the code.

Libor

# xfs_repair  /dev/mapper/vgDisk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
bad attribute count 0 in attr block 0, inode 2152616264
problem with attribute contents in inode 2152616264
clearing inode 2152616264 attributes
correcting nblocks for inode 2152616264, was 1 - counted 0
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
bad attribute format 1 in inode 2152616264, resetting value
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done


Eric Sandeen March 16, 2017, 3:21 p.m. UTC | #22
On 3/16/17 1:58 AM, Libor Klepáč wrote:
> Hi,
>> On Wednesday, 15 March 2017 at 10:22:05 CET Eric Sandeen wrote:
>> On 3/15/17 5:07 AM, Libor Klepáč wrote:
>>> Hello,
>>
>> ...
>>
>>>> Unfortunately the read path is a bit less interesting.  We found something
>>>> on disk, but we're not sure how it got there.
>>>> If we could catch a write verifier failing that /might/ be a little more
>>>> useful.
>>>
>>> I'm prepared to run all affected hosts with error_level=11, if it doesn't mean a performance hit.
>>
>> It shouldn't.  It only changes logging behavior on error.  The printks to the
>> console probably take a bit of extra time but at that point you've already lost,
>> right?
> 
> OK, thanks for the clarification; I was wondering whether a higher error_level triggers some extra code paths during normal operation.
> I will leave it at 11 on all the machines with problems.

Understood.  Yup, it's really not, it only changes reporting post-error.

> Btw. I naively created
> /etc/sysctl.d/fs_xfs_error_level.conf
> with
> fs.xfs.error_level=11
> inside.
> But it's not set after reboot. Could it be because the XFS module (XFS is not the root filesystem) is loaded after sysctl is applied?

Must be; the sysctl file will not appear until xfs is loaded.
 
> Below is the result of the repair.
> I was expecting 
> bad attribute count 0 in attr block 0, inode 2152616264
> to be
> bad attribute count 0 in attr block 0x24e70268, inode 2152616264
> 
> The function process_leaf_attr_block should be called with a non-zero da_bno from process_leaf_attr_level, right?
> But it's called with da_bno = 0 from process_longform_attr.

typedef __uint32_t      xfs_dablk_t;    /* dir/attr block number (in file) */

It's the relative block number in the attribute, i.e. block 0 (first block)

> Of course I don't know the code.
> 
> Libor
> 
> # xfs_repair  /dev/mapper/vgDisk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
> bad attribute count 0 in attr block 0, inode 2152616264
> problem with attribute contents in inode 2152616264
> clearing inode 2152616264 attributes
> correcting nblocks for inode 2152616264, was 1 - counted 0
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
> bad attribute format 1 in inode 2152616264, resetting value
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> 
Libor Klepáč March 29, 2017, 1:33 p.m. UTC | #23
Hello,
so here is the trace from today.
I hope it helps you.

Libor

Mar 29 15:05:12 vps2 kernel: [1154612.424431] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x24dd5310
Mar 29 15:05:12 vps2 kernel: [1154612.424882] XFS (dm-2): Unmount and run xfs_repair
Mar 29 15:05:12 vps2 kernel: [1154612.425072] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Mar 29 15:05:12 vps2 kernel: [1154612.425262] ffff979235612000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Mar 29 15:05:12 vps2 kernel: [1154612.425478] ffff979235612010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Mar 29 15:05:12 vps2 kernel: [1154612.425708] ffff979235612020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 15:05:12 vps2 kernel: [1154612.425960] ffff979235612030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 15:05:12 vps2 kernel: [1154612.426259] CPU: 2 PID: 453 Comm: xfsaild/dm-2 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
Mar 29 15:05:12 vps2 kernel: [1154612.426260] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
Mar 29 15:05:12 vps2 kernel: [1154612.426262]  0000000000000000 ffffffffb1b29cd5 ffff9794ac29c980 ffff9791295d83c0
Mar 29 15:05:12 vps2 kernel: [1154612.426264]  ffffffffc064aa58 0000000000000000 0000000001d9a9ff 0000000000000000
Mar 29 15:05:12 vps2 kernel: [1154612.426266]  ffffbd1ac1fb3d40 ffffffffc0695997 ffff9792b0dd60c0 ffffffffc0693cc4
Mar 29 15:05:12 vps2 kernel: [1154612.426268] Call Trace:
Mar 29 15:05:12 vps2 kernel: [1154612.426277]  [<ffffffffb1b29cd5>] ? dump_stack+0x5c/0x77
Mar 29 15:05:12 vps2 kernel: [1154612.426339]  [<ffffffffc064aa58>] ? xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426361]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426384]  [<ffffffffc0693cc4>] ? _xfs_buf_ioapply+0x84/0x450 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426389]  [<ffffffffb18a2b30>] ? wake_up_q+0x60/0x60
Mar 29 15:05:12 vps2 kernel: [1154612.426411]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426446]  [<ffffffffc06956d7>] ? xfs_buf_submit+0x67/0x1f0 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426473]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426505]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426531]  [<ffffffffc06bf2f9>] ? xfs_inode_item_push+0x99/0x140 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426558]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426585]  [<ffffffffc06c74b0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.426589]  [<ffffffffb18974c0>] ? kthread+0xe0/0x100
Mar 29 15:05:12 vps2 kernel: [1154612.426592]  [<ffffffffb182476b>] ? __switch_to+0x2bb/0x700
Mar 29 15:05:12 vps2 kernel: [1154612.426594]  [<ffffffffb18973e0>] ? kthread_park+0x60/0x60
Mar 29 15:05:12 vps2 kernel: [1154612.426597]  [<ffffffffb1dfb675>] ? ret_from_fork+0x25/0x30
Mar 29 15:05:12 vps2 kernel: [1154612.426602] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1323 of file /home/zumbi/linux-4.9.13/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc0694084
Mar 29 15:05:12 vps2 kernel: [1154612.427033] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Mar 29 15:05:12 vps2 kernel: [1154612.427323] CPU: 2 PID: 453 Comm: xfsaild/dm-2 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
Mar 29 15:05:12 vps2 kernel: [1154612.427323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
Mar 29 15:05:12 vps2 kernel: [1154612.427324]  0000000000000000 ffffffffb1b29cd5 0000000000000008 ffff979528c42000
Mar 29 15:05:12 vps2 kernel: [1154612.427326]  ffffffffc069c1fa ffffffffc0694084 0000000000000000 ffffbd1ac1fb3d40
Mar 29 15:05:12 vps2 kernel: [1154612.427328]  ffffffffc0695997 ffff9792b0dd60c0 ffffffffc0694084 0000000000000000
Mar 29 15:05:12 vps2 kernel: [1154612.427329] Call Trace:
Mar 29 15:05:12 vps2 kernel: [1154612.427335]  [<ffffffffb1b29cd5>] ? dump_stack+0x5c/0x77
Mar 29 15:05:12 vps2 kernel: [1154612.427380]  [<ffffffffc069c1fa>] ? xfs_do_force_shutdown+0x13a/0x140 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427408]  [<ffffffffc0694084>] ? _xfs_buf_ioapply+0x444/0x450 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427432]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427458]  [<ffffffffc0694084>] ? _xfs_buf_ioapply+0x444/0x450 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427462]  [<ffffffffb18a2b30>] ? wake_up_q+0x60/0x60
Mar 29 15:05:12 vps2 kernel: [1154612.427485]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427508]  [<ffffffffc06956d7>] ? xfs_buf_submit+0x67/0x1f0 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427530]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427562]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427596]  [<ffffffffc06bf2f9>] ? xfs_inode_item_push+0x99/0x140 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427625]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427652]  [<ffffffffc06c74b0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Mar 29 15:05:12 vps2 kernel: [1154612.427654]  [<ffffffffb18974c0>] ? kthread+0xe0/0x100
Mar 29 15:05:12 vps2 kernel: [1154612.427656]  [<ffffffffb182476b>] ? __switch_to+0x2bb/0x700
Mar 29 15:05:12 vps2 kernel: [1154612.427657]  [<ffffffffb18973e0>] ? kthread_park+0x60/0x60
Mar 29 15:05:12 vps2 kernel: [1154612.427659]  [<ffffffffb1dfb675>] ? ret_from_fork+0x25/0x30
Mar 29 15:05:12 vps2 kernel: [1154612.427671] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

Libor Klepáč April 11, 2017, 11:23 a.m. UTC | #24
Hello,
did you get anything useful from this last trace?
It happened again today; sadly I don't have a trace, as it seems I forgot to raise the error level last time.
PS: I don't know if I mentioned it somewhere, but it's an XFSv4 filesystem.

Thanks,
Libor

On Wednesday, 29 March 2017 at 15:33:00 CEST Libor Klepáč wrote:
> 
> Hello,
> so here is the trace from today.
> I hope it helps you.
> 
> Libor
> 
> Mar 29 15:05:12 vps2 kernel: [1154612.424431] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x24dd5310
> Mar 29 15:05:12 vps2 kernel: [1154612.424882] XFS (dm-2): Unmount and run xfs_repair
> Mar 29 15:05:12 vps2 kernel: [1154612.425072] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Mar 29 15:05:12 vps2 kernel: [1154612.425262] ffff979235612000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Mar 29 15:05:12 vps2 kernel: [1154612.425478] ffff979235612010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Mar 29 15:05:12 vps2 kernel: [1154612.425708] ffff979235612020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 29 15:05:12 vps2 kernel: [1154612.425960] ffff979235612030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Mar 29 15:05:12 vps2 kernel: [1154612.426259] CPU: 2 PID: 453 Comm: xfsaild/dm-2 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
> Mar 29 15:05:12 vps2 kernel: [1154612.426260] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
> Mar 29 15:05:12 vps2 kernel: [1154612.426262]  0000000000000000 ffffffffb1b29cd5 ffff9794ac29c980 ffff9791295d83c0
> Mar 29 15:05:12 vps2 kernel: [1154612.426264]  ffffffffc064aa58 0000000000000000 0000000001d9a9ff 0000000000000000
> Mar 29 15:05:12 vps2 kernel: [1154612.426266]  ffffbd1ac1fb3d40 ffffffffc0695997 ffff9792b0dd60c0 ffffffffc0693cc4
> Mar 29 15:05:12 vps2 kernel: [1154612.426268] Call Trace:
> Mar 29 15:05:12 vps2 kernel: [1154612.426277]  [<ffffffffb1b29cd5>] ? dump_stack+0x5c/0x77
> Mar 29 15:05:12 vps2 kernel: [1154612.426339]  [<ffffffffc064aa58>] ? xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426361]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426384]  [<ffffffffc0693cc4>] ? _xfs_buf_ioapply+0x84/0x450 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426389]  [<ffffffffb18a2b30>] ? wake_up_q+0x60/0x60
> Mar 29 15:05:12 vps2 kernel: [1154612.426411]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426446]  [<ffffffffc06956d7>] ? xfs_buf_submit+0x67/0x1f0 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426473]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426505]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426531]  [<ffffffffc06bf2f9>] ? xfs_inode_item_push+0x99/0x140 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426558]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426585]  [<ffffffffc06c74b0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.426589]  [<ffffffffb18974c0>] ? kthread+0xe0/0x100
> Mar 29 15:05:12 vps2 kernel: [1154612.426592]  [<ffffffffb182476b>] ? __switch_to+0x2bb/0x700
> Mar 29 15:05:12 vps2 kernel: [1154612.426594]  [<ffffffffb18973e0>] ? kthread_park+0x60/0x60
> Mar 29 15:05:12 vps2 kernel: [1154612.426597]  [<ffffffffb1dfb675>] ? ret_from_fork+0x25/0x30
> Mar 29 15:05:12 vps2 kernel: [1154612.426602] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1323 of file /home/zumbi/linux-4.9.13/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc0694084
> Mar 29 15:05:12 vps2 kernel: [1154612.427033] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Mar 29 15:05:12 vps2 kernel: [1154612.427323] CPU: 2 PID: 453 Comm: xfsaild/dm-2 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
> Mar 29 15:05:12 vps2 kernel: [1154612.427323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
> Mar 29 15:05:12 vps2 kernel: [1154612.427324]  0000000000000000 ffffffffb1b29cd5 0000000000000008 ffff979528c42000
> Mar 29 15:05:12 vps2 kernel: [1154612.427326]  ffffffffc069c1fa ffffffffc0694084 0000000000000000 ffffbd1ac1fb3d40
> Mar 29 15:05:12 vps2 kernel: [1154612.427328]  ffffffffc0695997 ffff9792b0dd60c0 ffffffffc0694084 0000000000000000
> Mar 29 15:05:12 vps2 kernel: [1154612.427329] Call Trace:
> Mar 29 15:05:12 vps2 kernel: [1154612.427335]  [<ffffffffb1b29cd5>] ? dump_stack+0x5c/0x77
> Mar 29 15:05:12 vps2 kernel: [1154612.427380]  [<ffffffffc069c1fa>] ? xfs_do_force_shutdown+0x13a/0x140 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427408]  [<ffffffffc0694084>] ? _xfs_buf_ioapply+0x444/0x450 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427432]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427458]  [<ffffffffc0694084>] ? _xfs_buf_ioapply+0x444/0x450 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427462]  [<ffffffffb18a2b30>] ? wake_up_q+0x60/0x60
> Mar 29 15:05:12 vps2 kernel: [1154612.427485]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427508]  [<ffffffffc06956d7>] ? xfs_buf_submit+0x67/0x1f0 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427530]  [<ffffffffc0695997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427562]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427596]  [<ffffffffc06bf2f9>] ? xfs_inode_item_push+0x99/0x140 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427625]  [<ffffffffc06c777b>] ? xfsaild+0x2cb/0x7b0 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427652]  [<ffffffffc06c74b0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
> Mar 29 15:05:12 vps2 kernel: [1154612.427654]  [<ffffffffb18974c0>] ? kthread+0xe0/0x100
> Mar 29 15:05:12 vps2 kernel: [1154612.427656]  [<ffffffffb182476b>] ? __switch_to+0x2bb/0x700
> Mar 29 15:05:12 vps2 kernel: [1154612.427657]  [<ffffffffb18973e0>] ? kthread_park+0x60/0x60
> Mar 29 15:05:12 vps2 kernel: [1154612.427659]  [<ffffffffb1dfb675>] ? ret_from_fork+0x25/0x30
> Mar 29 15:05:12 vps2 kernel: [1154612.427671] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> 
Libor Klepáč May 24, 2017, 11:18 a.m. UTC | #25
Hi,
after two quiet months, another crash; the trace is the same as before.
There are no XFS lines before this; kern.log goes back to 23 April.

May 24 12:05:12 vps2 kernel: [3729926.243353] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs], xfs_attr3_leaf block 0x171c60c0
May 24 12:05:12 vps2 kernel: [3729926.247586] XFS (dm-2): Unmount and run xfs_repair
May 24 12:05:12 vps2 kernel: [3729926.249022] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
May 24 12:05:12 vps2 kernel: [3729926.250424] ffff9fbb1d73a000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
May 24 12:05:12 vps2 kernel: [3729926.251870] ffff9fbb1d73a010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
May 24 12:05:12 vps2 kernel: [3729926.253412] ffff9fbb1d73a020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
May 24 12:05:12 vps2 kernel: [3729926.254941] ffff9fbb1d73a030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
May 24 12:05:12 vps2 kernel: [3729926.256478] CPU: 1 PID: 454 Comm: xfsaild/dm-2 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
May 24 12:05:12 vps2 kernel: [3729926.256480] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
May 24 12:05:12 vps2 kernel: [3729926.256483]  0000000000000000 ffffffffb1329cd5 ffff9fba7f543ac0 ffff9fb9cb6ca5a0
May 24 12:05:12 vps2 kernel: [3729926.256488]  ffffffffc06b4a58 0000000000000000 0000000004772296 0000000000000000
May 24 12:05:12 vps2 kernel: [3729926.256492]  ffffab7701f7bd40 ffffffffc06ff997 ffff9fbb830dbac0 ffffffffc06fdcc4
May 24 12:05:12 vps2 kernel: [3729926.256495] Call Trace:
May 24 12:05:12 vps2 kernel: [3729926.256505]  [<ffffffffb1329cd5>] ? dump_stack+0x5c/0x77
May 24 12:05:12 vps2 kernel: [3729926.256585]  [<ffffffffc06b4a58>] ? xfs_attr3_leaf_write_verify+0xe8/0x100 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.256645]  [<ffffffffc06ff997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.256712]  [<ffffffffc06fdcc4>] ? _xfs_buf_ioapply+0x84/0x450 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.256719]  [<ffffffffb10a2b30>] ? wake_up_q+0x60/0x60
May 24 12:05:12 vps2 kernel: [3729926.256781]  [<ffffffffc06ff997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.256845]  [<ffffffffc06ff6d7>] ? xfs_buf_submit+0x67/0x1f0 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.256911]  [<ffffffffc06ff997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.256975]  [<ffffffffc073177b>] ? xfsaild+0x2cb/0x7b0 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.257038]  [<ffffffffc06ff365>] ? xfs_buf_unlock+0x15/0x70 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.257105]  [<ffffffffc073177b>] ? xfsaild+0x2cb/0x7b0 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.257174]  [<ffffffffc07314b0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.257182]  [<ffffffffb10974c0>] ? kthread+0xe0/0x100
May 24 12:05:12 vps2 kernel: [3729926.257186]  [<ffffffffb102476b>] ? __switch_to+0x2bb/0x700
May 24 12:05:12 vps2 kernel: [3729926.257189]  [<ffffffffb10973e0>] ? kthread_park+0x60/0x60
May 24 12:05:12 vps2 kernel: [3729926.257194]  [<ffffffffb15fb675>] ? ret_from_fork+0x25/0x30
May 24 12:05:12 vps2 kernel: [3729926.257201] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1323 of file /home/zumbi/linux-4.9.13/fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06fe084
May 24 12:05:12 vps2 kernel: [3729926.260093] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
May 24 12:05:12 vps2 kernel: [3729926.262115] CPU: 1 PID: 454 Comm: xfsaild/dm-2 Tainted: G            E   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
May 24 12:05:12 vps2 kernel: [3729926.262117] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
May 24 12:05:12 vps2 kernel: [3729926.262120]  0000000000000000 ffffffffb1329cd5 0000000000000008 ffff9fbc275f3000
May 24 12:05:12 vps2 kernel: [3729926.262125]  ffffffffc07061fa ffffffffc06fe084 0000000000000000 ffffab7701f7bd40
May 24 12:05:12 vps2 kernel: [3729926.262129]  ffffffffc06ff997 ffff9fbb830dbac0 ffffffffc06fe084 ffff9fb934f86080
May 24 12:05:12 vps2 kernel: [3729926.262133] Call Trace:
May 24 12:05:12 vps2 kernel: [3729926.262141]  [<ffffffffb1329cd5>] ? dump_stack+0x5c/0x77
May 24 12:05:12 vps2 kernel: [3729926.262228]  [<ffffffffc07061fa>] ? xfs_do_force_shutdown+0x13a/0x140 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262294]  [<ffffffffc06fe084>] ? _xfs_buf_ioapply+0x444/0x450 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262361]  [<ffffffffc06ff997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262427]  [<ffffffffc06fe084>] ? _xfs_buf_ioapply+0x444/0x450 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262433]  [<ffffffffb10a2b30>] ? wake_up_q+0x60/0x60
May 24 12:05:12 vps2 kernel: [3729926.262503]  [<ffffffffc06ff997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262573]  [<ffffffffc06ff6d7>] ? xfs_buf_submit+0x67/0x1f0 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262643]  [<ffffffffc06ff997>] ? xfs_buf_delwri_submit_buffers+0x137/0x260 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262717]  [<ffffffffc073177b>] ? xfsaild+0x2cb/0x7b0 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262789]  [<ffffffffc06ff365>] ? xfs_buf_unlock+0x15/0x70 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262864]  [<ffffffffc073177b>] ? xfsaild+0x2cb/0x7b0 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262939]  [<ffffffffc07314b0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
May 24 12:05:12 vps2 kernel: [3729926.262944]  [<ffffffffb10974c0>] ? kthread+0xe0/0x100
May 24 12:05:12 vps2 kernel: [3729926.262948]  [<ffffffffb102476b>] ? __switch_to+0x2bb/0x700
May 24 12:05:12 vps2 kernel: [3729926.262950]  [<ffffffffb10973e0>] ? kthread_park+0x60/0x60
May 24 12:05:12 vps2 kernel: [3729926.262955]  [<ffffffffb15fb675>] ? ret_from_fork+0x25/0x30
May 24 12:05:12 vps2 kernel: [3729926.263078] XFS (dm-2): Please umount the filesystem and rectify the problem(s)


After the filesystem shutdown, before rebooting I upgraded the kernel to 4.9.25-1~bpo8+1.

The customer called us just as Nagios alerted us to a host problem; he said he was copying some files between accounts (which means changing ACLs). I have asked him for more information.

Libor

Libor Klepáč May 24, 2017, 12:24 p.m. UTC | #26
Hello,

On Wednesday, May 24, 2017, 13:18:47 CEST Libor Klepáč wrote:
> Hi,
> after nearly two months, another crash; the trace is the same as before.
> There are no XFS lines before this, kern.log goes back to April 23.
> 
---snip trace----
> 
> After the filesystem shutdown, before rebooting I upgraded the kernel to
> 4.9.25-1~bpo8+1.
> 
> The customer called us just as Nagios alerted us to a host problem; he
> said he was copying some files between accounts (which means changing
> ACLs). I have asked him for more information.

So according to the customer, he was logged in over ssh and was copying files
from one customer account to another one dedicated to development, using
Midnight Commander.
It involves changing file ACLs.

Original dir ACL:

# getfacl .
# file: .
# owner: cust1
# group: cust1
# flags: -s-
user::rwx
user:backup:--x
user:cust1:rwx
group::rwx
group:www-data:r-x
group:cust1:rwx
mask::rwx
other::---
default:user::rwx
default:user:backup:--x
default:user:cust1:rwx
default:group::rwx
default:group:www-data:r-x
default:group:cust1:rwx
default:mask::rwx
default:other::r-x

Destination dir ACLs:
# getfacl .
# file: .
# owner: admin-ssh
# group: www-data
user::rwx
user:backup:--x
group::rwx
group:www-data:rwx
mask::rwx
other::r-x
default:user::rwx
default:user:backup:--x
default:group::rwx
default:group:www-data:rwx
default:mask::rwx
default:other::r-x


----

User admin-ssh is a member of the www-data group, so he can copy from the customer's dir.

Libor

Patch

diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 40cb5f7..b855a10 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -593,7 +593,8 @@  process_leaf_attr_block(
 	stop = xfs_attr3_leaf_hdr_size(leaf);
 
 	/* does the count look sorta valid? */
-	if (leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
+	if (!leafhdr.count ||
+	    leafhdr.count * sizeof(xfs_attr_leaf_entry_t) + stop >
 						mp->m_sb.sb_blocksize) {
 		do_warn(
 	_("bad attribute count %d in attr block %u, inode %" PRIu64 "\n"),