[v3,2/4] btrfs-progs: lowmem check: Fix false alert about referencer count mismatch

Message ID 20170626103727.8945-2-lufq.fnst@cn.fujitsu.com
State New

Commit Message

Lu Fengqi June 26, 2017, 10:37 a.m. UTC
The normal back reference counting doesn't care about the extent referred
to by the extent data in a shared leaf. The check_extent_data_backref
function needs to skip any leaf whose owner does not match the root_id.

Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
---
 cmds-check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Henk Slager July 2, 2017, 1:50 p.m. UTC | #1
On Mon, Jun 26, 2017 at 12:37 PM, Lu Fengqi <lufq.fnst@cn.fujitsu.com> wrote:
> The normal back reference counting doesn't care about the extent referred
> by the extent data in the shared leaf. The check_extent_data_backref
> function need to skip the leaf that owner mismatch with the root_id.
>
> Reported-by: Marc MERLIN <marc@merlins.org>
> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
> ---
>  cmds-check.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/cmds-check.c b/cmds-check.c
> index 70d2b7f2..f42968cd 100644
> --- a/cmds-check.c
> +++ b/cmds-check.c
> @@ -10692,7 +10692,8 @@ static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
>                 leaf = path.nodes[0];
>                 slot = path.slots[0];
>
> -               if (slot >= btrfs_header_nritems(leaf))
> +               if (slot >= btrfs_header_nritems(leaf) ||
> +                   btrfs_header_owner(leaf) != root_id)
>                         goto next;
>                 btrfs_item_key_to_cpu(leaf, &key, slot);
>                 if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY)
> --
> 2.13.1

With this patch applied to v4.11, I ran:
# btrfs check -p --mode lowmem /dev/mapper/smr

The 'referencer count mismatch' errors are gone, but, likely due to other
hidden corruption, the check took more time than I had planned, so I
cancelled it after 5 days.

In summary, the kernel and the lowmem check appear to report the same
issue; for the lowmem check it is this (repeating):
[...]
parent transid verify failed on 6350669414400 wanted 24678 found 24184
parent transid verify failed on 6350645837824 wanted 24678 found 23277
Ignoring transid failure
leaf parent key incorrect 6350645837824
ERROR: extent[6349151535104 16384] backref lost (owner: 2, level: 0)
ERROR: check leaf failed root 2 bytenr 6349151535104 level 0, force continue check
parent transid verify failed on 6350645837824 wanted 24678 found 23277
Ignoring transid failure
leaf parent key incorrect 6350645837824
ERROR: extent[6349150486528 16384] backref lost (owner: 2, level: 0)
ERROR: check leaf failed root 2 bytenr 6349150486528 level 0, force continue check
^C

My plan is now to image the whole 8TB fs to extra/new storage hardware
with dd and then see if I can get the copy fixed. But it might take a
year before I do so (it is not critical w.r.t. data-loss, it's cold
storage, multi-year btrfs features test).
Lu Fengqi July 3, 2017, 2:55 a.m. UTC | #2
On Sun, Jul 02, 2017 at 03:50:31PM +0200, Henk Slager wrote:
>
>With this patch applied to v4.11, I ran:
># btrfs check -p --mode lowmem /dev/mapper/smr
>
>no 'referencer count mismatch' anymore, but likely due to other hidden
>corruption, the check took more time than I had planned, so after 5
>days, I cancelled it.
>
>As a summary, both kernel and lowmem check mention the same issue as
>it looks like; for the lowmem check it is this, (repeating):
>[...]
>parent transid verify failed on 6350669414400 wanted 24678 found 24184
>parent transid verify failed on 6350645837824 wanted 24678 found 23277
>Ignoring transid failure
>leaf parent key incorrect 6350645837824
>ERROR: extent[6349151535104 16384] backref lost (owner: 2, level: 0)
>ERROR: check leaf failed root 2 bytenr 6349151535104 level 0, force
>continue check
>parent transid verify failed on 6350645837824 wanted 24678 found 23277
>Ignoring transid failure
>leaf parent key incorrect 6350645837824
>ERROR: extent[6349150486528 16384] backref lost (owner: 2, level: 0)
>ERROR: check leaf failed root 2 bytenr 6349150486528 level 0, force
>continue check

This looks like the extent tree has some problems. I would appreciate it
if you could run the following command to dump the extent tree for me:

# btrfs-debug-tree -t 2 /dev/mapper/smr | grep -C 10 -e 6349151535104 -e 6349150486528

>My plan is now to image the whole 8TB fs to extra/new storage hardware
>with dd and then see if I can get the copy fixed. But it might take a
>year before I do so (it is not critical w.r.t. data-loss, it's cold
>storage, multi-year btrfs features test).
>
>

Patch

diff --git a/cmds-check.c b/cmds-check.c
index 70d2b7f2..f42968cd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10692,7 +10692,8 @@  static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
 		leaf = path.nodes[0];
 		slot = path.slots[0];
 
-		if (slot >= btrfs_header_nritems(leaf))
+		if (slot >= btrfs_header_nritems(leaf) ||
+		    btrfs_header_owner(leaf) != root_id)
 			goto next;
 		btrfs_item_key_to_cpu(leaf, &key, slot);
 		if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY)
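
The effect of the added condition can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `struct toy_leaf`, `toy_should_skip` and `toy_count_refs` are hypothetical names invented for this example, and the struct is a simplified stand-in for a leaf, not the btrfs-progs `extent_buffer`. It shows that a leaf is skipped when the slot is out of range or when the leaf's owner differs from `root_id`, so a reference seen through a leaf owned by another root is not counted for this root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a leaf (not a btrfs-progs type). */
struct toy_leaf {
	uint64_t owner;       /* id of the root recorded in the leaf header */
	uint32_t nritems;     /* number of items in the leaf */
	uint32_t slot;        /* slot the search landed on */
	int      refs_extent; /* nonzero if the item at slot references the extent */
};

/*
 * Models the patched condition: skip the leaf when the slot is past the
 * last item, or when the leaf is owned by a root other than root_id.
 */
static int toy_should_skip(const struct toy_leaf *leaf, uint64_t root_id)
{
	return leaf->slot >= leaf->nritems || leaf->owner != root_id;
}

/* Count the references found for root_id, ignoring skipped leaves. */
static unsigned toy_count_refs(const struct toy_leaf *leaves, size_t n,
			       uint64_t root_id)
{
	unsigned found = 0;

	assert(leaves != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (toy_should_skip(&leaves[i], root_id))
			continue;
		if (leaves[i].refs_extent)
			found++;
	}
	return found;
}
```

In this model, the owner test is what keeps a leaf owned by a different root from contributing to the count for `root_id`; without it, only the slot range test would apply.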