From patchwork Sun May 20 01:02:15 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 10413069
Subject: Re: off-by-one uncompressed invalid ram_bytes corruptions
To: Steve Leung, linux-btrfs@vger.kernel.org
References: <2dab827b-2c68-ea5c-6730-485037727c36@gmx.com>
 <0b8e2626-fb00-1e82-4f22-c400cea57533@shaw.ca>
From: Qu Wenruo
Message-ID: <5093c14b-5d6d-0827-0c04-bf2fd73af0bd@gmx.com>
Date: Sun, 20 May 2018 09:02:15 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.7.0
In-Reply-To: <0b8e2626-fb00-1e82-4f22-c400cea57533@shaw.ca>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

On 2018-05-20 07:40, Steve Leung wrote:
> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>> On 2018-05-18 13:23, Steve Leung wrote:
>>> Hi list,
>>>
>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>> observed lately:
>
> Evidently I forgot that I added a fourth device to this system, from the
> info below, but I don't think it matters.  :)
>
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970196795392
>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 3468 expect 3469
>>
>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>> dump the leaf?
>
> Attached btrfs-debug-tree dumps for all of the blocks that I saw
> messages for.
>
>> It's caught by tree-checker code which is ensuring all tree blocks are
>> correct before btrfs can take use of them.
>>
>> That inline extent size check is tested, so I'm wondering if this
>> indicates any real corruption.
>> That btrfs-debug-tree output will definitely help.
>>
>> BTW, if I didn't miss anything, there should not be any inlined extent
>> in root tree.
>>
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970552426496
>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 3496 expect 3497
>>
>> Same dump will definitely help.
>>
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970712399872
>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 1790 expect 1791
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970803920896
>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 2475 expect 2476
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970987945984
>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 490 expect 491
>>>
>>> All of them seem to be 1 short of the expected value.
>>>
>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>> inspect-internal on any of those inode numbers fails with:
>>>
>>>   ERROR: ino paths ioctl: Input/output error
>>>
>>> and another message for that inode appears.
>>>
>>> 'btrfs check' (output attached) seems to notice these corruptions (among
>>> a few others, some of which seem to be related to a problematic attempt
>>> to build Android I posted about some months ago).
>>>
>>> Other information:
>>>
>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.
>>> The filesystem has
>>> about 25 snapshots at the moment, only a handful of compressed files,
>>> and nothing fancy like qgroups enabled.
>>>
>>> btrfs fi show:
>>>
>>>   Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>           Total devices 4 FS bytes used 2.48TiB
>>>           devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>           devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>           devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>           devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>
>>> btrfs fi df:
>>>
>>>   Data, RAID1: total=2.49TiB, used=2.48TiB
>>>   System, RAID1: total=32.00MiB, used=416.00KiB
>>>   Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>   GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> dmesg output attached as well.
>>>
>>> Thanks in advance for any assistance!  I have backups of all the
>>> important stuff here but it would be nice to fix the corruptions in
>>> place.
>>
>> And btrfs check doesn't report the same problem as the default original
>> mode doesn't have such check.
>>
>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
>
> Also, attached.  It seems to notice the same off-by-one problems, though
> there also seem to be a couple of examples of being off by more than one.

Unfortunately, it doesn't detect that, as there is no off-by-one error
at all.

The problem is that the kernel is reporting an error on a completely
fine leaf.

Furthermore, even in the same leaf there are more inlined extents, and
they are all valid. So the kernel reports the error out of nowhere.

More problems happen for extent_size, where a lot of them are off by
one.

Moreover, the root owner is not printed correctly, so I'm wondering
whether the memory is corrupted.

Please run memtest86+ to verify that all your memory is good. If it is,
please try the attached patch and see if it provides extra info.

>
> Thanks for looking at this!  I'll get my backups ready, just in case.
>
> Steve

From 3540534d0ff8b6e9dc200f9dff92b8a5afa7d384 Mon Sep 17 00:00:00 2001
From: Qu Wenruo
Date: Sun, 20 May 2018 09:01:43 +0800
Subject: [PATCH] btrfs: tree-checker: Add extra inline extent ram_bytes debug info

Signed-off-by: Qu Wenruo
---
 fs/btrfs/tree-checker.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 8d40e7dd8c30..3a4534e7068e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -163,8 +163,10 @@ static int check_extent_data_item(struct btrfs_fs_info *fs_info,
 		if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
 		    btrfs_file_extent_ram_bytes(leaf, fi)) {
 			file_extent_err(fs_info, leaf, slot,
-	"invalid ram_bytes for uncompressed inline extent, have %u expect %llu",
+	"invalid ram_bytes for uncompressed inline extent, have %u expect %llu (%lu + %llu)",
 				item_size, BTRFS_FILE_EXTENT_INLINE_DATA_START +
+				btrfs_file_extent_ram_bytes(leaf, fi),
+				BTRFS_FILE_EXTENT_INLINE_DATA_START,
 				btrfs_file_extent_ram_bytes(leaf, fi));
 			return -EUCLEAN;
 		}
-- 
2.17.0
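
For anyone following along, the check that the patch instruments can be
sketched in userspace roughly as below. This is only an illustration, not
the kernel code: struct toy_file_extent_item, TOY_INLINE_DATA_START and
check_inline_extent() are made-up names, and the struct is my assumption
of how the leading fields of a file extent item are laid out before the
inline data begins. The packed attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the leading fields of a file extent item.
 * The layout is an assumption made for this sketch, not a copy of the
 * kernel header.
 */
struct toy_file_extent_item {
	uint64_t generation;
	uint64_t ram_bytes;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
	uint8_t type;
	uint8_t inline_data[];	/* inline payload would start here */
} __attribute__((packed));

/* Offset of the inline payload, playing the role of the kernel macro. */
#define TOY_INLINE_DATA_START \
	offsetof(struct toy_file_extent_item, inline_data)

/*
 * For an uncompressed inline extent the item size must equal the header
 * size plus ram_bytes. Returns 0 when they agree and -1 otherwise (the
 * kernel check returns -EUCLEAN instead).
 */
int check_inline_extent(uint32_t item_size, uint64_t ram_bytes)
{
	uint64_t expect = TOY_INLINE_DATA_START + ram_bytes;

	if (item_size != expect) {
		/* Same shape as the extended message: total and both terms. */
		fprintf(stderr, "have %u expect %llu (%zu + %llu)\n",
			(unsigned int)item_size,
			(unsigned long long)expect,
			(size_t)TOY_INLINE_DATA_START,
			(unsigned long long)ram_bytes);
		return -1;
	}
	return 0;
}
```

With hypothetical inputs picked to match the numbers in the first report,
check_inline_extent(3468, 3447) returns 0, while
check_inline_extent(3468, 3448) returns -1. Printing both terms of the sum
is what lets the extra debug output show which operand is off.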