From: Jérôme Carretero
To: linux-btrfs, Josef Bacik
Date: Sat, 13 Jul 2013 12:14:04 -0400
Subject: Troublesome failure mode and recovery
Message-ID: <20130713121404.65fc89ea@Bidule>
X-Patchwork-Id: 2827174

Hi there,

I am experiencing a broken FS in a state I haven't seen before.

I was running linux-3.10 on my laptop, which I had tried to put to sleep with an external btrfs partition attached. On resume, the external partition was lost. I was able to unmount it, despite many kernel warnings. Then I remounted it... and unplugged the USB cable. After that I couldn't unmount it. Well, too bad, not a big deal. I ran alt+sysrq+s, waited a little, then ran alt+sysrq+b.

On reboot, my root partition (also btrfs) was unmountable, with the error:

[ 1.150000] btrfs bad tree block start 0 1531035648
[ 1.150000] btrfs: failed to read log tree
[ 1.150000] btrfs: open_ctree failed

Then I did the following:

- Tested various mount flags (some from memory, some found by reading the `fs/btrfs/super.c` code: recovery, clear_cache...).
- Took the drive (Lenovo-branded Micron RealSSD 400) to another computer and made an image of this partition, because the image could be of use for investigating this issue, and I have some recent documents that I'd like to recover in some way.
- Ran various btrfs-progs utilities on the partition.
- Edited the kernel btrfs code and attempted to mount the partition from a user-mode Linux (UML) kernel.

The results are the following:

- `btrfs-restore` only works with `-u 1`, so the first superblock data has an issue.
- `btrfsck` was crashing because the code would progress even if fs_root was null... fixed with the cmds-check.c patch further down.
- The patched kernel (see the disk-io.c hack further down) would boot, but /sbin/init and /bin/bash wouldn't start because of btrfs errors. It looks like some inodes are broken. Somehow /usr/bin/python could start, which made me happy.

Within the UML instance with python, I cannot do `ls` (`os.listdir()`) on my home folder (`/home/cJ`), and btrfs-restore only restores a few dot files in there. But I can get inode numbers and read files or subdirectories beyond this folder. And it looks like btrfs-debug-tree can find transactions containing older updated directory inodes.
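
For anyone trying the same thing from a Python prompt inside a similar instance, the probing described above could be sketched roughly as follows. This is a minimal sketch, not the exact commands used here: it assumes Python 3, the helper names `probe` and `salvage` are hypothetical, and it only helps when the names of some subdirectories are already known, since the parent directory itself cannot be listed.

```python
import os
import shutil


def probe(path):
    """Report what can still be learned about a path on a damaged filesystem."""
    report = {"path": path, "inode": None, "listable": False, "error": None}
    try:
        # stat() may still succeed when listing the directory does not.
        report["inode"] = os.stat(path).st_ino
    except OSError as exc:
        report["error"] = str(exc)
        return report
    try:
        os.listdir(path)
        report["listable"] = True
    except OSError as exc:
        report["error"] = str(exc)
    return report


def salvage(known_subdirs, dest_root):
    """Copy every readable file under the given directories into dest_root.

    Returns (copied, failed): copied is a list of (source, destination)
    pairs, failed is a list of (path, error message) pairs.
    """
    copied, failed = [], []

    def on_error(exc):
        # os.walk reports directories it cannot list through this callback.
        failed.append((exc.filename, str(exc)))

    for top in known_subdirs:
        for dirpath, _dirnames, filenames in os.walk(top, onerror=on_error):
            # Mirror the absolute source path underneath dest_root.
            target_dir = os.path.join(dest_root, dirpath.lstrip(os.sep))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                source = os.path.join(dirpath, name)
                target = os.path.join(target_dir, name)
                try:
                    shutil.copy2(source, target)
                    copied.append((source, target))
                except OSError as exc:
                    failed.append((source, str(exc)))
    return copied, failed
```

Calling `probe("/home/cJ")` would then show whether the directory still has a readable inode even though it cannot be listed, and `salvage([...], dest_root)` can be pointed at whichever subdirectory paths are remembered, with `dest_root` on a healthy filesystem.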
I can also do stat() calls on files, and call `/sbin/btrfs` (using `subprocess.Popen`, not `os.system()`).

If this were a FAT partition, I would be able to recover data in subfolders even if the parent folder inode is broken. I assume the same thing is possible with btrfs, and even more, given that there are probably older copies of the `/home/cJ` directory entries from older transactions hanging around somewhere. But I am no btrfs specialist, so I can't get at this data.

Ideally I would like to be able to mount an older generation, or re-patch older directory inodes where the newer directories cannot be read. Having btrfs-restore able to restore sub-directories of a certain generation would also be very helpful.

So I have my disk image, linux and btrfs-progs from git, a bootable UML, and can allocate some time to this issue. Your help is welcome.

Thanks,

Patch for the `btrfsck` crash:

diff --git a/cmds-check.c b/cmds-check.c
index 8015288..be3e329 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5777,6 +5777,11 @@ int cmd_check(int argc, char **argv)
 
 	root = info->fs_root;
 
+	if (root == NULL) {
+		fprintf(stderr, "Error finding FS root\n");
+		return -EIO;
+	}
+
 	if (init_extent_tree) {
 		printf("Creating a new extent tree\n");
 		ret = reinit_extent_tree(info);

The Linux kernel code patched with the following ugly hack would (somehow) boot:

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b8b60b6..0807f4d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2627,6 +2627,14 @@ retry_root_backup:
 	tree_root->node = read_tree_block(tree_root,
 					  btrfs_super_root(disk_super),
 					  blocksize, generation);
+
+	if (1) { // ugly hack to force using the second superblock
+		static int i = 0;
+		if (i++ == 0) {
+			goto recovery_tree_root;
+		}
+	}
+
 	if (!tree_root->node ||
 	    !test_bit(EXTENT_BUFFER_UPTODATE, &tree_root->node->bflags)) {
 		printk(KERN_WARNING "btrfs: failed to read tree root on %s\n",