From patchwork Tue Apr 19 21:55:31 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Trofimovich X-Patchwork-Id: 720301 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p3JLrCDp002923 for ; Tue, 19 Apr 2011 21:53:12 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756047Ab1DSVxI (ORCPT ); Tue, 19 Apr 2011 17:53:08 -0400 Received: from mail-fx0-f46.google.com ([209.85.161.46]:45174 "EHLO mail-fx0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751044Ab1DSVxF (ORCPT ); Tue, 19 Apr 2011 17:53:05 -0400 Received: by mail-fx0-f46.google.com with SMTP id 17so93502fxm.19 for ; Tue, 19 Apr 2011 14:53:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:date:from:to:cc:subject:message-id:in-reply-to :references:x-mailer:mime-version:content-type; bh=HsvKwgio3T71kgVvEATEEksa0mes+xib6C2VqlZlM0s=; b=YECPUQUzj8zrzHZBjkkznKdUKUanisG+PKdFz7U7YlGUeYswUJrCw8WWnppcTe/Q2P AtToSTUaLkAdliGLi1H9zrCv5zyNhkck7u1cEx+gCt9/49/7n120esharfGEYj6kI3mt zAIqAtfoYJWnxyGalcXMuZdtJTLY99wuyYCr8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type; b=p2ziq33kelZpSGnsFjlmRhldm+aL3AyNhs8E/4iwD22A2LGUq5x3G6wZeIphslLxys PBHMcYgs9gtM2Bvqa19o95WwMTnDlGlp0C8G/om3eOwccmPKFlJhUh0WhPa2W96xc0x0 9tFb+M/PfpgSdQQnxy/OxwQpyCA9CAGLklOjI= Received: by 10.223.27.129 with SMTP id i1mr1317649fac.24.1303249984894; Tue, 19 Apr 2011 14:53:04 -0700 (PDT) Received: from sf ([178.121.206.79]) by mx.google.com with ESMTPS id p16sm87322fax.45.2011.04.19.14.53.01 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 19 Apr 2011 14:53:03 -0700 (PDT) Date: Wed, 20 Apr 2011 00:55:31 +0300 From: Sergei Trofimovich To: liubo Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Arne Jansen Subject: Re: [PATCH] Btrfs: fix easily get into ENOSPC in mixed case Message-ID: <20110420005531.6d395333@sf> In-Reply-To: <4DA29FC1.9050900@cn.fujitsu.com> References: <20110402121946.6bf27f80@sf.home> <4D96EE76.5040208@cn.fujitsu.com> <20110402134132.0391f4fd@sf.home> <4D9708E0.6030206@cn.fujitsu.com> <20110402155545.2159043a@sf.home> <4D9ECAF5.50108@cn.fujitsu.com> <20110409000932.6154c7b5@sf> <20110409001920.089c4409@sf> <20110409005555.4b136f25@sf> <4DA29FC1.9050900@cn.fujitsu.com> X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Tue, 19 Apr 2011 21:53:12 +0000 (UTC) On Mon, 11 Apr 2011 14:29:21 +0800 liubo wrote: > On 04/09/2011 05:55 AM, Sergei Trofimovich wrote: > > [ 100.500011] Call Trace: > > [ 100.500011] [] vfs_unlink+0x80/0xf0 > > [ 100.500011] [] do_unlinkat+0x173/0x1b0 > > [ 100.500011] [] ? fsnotify_find_inode_mark+0x3b/0x50 > > [ 100.500011] [] ? filp_close+0x61/0x90 > > [ 100.500011] [] sys_unlinkat+0x1d/0x40 > > [ 100.500011] [] system_call_fastpath+0x16/0x1b > > [ 100.500011] Code: 4c 8b 65 e0 48 8b 5d d8 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 40 00 4c 89 fe 4c 89 ef e8 05 d0 ff ff 85 c0 74 bb 0f 0b <0f> 0b 89 c3 eb cd 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 > > [ 100.500011] RIP [] btrfs_unlink+0xd1/0xe0 [btrfs] > > [ 100.500011] RSP > > [ 100.525672] ---[ end trace 7e63b9144b7307fe ]--- > > > > Looks like I won't be able to test your patch until this thing will go away first. > > Thanks a lot for testing, though. > > I guess something messed up your btrfs metadata, cause when btrfs_unlink() wanted to remove A, > it found that A was just missing... Hello again! I've decided to explore OOps myself a bit. The result is a debugging patch I've attached. I've applied it and you patch [PATCH] Btrfs: fix easily get into ENOSPC in mixed case on top of vanilla 2.6.39-rc4 The result you can see in attached 'fault.log'. There you can see -ENOSPC propagation. No idea why it converted to -ENOENT in the end. The result is very similar with vanilla 2.6.39-rc4. I can run the debugging patch on it as well if you like. I think interesting the parts are (they were found by Arne before): [ 0.040000] new update_space_info: [ 0.040000] space_info has 0 free, is not full [ 0.040000] space_info total=0, used=0, pinned=0, reserved=0, may_use=0, readonly=0 [ 0.040000] new update_space_info: [ 0.040000] space_info has 0 free, is not full [ 0.040000] space_info total=0, used=0, pinned=0, reserved=0, may_use=0, readonly=0 [ 0.040000] new update_space_info: [ 0.040000] space_info has 0 free, is not full [ 0.040000] space_info total=0, used=0, pinned=0, reserved=0, may_use=0, readonly=0 and: [ 0.290000] shrink_delalloc: kick writeback_inodes_sb_nr_if_idle to free 0 pages (yet 0 bytes to wb). [ 0.300000] shrink_delalloc: kick writeback_inodes_sb_nr_if_idle to free 0 pages (yet 0 bytes to wb). [ 0.310000] shrink_delalloc: kick writeback_inodes_sb_nr_if_idle to free 0 pages (yet 0 bytes to wb). [ 0.320000] reserve_metadata_bytes: shrink_delalloc(num_bytes=983040) -> 0 [ 0.320000] reserve_metadata_bytes: [ 0.320000] space_info has 18446744073708568576 free, is not full [ 0.320000] space_info total=0, used=0, pinned=0, reserved=983040, may_use=0, readonly=0 Here I see 2 problems: 1 (major) free space is busted (as the whole space_info). 2 (minor) the original loop of shrink_delalloc busy waited 30 seconds, so I had to trim it; and it seemed to know that additional IO won't reclaim any pages ('yet 0 bytes' string AFAIU); but it's likely a result of busted space_info: it has reserved=983040 more, than total=0. If I'll revert c59021f846881a957ac5afe456d0f59d6a517b61 from vanilla 2.6.39-rc4 I'll be able to delete the files from filesystem. I perform tests in usermode linux, and run 'rm -rf' on the same faulty partition snapshot. I've backed it up, so it's easy to reproduce and debug. I can provide more info if you need. It's a 'readable' (read-only btrfs mount behaves fine) 5GB image. From 56668c999cae1b5ca00e94c63dc1c38eba68e3e1 Mon Sep 17 00:00:00 2001 From: Sergei Trofimovich Date: Wed, 20 Apr 2011 00:11:40 +0300 Subject: [PATCH 2/2] btrfs: my nice debug Signed-off-by: Sergei Trofimovich --- fs/btrfs/ctree.c | 7 +++++++ fs/btrfs/dir-item.c | 6 ++++++ fs/btrfs/extent-tree.c | 25 ++++++++++++++++++++++++- fs/btrfs/inode.c | 5 +++++ 4 files changed, 42 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 84d7ca1..41f8581 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -421,7 +421,10 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, root->root_key.objectid, &disk_key, level, search_start, empty_size); if (IS_ERR(cow)) + { + printk(KERN_INFO "__btrfs_cow_block: btrfs_alloc_free_block ret=%d\n", (int)PTR_ERR(cow)); return PTR_ERR(cow); + } /* cow is set to blocking by btrfs_init_new_buffer */ @@ -1648,6 +1651,7 @@ again: p->slots[level + 1], &b); if (err) { ret = err; + printk(KERN_INFO "1. btrfs_search_slot ret=%d\n", ret); goto done; } } @@ -1692,6 +1696,7 @@ cow_done: goto again; if (err) { ret = err; + printk(KERN_INFO "2. btrfs_search_slot ret=%d\n", ret); goto done; } b = p->nodes[level]; @@ -1711,6 +1716,7 @@ cow_done: goto again; if (err) { ret = err; + printk(KERN_INFO "3. btrfs_search_slot ret=%d\n", ret); goto done; } @@ -1736,6 +1742,7 @@ cow_done: BUG_ON(err > 0); if (err) { ret = err; + printk(KERN_INFO "4. btrfs_search_slot ret=%d\n", ret); goto done; } } diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c index dec9348..228a706 100644 --- a/fs/btrfs/dir-item.c +++ b/fs/btrfs/dir-item.c @@ -228,7 +228,10 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans, ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow); if (ret < 0) + { + printk (KERN_INFO "btrfs_lookup_dir_item: search slot failed: %d\n", ret); return ERR_PTR(ret); + } if (ret > 0) { if (path->slots[0] == 0) return NULL; @@ -243,6 +246,7 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans, found_key.offset != key.offset) return NULL; + printk (KERN_INFO "btrfs_lookup_dir_item -> btrfs_match_dir_item_name\n"); return btrfs_match_dir_item_name(root, path, name, name_len); } @@ -377,6 +381,8 @@ struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root, u32 this_len; struct extent_buffer *leaf; + printk(KERN_INFO "btrfs_match_dir_item_name (name='%s')\n", name); + leaf = path->nodes[0]; dir_item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dir_item); if (verify_dir_item(root, leaf, dir_item)) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 31f33ba..e51adfa 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3016,6 +3016,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags, found->full = 0; spin_unlock(&found->lock); *space_info = found; + printk (KERN_INFO "found %s: ", __func__); dump_space_info (*space_info, 0, 0); return 0; } found = kzalloc(sizeof(*found), GFP_NOFS); @@ -3043,6 +3044,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags, *space_info = found; list_add_rcu(&found->list, &info->space_info); atomic_set(&found->caching_threads, 0); + printk (KERN_INFO "new %s: ", __func__); dump_space_info (*space_info, 0, 0); return 0; } @@ -3418,6 +3420,8 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans, block_rsv = &root->fs_info->delalloc_block_rsv; space_info = block_rsv->space_info; + printk(KERN_INFO "shrink_delalloc: "); dump_space_info (space_info, 0, 0); + smp_mb(); reserved = space_info->bytes_reserved; progress = space_info->reservation_progress; @@ -3427,11 +3431,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans, max_reclaim = min(reserved, to_reclaim); - while (loops < 1024) { + while (loops < 10) { /* have the flusher threads jump in and do some IO */ smp_mb(); nr_pages = min_t(unsigned long, nr_pages, root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT); + printk(KERN_INFO "shrink_delalloc: kick writeback_inodes_sb_nr_if_idle to free %d pages (yet %llu bytes to wb).\n" + , nr_pages, (long long unsigned)root->fs_info->delalloc_bytes); writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages); spin_lock(&space_info->lock); @@ -3446,7 +3452,10 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans, break; if (trans && trans->transaction->blocked) + { + printk(KERN_INFO "shrink_delalloc: ret -EAGAIN\n"); return -EAGAIN; + } time_left = schedule_timeout_interruptible(1); @@ -3499,11 +3508,15 @@ again: if (reserved) num_bytes = 0; + printk(KERN_INFO "reserve_metadata_bytes: "); dump_space_info (space_info, 0, 0); + spin_lock(&space_info->lock); unused = space_info->bytes_used + space_info->bytes_reserved + space_info->bytes_pinned + space_info->bytes_readonly + space_info->bytes_may_use; + printk(KERN_INFO "reserve_metadata_bytes: consider unused=%llu\n", (long long unsigned)unused); + /* * The idea here is that we've not already over-reserved the block group * then we can go ahead and save our reservation first and then start @@ -3558,6 +3571,8 @@ again: * metadata until after the IO is completed. */ ret = shrink_delalloc(trans, root, num_bytes, 1); + printk(KERN_INFO "reserve_metadata_bytes: shrink_delalloc(num_bytes=%llu) -> %d\n" + , (long long unsigned)num_bytes, ret); if (ret > 0) return 0; else if (ret < 0) @@ -5773,11 +5788,14 @@ use_block_rsv(struct btrfs_trans_handle *trans, * the global reserve. */ if (ret && block_rsv != global_rsv) { + printk(KERN_INFO "use_block_rsv (nonglobal metadata rsv) ret=%d. retry global\n", ret); ret = block_rsv_use_bytes(global_rsv, blocksize); if (!ret) return global_rsv; + printk(KERN_INFO "use_block_rsv (retry global metadata rsv) ret=%d\n", ret); return ERR_PTR(ret); } else if (ret) { + printk(KERN_INFO "use_block_rsv (global metadata rsv) ret=%d\n", ret); return ERR_PTR(ret); } return block_rsv; @@ -5802,6 +5820,7 @@ use_block_rsv(struct btrfs_trans_handle *trans, } } + printk(KERN_INFO "the very end of use_block_rsv. -ENOSPC\n"); return ERR_PTR(-ENOSPC); } @@ -5833,12 +5852,16 @@ struct extent_buffer *btrfs_alloc_free_block(struct btrfs_trans_handle *trans, block_rsv = use_block_rsv(trans, root, blocksize); if (IS_ERR(block_rsv)) + { + printk(KERN_INFO "btrfs_alloc_free_block: use_block_rsv error=%d\n", (int)PTR_ERR(block_rsv)); return ERR_CAST(block_rsv); + } ret = btrfs_reserve_extent(trans, root, blocksize, blocksize, empty_size, hint, (u64)-1, &ins, 0); if (ret) { unuse_block_rsv(block_rsv, blocksize); + printk(KERN_INFO "btrfs_alloc_free_block: reserve_extent=%d\n", ret); return ERR_PTR(ret); } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index fcd66b6..037afe2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2699,6 +2699,7 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans, di = btrfs_lookup_dir_item(trans, root, path, dir->i_ino, name, name_len, -1); if (IS_ERR(di)) { + printk(KERN_INFO "btrfs_lookup_dir_item: IS_ERR(di) = %d\n", (int)PTR_ERR(di)); ret = PTR_ERR(di); goto err; } @@ -2710,7 +2711,9 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans, btrfs_dir_item_key_to_cpu(leaf, di, &key); ret = btrfs_delete_one_dir_name(trans, root, path, di); if (ret) + { goto err; + } btrfs_release_path(root, path); ret = btrfs_del_inode_ref(trans, root, name, name_len, @@ -2762,8 +2765,10 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans, const char *name, int name_len) { int ret; + printk (KERN_INFO"__unlinking '%s'\n", name); ret = __btrfs_unlink_inode(trans, root, dir, inode, name, name_len); if (!ret) { + printk (KERN_INFO"unlinking '%s'\n", name); btrfs_drop_nlink(inode); ret = btrfs_update_inode(trans, root, inode); } -- 1.7.3.4