[v3] Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group
diff mbox

Message ID 1423635852-23258-1-git-send-email-forrestl@synology.com
State Under Review
Headers show

Commit Message

Forrest Liu Feb. 11, 2015, 6:24 a.m. UTC
Removing large amount of block group in a transaction may encounters
BUG_ON() in btrfs_orphan_add(). That is because btrfs_orphan_reserve_metadata()
will grab metadata reservation from transaction handle, and
btrfs_delete_unused_bgs() didn't reserve metadata for trnasaction handle when
delete unused block group.

The problem can be reproduce by following script

    mntpath=/btrfs
    loopdev=/dev/loop0
    filepath=/home/forrest/image

    umount $mntpath
    losetup -d $loopdev
    truncate --size 1000g $filepath
    losetup $loopdev $filepath
    mkfs.btrfs -f $loopdev
    mount $loopdev $mntpath

    for j in `seq 1 1 1000`; do
        fallocate -l 1g $mntpath/$j
    done
    # wait cleaner thread remove unused block group
    sleep 300

The call trace that results from the BUG_ON() is:

[  613.093084] ------------[ cut here ]------------
[  613.097928] kernel BUG at fs/btrfs/inode.c:3142!
[  613.105855] invalid opcode: 0000 [#1] SMP
[  613.112702] Modules linked in: coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) snd_ens1371(E) snd_ac97_codec(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) ac97_bus(E) ablk_helper(E) gameport(E) cryptd(E) snd_rawmidi(E) snd_seq_device(E) snd_pcm(E) vmw_balloon(E) snd_timer(E) snd(E) soundcore(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) drm(E) vmw_vmci(E) parport_pc(E) shpchp(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E) btrfs(E) xor(E) raid6_pq(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) ahci(E) libahci(E) e1000(E) mptspi(E) mptscsih(E) mptbase(E) floppy(E) vmw_pvscsi(E) vmxnet3(E)
[  613.144196] CPU: 0 PID: 1480 Comm: btrfs-cleaner Tainted: G            E  3.19.0-rc7-custom #2
[  613.148501] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[  613.152694] task: ffff880035cdb1a0 ti: ffff880039cf4000 task.ti: ffff880039cf4000
[  613.154969] RIP: 0010:[<ffffffffa01441c2>]  [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
[  613.157780] RSP: 0018:ffff880039cf7c48  EFLAGS: 00010286
[  613.159560] RAX: 00000000ffffffe4 RBX: ffff88003bd981a0 RCX: ffff88003c9e4000
[  613.161904] RDX: 0000000000002244 RSI: 0000000000040000 RDI: ffff88003c9e4138
[  613.164264] RBP: ffff880039cf7c88 R08: 000060ffc0000850 R09: 0000000000000000
[  613.166507] R10: ffff88003bc4b7a0 R11: ffffea0000eb6740 R12: ffff88003c9c0000
[  613.168681] R13: ffff88003c102160 R14: ffff88003c9c0458 R15: 0000000000000001
[  613.170932] FS:  0000000000000000(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
[  613.173316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  613.175227] CR2: 00007f6343537000 CR3: 0000000036329000 CR4: 00000000000407f0
[  613.177554] Stack:
[  613.178712]  ffff880039cf7c88 ffffffffa0182a54 ffff88003c9e4b04 ffff88003c9c7800
[  613.181297]  ffff88003bc4b7a0 ffff88003bd981a0 ffff88003c8db200 ffff88003c2fcc60
[  613.183782]  ffff880039cf7d18 ffffffffa012da97 ffff88003bc4b7a4 ffff88003bc4b7a0
[  613.186171] Call Trace:
[  613.187493]  [<ffffffffa0182a54>] ? lookup_free_space_inode+0x44/0x100 [btrfs]
[  613.189801]  [<ffffffffa012da97>] btrfs_remove_block_group+0x137/0x740 [btrfs]
[  613.192126]  [<ffffffffa0166912>] btrfs_remove_chunk+0x672/0x780 [btrfs]
[  613.194267]  [<ffffffffa012e2ff>] btrfs_delete_unused_bgs+0x25f/0x280 [btrfs]
[  613.196567]  [<ffffffffa0135e4c>] cleaner_kthread+0x12c/0x190 [btrfs]
[  613.198687]  [<ffffffffa0135d20>] ? check_leaf+0x350/0x350 [btrfs]
[  613.200758]  [<ffffffff8108f232>] kthread+0xd2/0xf0
[  613.202616]  [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180
[  613.204738]  [<ffffffff8175dabc>] ret_from_fork+0x7c/0xb0
[  613.206652]  [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180
[  613.208741] Code: ff ff 0f 1f 80 00 00 00 00 89 45 c8 3e 80 63 80 fd 48 89 df e8 d0 23 fe ff 8b 45 c8 e9 14 ff ff ff b8 f4 ff ff ff e9 12 ff ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
[  613.216562] RIP  [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
[  613.218828]  RSP <ffff880039cf7c48>
[  613.220382] ---[ end trace 71073106deb8a457 ]---

This patch replace btrfs_join_transaction() with btrfs_start_transaction() in
btrfs_delete_unused_bgs() to revent BUG_ON() in btrfs_orphan_add()

Signed-off-by: Forrest Liu <forrestl@synology.com>
---
V2: add reproducing step and call trace in commit message
V3: fix a typo: becuase, in commit message

 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Chris Mason Feb. 17, 2015, 2:55 p.m. UTC | #1
On Wed, Feb 11, 2015 at 1:24 AM, Forrest Liu <forrestl@synology.com> 
wrote:
> Removing large amount of block group in a transaction may encounters
> BUG_ON() in btrfs_orphan_add(). That is because 
> btrfs_orphan_reserve_metadata()
> will grab metadata reservation from transaction handle, and
> btrfs_delete_unused_bgs() didn't reserve metadata for trnasaction 
> handle when
> delete unused block group.
> 
> The problem can be reproduce by following script
> 
>     mntpath=/btrfs
>     loopdev=/dev/loop0
>     filepath=/home/forrest/image
> 
>     umount $mntpath
>     losetup -d $loopdev
>     truncate --size 1000g $filepath
>     losetup $loopdev $filepath
>     mkfs.btrfs -f $loopdev
>     mount $loopdev $mntpath
> 
>     for j in `seq 1 1 1000`; do
>         fallocate -l 1g $mntpath/$j
>     done
>     # wait cleaner thread remove unused block group
>     sleep 300
> 
> The call trace that results from the BUG_ON() is:
> 
> [  613.093084] ------------[ cut here ]------------
> [  613.097928] kernel BUG at fs/btrfs/inode.c:3142!

This should fix a hard to trigger bug we've seen in production here.  I 
was on a different (wrong) track for explaining it, so your timing is 
excellent.

Thanks.

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a684086..63b974f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9611,7 +9611,8 @@  void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		 * Want to do this before we do anything else so we can recover
 		 * properly if we fail to join the transaction.
 		 */
-		trans = btrfs_join_transaction(root);
+		/* 1 for btrfs_orphan_reserve_metadata() */
+		trans = btrfs_start_transaction(root, 1);
 		if (IS_ERR(trans)) {
 			btrfs_set_block_group_rw(root, block_group);
 			ret = PTR_ERR(trans);