Message ID | 1456933778-7944-1-git-send-email-fdmanana@kernel.org (mailing list archive)
---|---
State | Accepted
fdmanana posted on Wed, 02 Mar 2016 15:49:38 +0000 as excerpted:

> When looking for orphan roots during mount we can end up hitting a
> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
> replayed and qgroups are enabled.

This should hit 4.6, right? Will it hit 4.5 before release?

Because I wasn't sure of the current quota functionality status, and this
bug obviously resets the counter on my ongoing "two kernel cycles with no
known quota bugs before you try to use quotas" recommendation.

Meanwhile, what /is/ the current quota feature status? Other than this
bug, is it now considered free of known bugs, or is more quota reworking
and/or bug fixing known to be needed for 4.6 and beyond?

IOW, given that two-release-cycles-no-known-bugs counter, are we
realistically looking at that being 4.8, or are we now looking at 4.9 or
beyond for reasonable quota stability?
Duncan wrote on 2016/03/03 04:31 +0000:
> fdmanana posted on Wed, 02 Mar 2016 15:49:38 +0000 as excerpted:
>
>> When looking for orphan roots during mount we can end up hitting a
>> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
>> replayed and qgroups are enabled.
>
> This should hit 4.6, right? Will it hit 4.5 before release?
>
> Because I wasn't sure of the current quota functionality status, and this
> bug obviously resets the counter on my ongoing "two kernel cycles with no
> known quota bugs before you try to use quotas" recommendation.

IMHO, btrfs quota is *functionally* stable. That is, its main function,
quota accounting, is stable under almost all operations.

There will be some hidden corners like this one, which are not easy to
spot during a rework. (Although it seems this regression was not caused by
the qgroup rework.)

> Meanwhile, what /is/ the current quota feature status? Other than this
> bug, is it now considered free of known bugs, or is more quota reworking
> and/or bug fixing known to be needed for 4.6 and beyond?

AFAIK, no rework is planned for qgroups. The most recent large qgroup
modification is by Mark Fasheh, from Nov 2015, allowing btrfs subvolume
removal to update qgroup accounting correctly.

> IOW, given that two-release-cycles-no-known-bugs counter, are we
> realistically looking at that being 4.8, or are we now looking at 4.9 or
> beyond for reasonable quota stability?

I have never heard of the two-release-cycles principle, but it seems not
flexible enough.

From that point of view, every time Filipe (just an example, as he finds
most of the bugs and corner cases) fixes a bug, some part or even the
whole of btrfs would be "not stable" for another 4 months.

Thanks,
Qu
fdmanana@kernel.org wrote on 2016/03/02 15:49 +0000:
> From: Filipe Manana <fdmanana@suse.com>
>
> When looking for orphan roots during mount we can end up hitting a
> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
> replayed and qgroups are enabled. This is because after a log tree is
> replayed, a transaction commit is made, which triggers qgroup extent
> accounting, which in turn does backref walking, which ends up reading and
> inserting all roots into the radix tree fs_info->fs_root_radix, including
> orphan roots (deleted snapshots). So after the log tree is replayed, when
> finding orphan roots we hit the BUG_ON with the following trace:
>
> [118209.182438] ------------[ cut here ]------------
> [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
> [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
> [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W  4.5.0-rc5-btrfs-next-24+ #1
> [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
> [118209.186318] RIP: 0010:[<ffffffffa04237d7>] [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
> [118209.186318] RSP: 0018:ffff8800af34faa8 EFLAGS: 00010246
> [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
> [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
> [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
> [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
> [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
> [118209.186318] FS: 00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
> [118209.186318] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
> [118209.186318] Stack:
> [118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
> [118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
> [118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
> [118209.186318] Call Trace:
> [118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
> [118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
> [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
> [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
> [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
> [118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
> [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
> [118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
> [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
> [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
> [118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
> [118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
> [118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
> [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b 4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
> [118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
> [118209.186318]  RSP <ffff8800af34faa8>
> [118209.230735] ---[ end trace 83938f987d85d477 ]---
>
> So fix this by not treating the error -EEXIST, returned when attempting
> to insert a root already inserted by the backref walking code, as an
> error.
>
> The following test case for xfstests reproduces the bug:
>
> seq=`basename $0`
> seqres=$RESULT_DIR/$seq
> echo "QA output created by $seq"
> tmp=/tmp/$$
> status=1	# failure is the default!
> trap "_cleanup; exit \$status" 0 1 2 3 15
>
> _cleanup()
> {
>     _cleanup_flakey
>     cd /
>     rm -f $tmp.*
> }
>
> # get standard environment, filters and checks
> . ./common/rc
> . ./common/filter
> . ./common/dmflakey
>
> # real QA test starts here
> _supported_fs btrfs
> _supported_os Linux
> _require_scratch
> _require_dm_target flakey
> _require_metadata_journaling $SCRATCH_DEV
>
> rm -f $seqres.full
>
> _scratch_mkfs >>$seqres.full 2>&1
> _init_flakey
> _mount_flakey
>
> _run_btrfs_util_prog quota enable $SCRATCH_MNT
>
> # Create 2 directories with one file in one of them.
> # We use these just to trigger a transaction commit later, moving the file
> # from directory a to directory b and doing an fsync against directory a.
> mkdir $SCRATCH_MNT/a
> mkdir $SCRATCH_MNT/b
> touch $SCRATCH_MNT/a/f
> sync
>
> # Create our test file with 2 4K extents.
> $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
>
> # Create a snapshot and delete it. This doesn't really delete the snapshot
> # immediately, it just makes it inaccessible and invisible to user space;
> # the snapshot is deleted later by a dedicated kernel thread (cleaner
> # kthread) which is woken up at the next transaction commit.
> # A root orphan item is inserted into the tree of tree roots, so that if a
> # power failure happens before the dedicated kernel thread does the
> # snapshot deletion, the next time the filesystem is mounted it resumes
> # the snapshot deletion.
> _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
> _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
>
> # Now overwrite half of the extents we wrote before. Because we made a
> # snapshot before, which isn't really deleted yet (since no transaction
> # commit happened after we did the snapshot delete request), the non
> # overwritten extents get referenced twice, once by the default subvolume
> # and once by the snapshot.
> $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
>
> # Now move file f from directory a to directory b and fsync directory a.
> # The fsync on directory a triggers a transaction commit (because a file
> # was moved from it to another directory) and the file fsync leaves a log
> # tree with file extent items to replay.
> mv $SCRATCH_MNT/a/f $SCRATCH_MNT/b/f
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
>
> echo "File digest before power failure:"
> md5sum $SCRATCH_MNT/foobar | _filter_scratch
>
> # Now simulate a power failure and mount the filesystem to replay the log
> # tree. After the log tree was replayed, we used to hit a BUG_ON() when
> # processing the root orphan item for the deleted snapshot. This is
> # because when processing an orphan root the code expected to be the first
> # code inserting the root into the fs_info->fs_root_radix radix tree,
> # while in reality it was the second caller attempting to do it - the
> # first caller was the transaction commit that took place after replaying
> # the log tree, when updating the qgroup counters.
> _flakey_drop_and_remount
>
> echo "File digest after power failure:"
> # Must match what we got before the power failure.
> md5sum $SCRATCH_MNT/foobar | _filter_scratch
>
> _unmount_flakey
> status=0
> exit
>
> Fixes: 2d9e97761087 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
> Cc: stable@vger.kernel.org # 4.4+
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Looks good, and the comment is clear enough.

Thanks for your long effort to spot and fix corner cases like this.

Thanks,
Qu

> ---
>  fs/btrfs/root-tree.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
> index a25f3b2..9fcd6df 100644
> --- a/fs/btrfs/root-tree.c
> +++ b/fs/btrfs/root-tree.c
> @@ -310,8 +310,16 @@ int btrfs_find_orphan_roots(struct btrfs_root *tree_root)
>  		set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state);
>
>  		err = btrfs_insert_fs_root(root->fs_info, root);
> +		/*
> +		 * The root might have been inserted already, as before we look
> +		 * for orphan roots, log replay might have happened, which
> +		 * triggers a transaction commit and qgroup accounting, which
> +		 * in turn reads and inserts fs roots while doing backref
> +		 * walking.
> +		 */
> +		if (err == -EEXIST)
> +			err = 0;
>  		if (err) {
> -			BUG_ON(err == -EEXIST);
>  			btrfs_free_fs_root(root);
>  			break;
>  		}
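For readers who want to run the reproducer above, a hedged sketch of the usual xfstests invocation follows. It assumes an xfstests checkout with SCRATCH_DEV and SCRATCH_MNT configured in local.config; the test number btrfs/118 is an assumption, since the test case had not yet been merged or numbered at the time of this posting:

    # Run the reproducer from an xfstests checkout. The test number is a
    # placeholder: substitute whatever number the test was merged under.
    cd xfstests
    sudo ./check btrfs/118
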
Qu Wenruo posted on Thu, 03 Mar 2016 14:26:45 +0800 as excerpted:

> I have never heard of the two-release-cycles principle, but it seems not
> flexible enough.
> From that point of view, every time Filipe (just an example, as he finds
> most of the bugs and corner cases) fixes a bug, some part or even the
> whole of btrfs would be "not stable" for another 4 months.

Well, once a feature is considered stable (relative to the rest of btrfs,
anyway), the two-releases-without-bugs counter/clock no longer resets, and
we're simply dealing with bugs in otherwise stable features, as opposed to
more bugs in still-unstable features resetting the clock on those features
and preventing them from reaching (my measure of) stability.

And it's more or less just a general rule of thumb anyway, with
per-feature and per-bug adjustments based on the amount of new code in the
feature and how critical the bug is or isn't.

But in my experience it /does/ seem a relatively good
rule-of-thumb-level guideline, provided it is taken /as/ a
rule-of-thumb-level guideline and not applied too inflexibly.

The other factor, however, would be the relative view into bugs...

I suppose it's reasonably well known that in practice one has to be a bit
cautious about evaluating the stability of a project by the raw number of
scary-looking problems reported on the mailing list or bug tracker, in
part because that's what those are /for/, and while they're good at
tracking that, they don't normally yield a good picture at all of the
hundreds or thousands or tens of thousands or millions actually using the
project without any problems at all.

By the same token, the developer's view of a feature is likely to include
quite a few more bugs, due to simple familiarity with the topic and
exposure on multiple channels (IRC/btrfs-list/private-mail/lkml/
filesystems-list/kernel-bugzilla/one-or-more-distro-lists...), than that
of someone like me, a simple user/admin tracking perhaps one or two of
those channels.

There are a lot of feature bugs that a feature developer is going to be
aware of that simply won't rise to my level of consciousness. But by the
same token, if multiple people are suddenly reporting an issue, as will
likely happen for the serious bugs, I'm likely to see one and possibly
more reports of it here, and /will/ be aware of it.

So what I'm saying is that, at my level of awareness at least, and
assuming it is taken as the rule-of-thumb guideline I intend it as, the
two-releases-without-a-known-bug-in-a-feature /guideline/ has in my
experience turned out to be reasonably practical and reliable, tho I'd
expect it would not and could not be workable at that level applied by a
feature dev, because by definition that feature dev is going to know about
way more bugs in the feature than I will, and as you said, applying it in
that case would mean the feature /never/ stabilizes.

Does that make more sense now?

Now back to the specific feature in question, btrfs quotas. Thanks for
affirming that they are in general considered workable now. I'll mention
that as the developer perspective when I make my recommendations, and it
will certainly positively influence my own recommendations as well, tho
I'll mention that there are still corner-case bugs being worked out, so
I'll recommend following quota-related discussion and patches on the list
for now as well. But considering it ready for normal use is already beyond
what I felt ready to recommend before, so it's already a much more
positive recommendation than previously, even if it's still tempered with
"but keep up with the list discussion and current on your kernels, and be
aware there are still occasional corner cases being worked out" as a
caveat, which, it should be said, is only slightly stronger than the
general recommendation for btrfs itself.
Duncan wrote on 2016/03/03 07:44 +0000:
> Qu Wenruo posted on Thu, 03 Mar 2016 14:26:45 +0800 as excerpted:
>
>> I have never heard of the two-release-cycles principle, but it seems not
>> flexible enough.
[snip]
> So what I'm saying is that, at my level of awareness at least, and
> assuming it is taken as the rule-of-thumb guideline I intend it as, the
> two-releases-without-a-known-bug-in-a-feature /guideline/ has in my
> experience turned out to be reasonably practical and reliable [...]
>
> Does that make more sense now?

Makes sense now.

> Now back to the specific feature in question, btrfs quotas. Thanks for
> affirming that they are in general considered workable now. I'll mention
> that as the developer perspective when I make my recommendations, and it
> will certainly positively influence my own recommendations as well [...]

Thanks for your recommendation about qgroups.

I'm also seeking feedback from end users, to either spot more corner
cases or enhance the UI-related design.

Thanks,
Qu
On Thu, Mar 3, 2016 at 4:31 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> fdmanana posted on Wed, 02 Mar 2016 15:49:38 +0000 as excerpted:
>
>> When looking for orphan roots during mount we can end up hitting a
>> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
>> replayed and qgroups are enabled.
>
> This should hit 4.6, right? Will it hit 4.5 before release?

It's not the first time you've asked a similar question, and if it's
targeted at me, all I can tell you is I don't know. It's the
maintainers (Chris, Josef, David) who decide when to pick patches and
for which releases.

> Because I wasn't sure of the current quota functionality status, and this
> bug obviously resets the counter on my ongoing "two kernel cycles with no
> known quota bugs before you try to use quotas" recommendation.

You shouldn't spread such affirmations with such a level of certainty
every time a user reports a problem.

There are many bugs affecting the last 2 to 3 releases, but there are
also many bugs present since btrfs was added to the linux kernel tree,
and many others present for 2+ years, etc.

> Meanwhile, what /is/ the current quota feature status? Other than this
> bug, is it now considered free of known bugs, or is more quota reworking
> and/or bug fixing known to be needed for 4.6 and beyond?
>
> IOW, given that two-release-cycles-no-known-bugs counter, are we
> realistically looking at that being 4.8, or are we now looking at 4.9 or
> beyond for reasonable quota stability?

I don't know. I generally don't actively look at qgroups, and I'm not a
user either.

You can only draw conclusions based on user bug reports. Probably there
aren't more bugs for qgroups than there are for send/receive, or even for
non-btrfs-specific features, for example.
On Thu, Mar 3, 2016 at 6:29 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> fdmanana@kernel.org wrote on 2016/03/02 15:49 +0000:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> When looking for orphan roots during mount we can end up hitting a
>> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
>> replayed and qgroups are enabled.
[snip]
>
> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>
> Looks good, and the comment is clear enough.
>
> Thanks for your long effort to spot and fix corner cases like this.

Well, using qgroups, deleting snapshots and fsync'ing file data isn't
that much of a rare use case, is it? :P

> Thanks,
> Qu
Ping?

Cc: Chris and David

It seems this fix is missing from the 4.6 merge window.
Or did I miss something?

Thanks,
Qu

Filipe Manana wrote on 2016/03/03 09:10 +0000:
> On Thu, Mar 3, 2016 at 4:31 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> This should hit 4.6, right? Will it hit 4.5 before release?
>
> It's not the first time you've asked a similar question, and if it's
> targeted at me, all I can tell you is I don't know. It's the
> maintainers (Chris, Josef, David) who decide when to pick patches and
> for which releases.
[snip]
On Thu, Apr 14, 2016 at 6:34 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Ping?
>
> Cc: Chris and David
>
> It seems this fix is missing from the 4.6 merge window.
> Or did I miss something?

4.5-rc7: https://lkml.org/lkml/2016/3/4/695
Filipe Manana wrote on 2016/04/14 10:21 +0100:
> On Thu, Apr 14, 2016 at 6:34 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> It seems this fix is missing from the 4.6 merge window.
>> Or did I miss something?
>
> 4.5-rc7: https://lkml.org/lkml/2016/3/4/695

Strangely, it's not in integration-4.6.

This makes me wonder which branch I should use now...

Thanks,
Qu
On Fri, Apr 15, 2016 at 09:17:49AM +0800, Qu Wenruo wrote:
> Filipe Manana wrote on 2016/04/14 10:21 +0100:
>> 4.5-rc7: https://lkml.org/lkml/2016/3/4/695
>
> Strangely, it's not in integration-4.6.
>
> This makes me wonder which branch I should use now...

for-linus-4.6
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index a25f3b2..9fcd6df 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -310,8 +310,16 @@ int btrfs_find_orphan_roots(struct btrfs_root *tree_root)
 		set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state);
 
 		err = btrfs_insert_fs_root(root->fs_info, root);
+		/*
+		 * The root might have been inserted already, as before we look
+		 * for orphan roots, log replay might have happened, which
+		 * triggers a transaction commit and qgroup accounting, which
+		 * in turn reads and inserts fs roots while doing backref
+		 * walking.
+		 */
+		if (err == -EEXIST)
+			err = 0;
 		if (err) {
-			BUG_ON(err == -EEXIST);
 			btrfs_free_fs_root(root);
 			break;
 		}
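The fix boils down to a common idempotent-insert pattern: when two code paths can race to register the same object, the loser's -EEXIST means the object is present either way, so it is success, not failure. Below is a minimal, self-contained userspace sketch of that pattern, not the kernel code; every name in it (register_root, find_orphan_root, the array-backed store) is hypothetical and stands in for btrfs_insert_fs_root() and the fs_root_radix radix tree:

    /*
     * Sketch of the -EEXIST-is-OK insert pattern used by the fix.
     * All names are hypothetical; this is not the btrfs code.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_ROOTS 16

    static long registered[MAX_ROOTS];
    static int nr_registered;

    /* Returns 0 on success, -EEXIST if id was already registered. */
    static int register_root(long id)
    {
        for (int i = 0; i < nr_registered; i++)
            if (registered[i] == id)
                return -EEXIST;
        if (nr_registered == MAX_ROOTS)
            return -ENOSPC;
        registered[nr_registered++] = id;
        return 0;
    }

    static int find_orphan_root(long id)
    {
        int err = register_root(id);

        /*
         * Another path (qgroup accounting during log replay, in the
         * real code) may have registered this root first. That is
         * fine: the root is registered, which is all we wanted.
         */
        if (err == -EEXIST)
            err = 0;
        return err;
    }

    int main(void)
    {
        /* Simulate the earlier inserter (the backref-walking path). */
        register_root(257);

        /* The orphan-root path inserts second, sees -EEXIST, carries on. */
        int err = find_orphan_root(257);

        printf("find_orphan_root: %s\n", err ? strerror(-err) : "ok");
        return 0;
    }

The design point the patch makes is exactly this: the old BUG_ON treated "already inserted" as an impossible state, but once a second inserter legitimately exists, -EEXIST stops being an invariant violation and becomes an expected outcome to absorb.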