Message ID | 20190206204615.5862-1-josef@toxicpanda.com |
---|---|
Series | Fix missing reference aborts when resuming snapshot delete |
On Wed, Feb 06, 2019 at 03:46:13PM -0500, Josef Bacik wrote:
> With my delayed refs rsv patches in place we started hitting issues in our build
> servers that do a lot of snapshot deletions. Turns out there was a bug in
> btrfs_end_transaction_throttle() that caused it to basically always commit the
> transaction, which uncovered this particular bug.
>
> The gory details are in the change logs for both patches, but generally speaking
> it's a problem with how we update our root_item->drop_progress key. We will
> skip updating it some times even though we will have dropped references to
> blocks. If we crash or unmount at these times we will start at a point earlier
> in our delete than we should be and try to free blocks that we already freed,
> thus ending up with a transaction abort because we couldn't find the extent
> reference.
>
> There are 2 patches, 1 patch to deal with already broken file systems, and 1
> patch to keep this problem from happening in the first place.
>
> The steps to reproduce this easily are sort of tricky, I had to add a couple of
> debug patches to the kernel in order to make it easy, basically I just needed to
> make sure we did actually commit the transaction every time we finished a
> walk_down_tree/walk_up_tree combo.
>
> The reproducer
>
> 1) Creates a base subvolume.
> 2) Creates 100k files in the subvolume.
> 3) Snapshots the base subvolume (snap1).
> 4) Touches files 5000-6000 in snap1.
> 5) Snapshots snap1 (snap2).
> 6) Deletes snap1.
>
> I do this with dm-log-writes, and then replay to every FUA in the log and fsck
> the fs. Without these patches this falls over pretty quickly. With just the
> first patch we can mount the fs at the point that the fsck fails and it cleans
> everything up properly. With both patches applied the fsck never fails and
> we're golden. Thanks,

I copied the reproducer steps to the 2nd patch. 1 and 2 added to misc-next.
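The failure mode described in the cover letter (a drop_progress checkpoint that lags behind references that were already dropped, so a resumed delete tries to free them again) can be illustrated with a small toy model. This is a sketch under stated assumptions, not btrfs code: `drop_snapshot`, `MissingReference`, the `(owner, block)` back-reference set and the "advance the checkpoint only on every second drop" rule are all hypothetical stand-ins for the real walk_down_tree/walk_up_tree logic and for root_item->drop_progress. The `skip_missing` flag models one possible tolerant resume for filesystems that are already damaged; the actual fixes are described in the two patches' change logs and are not reproduced here.

```python
"""Toy model of resumable snapshot deletion (illustrative, not btrfs code)."""
from typing import List, Optional, Set, Tuple

BackRef = Tuple[str, str]  # hypothetical (owner subvolume, block) reference


class MissingReference(Exception):
    """Stand-in for the transaction abort on a missing extent reference."""


def drop_snapshot(backrefs: Set[BackRef], owner: str, blocks: List[str],
                  start: int, crash_after: Optional[int] = None,
                  checkpoint_every_drop: bool = True,
                  skip_missing: bool = False) -> int:
    """Drop `owner`'s reference to each block from index `start` onward.

    Returns the checkpoint (first unprocessed index) that would survive a
    simulated crash after `crash_after` drops, or len(blocks) when done.
    """
    checkpoint = start  # plays the role of root_item->drop_progress
    drops = 0
    for i in range(start, len(blocks)):
        ref = (owner, blocks[i])
        if ref not in backrefs:
            if skip_missing:
                continue  # tolerant resume: this reference is already gone
            raise MissingReference(ref)
        backrefs.remove(ref)
        drops += 1
        # Hypothetical bug model: the checkpoint only advances on every
        # second drop, so it can lag behind references already removed.
        if checkpoint_every_drop or drops % 2 == 0:
            checkpoint = i + 1
        if crash_after is not None and drops == crash_after:
            return checkpoint  # crash: only the persisted checkpoint survives
    return len(blocks)


if __name__ == "__main__":
    blocks = ["a", "b", "c", "d"]
    refs = {(o, b) for o in ("snap1", "snap2") for b in blocks}
    cp = drop_snapshot(refs, "snap1", blocks, 0, crash_after=3,
                       checkpoint_every_drop=False)
    try:
        drop_snapshot(refs, "snap1", blocks, cp)
    except MissingReference as err:
        print("resume from", cp, "aborted on", err.args[0])
```

With `checkpoint_every_drop=False`, a crash after the third drop persists a checkpoint of 2, so the resumed run revisits block `c`, whose `snap1` reference is already gone, and raises `MissingReference` (the analogue of the transaction abort). Advancing the checkpoint on every drop, or tolerating missing references on resume, lets the deletion finish with only `snap2`'s references left.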
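The six reproducer steps quoted in the cover letter can be scripted roughly as follows. This is a hedged sketch: the mount point, the subvolume names `base`/`snap1`/`snap2`, the `file<N>` naming scheme, the batch size and reading "5000-6000" as an inclusive range are all assumptions, and the `run` callable is a hypothetical hook that allows a dry run of the command list. It does not include the debug patches that force a commit after each walk_down_tree/walk_up_tree pass, nor the dm-log-writes replay and fsck loop, so on its own it is unlikely to trigger the bug reliably.

```python
"""Sketch of the cover letter's six reproducer steps (assumptions noted)."""
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Default runner: execute the command, raising on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def chunks(items: List[str], size: int) -> List[List[str]]:
    """Split a long path list so each `touch` stays well under ARG_MAX."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def reproduce(mnt: str, nfiles: int = 100_000, touch_lo: int = 5000,
              touch_hi: int = 6000, batch: int = 1000,
              run: Runner = run_checked) -> None:
    base, snap1, snap2 = f"{mnt}/base", f"{mnt}/snap1", f"{mnt}/snap2"
    # 1) Create the base subvolume.
    run(["btrfs", "subvolume", "create", base])
    # 2) Create nfiles empty files (hypothetical names file0..file<nfiles-1>).
    for group in chunks([f"{base}/file{i}" for i in range(nfiles)], batch):
        run(["touch", *group])
    # 3) Snapshot the base subvolume as snap1.
    run(["btrfs", "subvolume", "snapshot", base, snap1])
    # 4) Touch files touch_lo..touch_hi (assumed inclusive) in snap1.
    touched = [f"{snap1}/file{i}" for i in range(touch_lo, touch_hi + 1)]
    for group in chunks(touched, batch):
        run(["touch", *group])
    # 5) Snapshot snap1 as snap2.
    run(["btrfs", "subvolume", "snapshot", snap1, snap2])
    # 6) Delete snap1; the kernel drops its tree asynchronously afterwards.
    run(["btrfs", "subvolume", "delete", snap1])
```

On a scratch btrfs filesystem, `reproduce("/mnt/scratch")` (run as root, with a path of your choosing) executes the real commands, while passing `run=print` only lists them. Batching the `touch` calls keeps the number of processes low without hitting the kernel's argument-length limit.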