[0/2] Fix missing reference aborts when resuming snapshot delete

Message ID

20190206204615.5862-1-josef@toxicpanda.com

Headers

Return-Path: <linux-btrfs-owner@kernel.org>
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 0/2] Fix missing reference aborts when resuming snapshot delete
Date: Wed, 6 Feb 2019 15:46:13 -0500
Message-Id: <20190206204615.5862-1-josef@toxicpanda.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Series

Fix missing reference aborts when resuming snapshot delete
[0/2] Fix missing reference aborts when resuming snapshot delete
[1/2] btrfs: check for refs on snapshot delete resume
[2/2] btrfs: save drop_progress if we drop refs at all

Message

Josef Bacik Feb. 6, 2019, 8:46 p.m. UTC

With my delayed refs rsv patches in place, we started hitting issues on our build
servers, which do a lot of snapshot deletions.  It turns out there was a bug in
btrfs_end_transaction_throttle() that caused it to basically always commit the
transaction, which uncovered this particular bug.

The gory details are in the changelogs for both patches, but generally speaking
it's a problem with how we update our root_item->drop_progress key.  We sometimes
skip updating it even though we have already dropped references to blocks.  If
we crash or unmount at one of these points, we resume the delete from an earlier
point than we should and try to free blocks that we already freed, ending up
with a transaction abort because we can't find the extent reference.
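To make the failure mode concrete, here is a toy model in C. It is a sketch under stated assumptions, not btrfs code: the snapshot holds one reference on each of NBLOCKS blocks, deletion drops them in fixed-size passes (a stand-in for one walk_down_tree/walk_up_tree combo), every pass is assumed to end in a committed transaction, and the "buggy" policy persists drop_progress only on even-numbered passes, an arbitrary stand-in for "sometimes skip updating it". The names struct model, drop_ref(), one_pass() and delete_with_crash() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not btrfs code: the snapshot being deleted holds one
 * reference on each of NBLOCKS blocks, and deletion drops them in passes
 * of BATCH blocks (a stand-in for one walk_down_tree/walk_up_tree combo).
 * Each pass is assumed to end in a committed transaction. */
#define NBLOCKS 8
#define BATCH 2

struct model {
    int refs[NBLOCKS];  /* references the snapshot still holds, per block */
    int drop_progress;  /* persisted resume point: next block to visit */
};

/* Drop the snapshot's reference on one block. Returns false where the
 * kernel would fail to find the extent reference and abort. */
static bool drop_ref(struct model *m, int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    if (m->refs[blk] == 0)
        return false;
    m->refs[blk]--;
    return true;
}

/* Process one pass starting at `start`; returns the next start or -1.
 * Every pass drops references, but unless save_always is set the key is
 * persisted only on even passes (the hypothetical "buggy" policy). */
static int one_pass(struct model *m, int start, int pass, bool save_always)
{
    int end = start + BATCH < NBLOCKS ? start + BATCH : NBLOCKS;
    for (int blk = start; blk < end; blk++)
        if (!drop_ref(m, blk))
            return -1;
    if (save_always || pass % 2 == 0)
        m->drop_progress = end;
    return end;
}

/* Start the delete, crash after `crash_after` passes, then resume from
 * the persisted key. Returns true if the delete completes cleanly. */
static bool delete_with_crash(bool save_always, int crash_after)
{
    struct model m = { .drop_progress = 0 };
    for (int i = 0; i < NBLOCKS; i++)
        m.refs[i] = 1;

    int pos = 0;
    for (int pass = 0; pass < crash_after && pos < NBLOCKS; pass++)
        pos = one_pass(&m, pos, pass, save_always);
    if (pos >= NBLOCKS)
        return true;  /* finished before the crash */

    /* Crash or unmount: the in-memory cursor is lost, only the key survives. */
    pos = m.drop_progress;
    for (int pass = 0; pos < NBLOCKS; pass++) {
        pos = one_pass(&m, pos, pass, save_always);
        if (pos < 0)
            return false;  /* tried to free an already-freed block */
    }
    return true;
}
```

With the buggy policy, delete_with_crash(false, 2) returns false: the second pass dropped references on blocks 2 and 3 without moving the key, so the resumed walk tries to drop block 2 again. delete_with_crash(true, 2) completes, which illustrates the idea in the subject of patch 2/2 ("save drop_progress if we drop refs at all"); the real patch's mechanics are described in its changelog and are not reproduced here.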

There are 2 patches: the first deals with already-broken file systems, and the
second keeps this problem from happening in the first place.
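A similar toy model sketches the idea behind the first patch (checking for a reference on resume) so that a file system whose drop_progress key is already stale can still finish the delete. Again this is hypothetical and not the kernel implementation: refs[] records whether the snapshot still holds a reference on each block, and resume_delete() is an invented name.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not btrfs code: refs[i] is 1 while the snapshot being
 * deleted still holds a reference on block i, 0 once it was dropped. */
#define NBLOCKS 8

/* Resume the delete at drop_progress. Without the check, every block at
 * or after the key is assumed to still be referenced, so one that was
 * already dropped aborts the delete (returns -1). With check_on_resume,
 * such blocks are skipped. Returns the number of references dropped. */
static int resume_delete(int refs[NBLOCKS], int drop_progress,
                         bool check_on_resume)
{
    assert(drop_progress >= 0 && drop_progress <= NBLOCKS);
    int dropped = 0;
    for (int blk = drop_progress; blk < NBLOCKS; blk++) {
        if (refs[blk] == 0) {
            if (check_on_resume)
                continue;  /* freed before the crash: nothing to drop */
            return -1;     /* missing reference: transaction abort */
        }
        refs[blk]--;
        dropped++;
    }
    return dropped;
}
```

For example, if blocks 0-3 were dropped before a crash but the on-disk key still points at block 2, resume_delete(refs, 2, false) returns -1, while resume_delete(refs, 2, true) skips blocks 2 and 3 and drops the remaining 4 references.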

Reproducing this easily is tricky.  I had to add a couple of debug patches to
the kernel; basically I just needed to make sure we actually committed the
transaction every time we finished a walk_down_tree/walk_up_tree combo.

The reproducer:

1) Creates a base subvolume.
2) Creates 100k files in the subvolume.
3) Snapshots the base subvolume (snap1).
4) Touches files 5000-6000 in snap1.
5) Snapshots snap1 (snap2).
6) Deletes snap1.
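Steps 2 and 4 of the reproducer can be scripted with a small POSIX C helper; the subvolume steps (1, 3, 5 and 6) would still use the btrfs tool (btrfs subvolume create, btrfs subvolume snapshot, btrfs subvolume delete). This is a sketch under assumptions: the original reproducer's source isn't shown, so the file names (0, 1, 2, ...) and the helper names file_path(), create_files() and touch_range() are hypothetical, and it does not cover the debug patches or the dm-log-writes setup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "dir/index" into buf (hypothetical naming scheme); returns -1
 * if the path does not fit. */
static int file_path(char *buf, size_t len, const char *dir, int index)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%s/%d", dir, index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Step 2: create empty files dir/0 .. dir/(count - 1). */
static int create_files(const char *dir, int count)
{
    char path[4096];
    for (int i = 0; i < count; i++) {
        if (file_path(path, sizeof(path), dir, i) != 0)
            return -1;
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return 0;
}

/* Step 4: "touch" dir/first .. dir/last (inclusive) by setting their
 * access and modification times to now. Fails if a file is missing. */
static int touch_range(const char *dir, int first, int last)
{
    char path[4096];
    for (int i = first; i <= last; i++) {
        if (file_path(path, sizeof(path), dir, i) != 0)
            return -1;
        if (utimensat(AT_FDCWD, path, NULL, 0) != 0)
            return -1;
    }
    return 0;
}
```

On a mounted btrfs the sequence would be roughly: create the base subvolume, call create_files() on it with a count of 100000, snapshot it as snap1, call touch_range() on snap1 for files 5000 to 6000, snapshot snap1 as snap2, then delete snap1. Touching the files updates their inode timestamps, which should make snap1 stop sharing those metadata blocks with the base subvolume before snap2 is taken.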

I run this on top of dm-log-writes, then replay the log to every FUA and fsck
the fs at each point.  Without these patches this falls over pretty quickly.
With just the first patch, we can mount the fs at the point where the fsck fails
and it cleans everything up properly.  With both patches applied, the fsck never
fails and we're golden.  Thanks,

Josef

Comments

David Sterba Feb. 27, 2019, 1:08 p.m. UTC

I copied the reproducer steps to the 2nd patch. Patches 1 and 2 added to
misc-next.