Message ID | 20181211101945.28695-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] Btrfs: send, fix race with transaction commits that create snapshots | expand |
On Tue, Dec 11, 2018 at 10:19:45AM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > If we create a snapshot of a snapshot currently being used by a send > operation, we can end up with send failing unexpectedly (returning > -ENOENT error to user space for example). The following diagram shows > how this happens. > > CPU 1 CPU2 CPU3 > > btrfs_ioctl_send() > (...) > create_snapshot() > -> creates snapshot of a > root used by the send > task > btrfs_commit_transaction() > create_pending_snapshot() > __get_inode_info() > btrfs_search_slot() > btrfs_search_slot_get_root() > down_read commit_root_sem > > get reference on eb of the > commit root > -> eb with bytenr == X > > up_read commit_root_sem > > btrfs_cow_block(root node) > btrfs_free_tree_block() > -> creates delayed ref to > free the extent > > btrfs_run_delayed_refs() > -> runs the delayed ref, > adds extent to > fs_info->pinned_extents > > btrfs_finish_extent_commit() > unpin_extent_range() > -> marks extent as free > in the free space cache > > transaction commit finishes > > btrfs_start_transaction() > (...) > btrfs_cow_block() > btrfs_alloc_tree_block() > btrfs_reserve_extent() > -> allocates extent at > bytenr == X > btrfs_init_new_buffer(bytenr X) > btrfs_find_create_tree_block() > alloc_extent_buffer(bytenr X) > find_extent_buffer(bytenr X) > -> returns existing eb, > which the send task got > > (...) > -> modifies content of the > eb with bytenr == X > > -> uses an eb that now > belongs to some other > tree and no more matches > the commit root of the > snapshot, resuts will be > unpredictable > > The consequences of this race can be various, and can lead to searches in > the commit root performed by the send task failing unexpectedly (unable to > find inode items, returning -ENOENT to user space, for example) or not > failing because an inode item with the same number was added to the tree > that reused the metadata extent, in which case send can behave incorrectly > in the worst case or just fail later for some reason. > > Fix this by performing a copy of the commit root's extent buffer when doing > a search in the context of a send operation. > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Nice catch, patch added to misc-next, thanks.
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 539901fb5165..99e7645ad94e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2584,14 +2584,27 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root, root_lock = BTRFS_READ_LOCK; if (p->search_commit_root) { - /* The commit roots are read only so we always do read locks */ - if (p->need_commit_sem) + /* + * The commit roots are read only so we always do read locks, + * and we always must hold the commit_root_sem when doing + * searches on them, the only exception is send where we don't + * want to block transaction commits for a long time, so + * we need to clone the commit root in order to avoid races + * with transaction commits that create a snapshot of one of + * the roots used by a send operation. + */ + if (p->need_commit_sem) { down_read(&fs_info->commit_root_sem); - b = root->commit_root; - extent_buffer_get(b); - level = btrfs_header_level(b); - if (p->need_commit_sem) + b = btrfs_clone_extent_buffer(root->commit_root); up_read(&fs_info->commit_root_sem); + if (!b) + return ERR_PTR(-ENOMEM); + + } else { + b = root->commit_root; + extent_buffer_get(b); + } + level = btrfs_header_level(b); /* * Ensure that all callers have set skip_locking when * p->search_commit_root = 1. @@ -2717,6 +2730,10 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, again: prev_cmp = -1; b = btrfs_search_slot_get_root(root, p, write_lock_level); + if (IS_ERR(b)) { + ret = PTR_ERR(b); + goto done; + } while (b) { level = btrfs_header_level(b);