[2/4] btrfs: fix race between writes to swap files and scrub

From: Filipe Manana <fdmanana@suse.com>

When we activate a swap file, at btrfs_swap_activate(), we acquire the
exclusive operation lock to prevent the physical location of the swap
file extents from being changed by operations such as balance and device
replace/resize/remove. There we also call can_nocow_extent() which,
among other things, checks if the block group of a swap file extent is
currently RO, and if it is we can not use the extent, since a write
into it would result in COWing the extent.

However, we have no protection against a scrub operation running after we
activate the swap file, which can result in the swap file extents being
COWed while the scrub is running and operating on the respective block
group, because scrub turns a block group RO before it processes it
and then turns it back to RW mode after processing it. That means an
attempt to write into a swap file extent while scrub is processing the
respective block group will result in COWing the extent, changing its
physical location on disk.

Fix this by making sure that block groups that have extents that are used
by active swap files can not be turned into RO mode, therefore making it
not possible for a scrub to turn them into RO mode. When a scrub finds a
block group that can not be turned to RO due to the existence of extents
used by swap files, it proceeds to the next block group and logs a warning
message that mentions the block group was skipped due to active swap
files - this is the same approach we currently use for balance.
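
The rule described above can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: struct block_group,
inc_swap_extents(), dec_swap_extents(), set_read_only() and scrub_all()
are hypothetical stand-ins written for this example, not the kernel
structures or functions, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of a block group (not the kernel struct). */
struct block_group {
	unsigned long long start;
	bool ro;           /* block group is currently read-only */
	int swap_extents;  /* extents in use by active swap files */
};

/*
 * Models the idea behind btrfs_inc_block_group_swap_extents(): pin the
 * block group for a swap file, unless it is already RO.
 */
static bool inc_swap_extents(struct block_group *bg)
{
	if (bg->ro)
		return false;
	bg->swap_extents++;
	return true;
}

/* Drop one pin, e.g. when the swap file is deactivated. */
static void dec_swap_extents(struct block_group *bg)
{
	assert(bg->swap_extents > 0);
	bg->swap_extents--;
}

/* Turning a block group RO is refused while swap extents pin it. */
static bool set_read_only(struct block_group *bg)
{
	if (bg->swap_extents > 0)
		return false;
	bg->ro = true;
	return true;
}

/*
 * Models the scrub loop: a block group that can not be turned RO is
 * skipped with a warning. Returns the number of block groups scrubbed.
 */
static int scrub_all(struct block_group *bgs, int n)
{
	int scrubbed = 0;

	for (int i = 0; i < n; i++) {
		if (!set_read_only(&bgs[i])) {
			fprintf(stderr,
				"skipping block group %llu: active swap file\n",
				bgs[i].start);
			continue;
		}
		/* ... scrub the block group here ... */
		bgs[i].ro = false; /* back to RW after processing */
		scrubbed++;
	}
	return scrubbed;
}
```

In this model a write to a pinned block group never observes it as RO,
so it is never forced to COW. The real code must perform the check and
the counter update atomically with respect to each other.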

This ends up removing the need to call btrfs_extent_readonly() from
can_nocow_extent(), as btrfs_swap_activate() now checks if a block group
is RO through the new function btrfs_inc_block_group_swap_extents().

The only other caller of can_nocow_extent() is the direct IO write path,
btrfs_get_blocks_direct_write(), but that already checks if a block group
is RO through the call to btrfs_inc_nocow_writers(). In fact, after this
change we end up optimizing the direct IO write path, since we no longer
iterate the block groups rbtree twice, once with btrfs_extent_readonly(),
through can_nocow_extent(), and once again with btrfs_inc_nocow_writers().
This can save time and reduce contention on the lock that protects the
rbtree (especially because it is a spinlock and not a read/write lock) on
very large filesystems, with several thousands of allocated block groups.
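
The saved lookup can be illustrated with another userspace sketch. The
names below (struct bg, lookup(), extent_readonly(), inc_nocow_writers(),
direct_write_old(), direct_write_new()) are hypothetical models made up
for this example, a linear array scan stands in for the rbtree search,
and the global counter only exists to make the number of searches
visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified block group (not the kernel struct). */
struct bg {
	unsigned long long start;
	unsigned long long len;
	bool ro;
	int nocow_writers;
};

static int lookups; /* counts searches; each one would take the lock */

/* Stand-in for the rbtree search done under the spinlock. */
static struct bg *lookup(struct bg *bgs, size_t n, unsigned long long bytenr)
{
	assert(bgs != NULL);
	lookups++;
	for (size_t i = 0; i < n; i++)
		if (bytenr >= bgs[i].start && bytenr - bgs[i].start < bgs[i].len)
			return &bgs[i];
	return NULL;
}

/* Models a standalone RO check: one lookup of its own. */
static bool extent_readonly(struct bg *bgs, size_t n, unsigned long long bytenr)
{
	struct bg *bg = lookup(bgs, n, bytenr);

	return !bg || bg->ro;
}

/* Models a writer registration that also rejects RO block groups. */
static bool inc_nocow_writers(struct bg *bgs, size_t n, unsigned long long bytenr)
{
	struct bg *bg = lookup(bgs, n, bytenr);

	if (!bg || bg->ro)
		return false;
	bg->nocow_writers++;
	return true;
}

/* Before: the RO state is checked twice, costing two searches. */
static bool direct_write_old(struct bg *bgs, size_t n, unsigned long long bytenr)
{
	if (extent_readonly(bgs, n, bytenr))
		return false;
	return inc_nocow_writers(bgs, n, bytenr);
}

/* After: the writer registration alone covers the RO check. */
static bool direct_write_new(struct bg *bgs, size_t n, unsigned long long bytenr)
{
	return inc_nocow_writers(bgs, n, bytenr);
}
```

Both variants give the same answer for a given block group state. The
second simply reaches it with one search instead of two.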

Fixes: ed46ff3d42378 ("Btrfs: support swap files")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/block-group.c | 33 ++++++++++++++++++++++++++++++++-
 fs/btrfs/block-group.h |  9 +++++++++
 fs/btrfs/ctree.h       |  5 +++++
 fs/btrfs/inode.c       | 22 ++++++++++++++++++----
 fs/btrfs/scrub.c       |  9 ++++++++-
 5 files changed, 72 insertions(+), 6 deletions(-)

Message-Id: <0da379a02fdabaf9ca295a34f7de287b5d5465f7.1612350698.git.fdmanana@suse.com>
In-Reply-To: <cover.1612350698.git.fdmanana@suse.com>
Date: Wed, 3 Feb 2021 11:17:45 +0000
To: linux-btrfs@vger.kernel.org
State: New, archived
Series: btrfs: fix a couple swapfile support bugs
  [0/4] btrfs: fix a couple swapfile support bugs
  [1/4] btrfs: avoid checking for RO block group twice during nocow writeback
  [2/4] btrfs: fix race between writes to swap files and scrub
  [3/4] btrfs: remove no longer used function btrfs_extent_readonly()
  [4/4] btrfs: fix race between swap file activation and snapshot creation
