Message ID | e7669602a731fc542034cbe933b067326c5aa3ad.1612529182.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix a couple swapfile support bugs | expand |
On 05/02/2021 20:55, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When creating a snapshot we check if the current number of swap files, in > the root, is non-zero, and if it is, we error out and warn that we can not > create the snapshot because there are active swap files. > > However this is racy because when a task started activation of a swap > file, another task might have started already snapshot creation and might > have seen the counter for the number of swap files as zero. This means > that after the swap file is activated we may end up with a snapshot of the > same root successfully created, and therefore when the first write to the > swap file happens it has to fall back into COW mode, which should never > happen for active swap files. > > Basically what can happen is: > > 1) Task A starts snapshot creation and enters ioctl.c:create_snapshot(). > There it sees that root->nr_swapfiles has a value of 0 so it continues; > > 2) Task B enters btrfs_swap_activate(). It is not aware that another task > started snapshot creation but it did not finish yet. It increments > root->nr_swapfiles from 0 to 1; > > 3) Task B checks that the file meets all requirements to be an active > swap file - it has NOCOW set, there are no snapshots for the inode's > root at the moment, no file holes, no reflinked extents, etc; > > 4) Task B returns success and now the file is an active swap file; > > 5) Task A commits the transaction to create the snapshot and finishes. > The swap file's extents are now shared between the original root and > the snapshot; > > 6) A write into an extent of the swap file is attempted - there is a > snapshot of the file's root, so we fall back to COW mode and therefore > the physical location of the extent changes on disk. > > So fix this by taking the snapshot lock during swap file activation before > locking the extent range, as that is the order in which we lock these > during buffered writes. > > Fixes: ed46ff3d42378 ("Btrfs: support swap files") > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Thanks, Anand > --- > fs/btrfs/inode.c | 21 +++++++++++++++++++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 715ae1320383..a9fb6a4eb9dd 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -10116,7 +10116,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, > sector_t *span) > { > struct inode *inode = file_inode(file); > - struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; > + struct btrfs_root *root = BTRFS_I(inode)->root; > + struct btrfs_fs_info *fs_info = root->fs_info; > struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; > struct extent_state *cached_state = NULL; > struct extent_map *em = NULL; > @@ -10167,13 +10168,27 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, > "cannot activate swapfile while exclusive operation is running"); > return -EBUSY; > } > + > + /* > + * Prevent snapshot creation while we are activating the swap file. > + * We do not want to race with snapshot creation. If snapshot creation > + * already started before we bumped nr_swapfiles from 0 to 1 and > + * completes before the first write into the swap file after it is > + * activated, than that write would fallback to COW. > + */ > + if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) { > + btrfs_exclop_finish(fs_info); > + btrfs_warn(fs_info, > + "cannot activate swapfile because snapshot creation is in progress"); > + return -EINVAL; > + } > /* > * Snapshots can create extents which require COW even if NODATACOW is > * set. We use this counter to prevent snapshots. We must increment it > * before walking the extents because we don't want a concurrent > * snapshot to run after we've already checked the extents. > */ > - atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles); > + atomic_inc(&root->nr_swapfiles); > > isize = ALIGN_DOWN(inode->i_size, fs_info->sectorsize); > > @@ -10319,6 +10334,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, > if (ret) > btrfs_swap_deactivate(file); > > + btrfs_drew_write_unlock(&root->snapshot_lock); > + > btrfs_exclop_finish(fs_info); > > if (ret) >
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 715ae1320383..a9fb6a4eb9dd 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10116,7 +10116,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, sector_t *span) { struct inode *inode = file_inode(file); - struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_fs_info *fs_info = root->fs_info; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; struct extent_state *cached_state = NULL; struct extent_map *em = NULL; @@ -10167,13 +10168,27 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, "cannot activate swapfile while exclusive operation is running"); return -EBUSY; } + + /* + * Prevent snapshot creation while we are activating the swap file. + * We do not want to race with snapshot creation. If snapshot creation + * already started before we bumped nr_swapfiles from 0 to 1 and + * completes before the first write into the swap file after it is + * activated, than that write would fallback to COW. + */ + if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) { + btrfs_exclop_finish(fs_info); + btrfs_warn(fs_info, + "cannot activate swapfile because snapshot creation is in progress"); + return -EINVAL; + } /* * Snapshots can create extents which require COW even if NODATACOW is * set. We use this counter to prevent snapshots. We must increment it * before walking the extents because we don't want a concurrent * snapshot to run after we've already checked the extents. */ - atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles); + atomic_inc(&root->nr_swapfiles); isize = ALIGN_DOWN(inode->i_size, fs_info->sectorsize); @@ -10319,6 +10334,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file, if (ret) btrfs_swap_deactivate(file); + btrfs_drew_write_unlock(&root->snapshot_lock); + btrfs_exclop_finish(fs_info); if (ret)