diff mbox

[RFC,v4,4/6] Btrfs: prevent ioctls from interfering with a swap file

Message ID 296f04a07350c9cc2032e03f78c1a7284747fb50.1527197312.git.osandov@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Omar Sandoval May 24, 2018, 9:41 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

When a swap file is active, we must make sure that the extents of the
file are not moved and that they don't become shared. That means that
the following are not safe:

- chattr +c (enable compression)
- reflink
- dedupe
- snapshot
- defrag
- balance
- device remove/replace/resize

Don't allow those to happen on an active swap file. Balance and device
remove/replace/resize in particular are disallowed entirely; in the
future, we can relax this so that relocation skips/errors out only on
chunks containing an active swap file.

Note that we don't have to worry about chattr -C (disable nocow), which
we ignore for non-empty files, because an active swapfile must be
non-empty and can't be truncated. We also don't have to worry about
autodefrag because it's only done on COW files. Truncate and fallocate
are already taken care of by the generic code. Device add doesn't do
relocation so it's not an issue, either.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/ctree.h   |  6 ++++++
 fs/btrfs/disk-io.c |  3 +++
 fs/btrfs/ioctl.c   | 51 ++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.c |  8 ++++++++
 4 files changed, 64 insertions(+), 4 deletions(-)

Comments

David Sterba May 25, 2018, 2:50 p.m. UTC | #1
On Thu, May 24, 2018 at 02:41:28PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> When a swap file is active, we must make sure that the extents of the
> file are not moved and that they don't become shared. That means that
> the following are not safe:
> 
> - chattr +c (enable compression)
> - reflink
> - dedupe
> - snapshot
> - defrag
> - balance
> - device remove/replace/resize
> 
> Don't allow those to happen on an active swap file. Balance and device
> remove/replace/resize in particular are disallowed entirely; in the
> future, we can relax this so that relocation skips/errors out only on
> chunks containing an active swap file.

Hm, disabling the entire balance is too intrusive. It's clear that the
swapfile causes a lot of trouble when it goes against the dynamic
capabilities of btrfs (relocation and the functionality that builds on
it).

Skipping the swapfile extents should be implemented at minimum. We can
add some heuristics that will group the swap extents to a small number
of chunks so the impact of unmovable chunks is limited.

I haven't looked at the implementation, but it might be possible to
internally find a different location for the swap extent once it's not
used for the actual paged data.

In an ideal case, the swap extents could span entire chunks (1G) and not
mix with regular data/metadata.

> Note that we don't have to worry about chattr -C (disable nocow), which
> we ignore for non-empty files, because an active swapfile must be
> non-empty and can't be truncated. We also don't have to worry about
> autodefrag because it's only done on COW files. Truncate and fallocate
> are already taken care of by the generic code. Device add doesn't do
> relocation so it's not an issue, either.

Ok, fine the remaining easy cases are covered.

I don't know if you mentioned that elsewhere (as design questions are
in this patch), the allocation profile is single, or is it also possible
to have striped or duplicated swap extents?
Omar Sandoval May 25, 2018, 4 p.m. UTC | #2
On Fri, May 25, 2018 at 04:50:55PM +0200, David Sterba wrote:
> On Thu, May 24, 2018 at 02:41:28PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > When a swap file is active, we must make sure that the extents of the
> > file are not moved and that they don't become shared. That means that
> > the following are not safe:
> > 
> > - chattr +c (enable compression)
> > - reflink
> > - dedupe
> > - snapshot
> > - defrag
> > - balance
> > - device remove/replace/resize
> > 
> > Don't allow those to happen on an active swap file. Balance and device
> > remove/replace/resize in particular are disallowed entirely; in the
> > future, we can relax this so that relocation skips/errors out only on
> > chunks containing an active swap file.
> 
> Hm, disabling the entire balance is too intrusive. It's clear that the
> swapfile causes a lot of trouble when it goes against the dynamic
> capabilities of btrfs (relocation and the functionality that builds on
> it).
> 
> Skipping the swapfile extents should be implemented at minimum.

Sure thing, this should definitely be possible. For balance, we can skip
them; for resize or delete, it of course has to fail if it encounters
swap extents. I'll take a stab at it.

> We can
> add some heuristics that will group the swap extents to a small number
> of chunks so the impact of unmovable chunks is limited.
> 
> I haven't looked at the implementation, but it might be possible to
> internally find a different location for the swap extent once it's not
> used for the actual paged data.
> 
> In an ideal case, the swap extents could span entire chunks (1G) and not
> mix with regular data/metadata.
> 
> > Note that we don't have to worry about chattr -C (disable nocow), which
> > we ignore for non-empty files, because an active swapfile must be
> > non-empty and can't be truncated. We also don't have to worry about
> > autodefrag because it's only done on COW files. Truncate and fallocate
> > are already taken care of by the generic code. Device add doesn't do
> > relocation so it's not an issue, either.
> 
> Ok, fine the remaining easy cases are covered.
> 
> I don't know if you mentioned that elsewhere (as design questions are
> in this patch), the allocation profile is single, or is it also possible
> to have striped or duplicated swap extents?

That's briefly mentioned in the last patch, only single data is
supported, although I think I can easily relax that to also allow RAID0.
Anything else is much harder to support, but we need to start somewhere.
David Sterba May 25, 2018, 4:10 p.m. UTC | #3
On Fri, May 25, 2018 at 09:00:58AM -0700, Omar Sandoval wrote:
> On Fri, May 25, 2018 at 04:50:55PM +0200, David Sterba wrote:
> > On Thu, May 24, 2018 at 02:41:28PM -0700, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@fb.com>
> > > 
> > > When a swap file is active, we must make sure that the extents of the
> > > file are not moved and that they don't become shared. That means that
> > > the following are not safe:
> > > 
> > > - chattr +c (enable compression)
> > > - reflink
> > > - dedupe
> > > - snapshot
> > > - defrag
> > > - balance
> > > - device remove/replace/resize
> > > 
> > > Don't allow those to happen on an active swap file. Balance and device
> > > remove/replace/resize in particular are disallowed entirely; in the
> > > future, we can relax this so that relocation skips/errors out only on
> > > chunks containing an active swap file.
> > 
> > Hm, disabling the entire balance is too intrusive. It's clear that the
> > swapfile causes a lot of trouble when it goes against the dynamic
> > capabilities of btrfs (relocation and the functionality that builds on
> > it).
> > 
> > Skipping the swapfile extents should be implemented at minimum.
> 
> Sure thing, this should definitely be possible. For balance, we can skip
> them; for resize or delete, it of course has to fail if it encounters
> swap extents. I'll take a stab at it.

We can detect if there's an active swap file on the filesystem before
shrink, delete or replace is started so the user is not surprised if it
fails in the end, or not start the operations at all and give some hints
what to do.

> > We can
> > add some heuristics that will group the swap extents to a small number
> > of chunks so the impact of unmovable chunks is limited.
> > 
> > I haven't looked at the implementation, but it might be possible to
> > internally find a different location for the swap extent once it's not
> > used for the actual paged data.
> > 
> > In an ideal case, the swap extents could span entire chunks (1G) and not
> > mix with regular data/metadata.
> > 
> > > Note that we don't have to worry about chattr -C (disable nocow), which
> > > we ignore for non-empty files, because an active swapfile must be
> > > non-empty and can't be truncated. We also don't have to worry about
> > > autodefrag because it's only done on COW files. Truncate and fallocate
> > > are already taken care of by the generic code. Device add doesn't do
> > > relocation so it's not an issue, either.
> > 
> > Ok, fine the remaining easy cases are covered.
> > 
> > I don't know if you mentioned that elsewhere (as design questions are
> > in this patch), the allocation profile is single, or is it also possible
> > to have striped or duplicated swap extents?
> 
> That's briefly mentioned in the last patch, only single data is
> supported, although I think I can easily relax that to also allow RAID0.
> Anything else is much harder to support, but we need to start somewhere.

Of course, support for single is absolutelly fine for the first
implementation.
Omar Sandoval Aug. 21, 2018, 8:46 a.m. UTC | #4
On Fri, May 25, 2018 at 06:10:43PM +0200, David Sterba wrote:
> On Fri, May 25, 2018 at 09:00:58AM -0700, Omar Sandoval wrote:
> > On Fri, May 25, 2018 at 04:50:55PM +0200, David Sterba wrote:
> > > On Thu, May 24, 2018 at 02:41:28PM -0700, Omar Sandoval wrote:
> > > > From: Omar Sandoval <osandov@fb.com>
> > > > 
> > > > When a swap file is active, we must make sure that the extents of the
> > > > file are not moved and that they don't become shared. That means that
> > > > the following are not safe:
> > > > 
> > > > - chattr +c (enable compression)
> > > > - reflink
> > > > - dedupe
> > > > - snapshot
> > > > - defrag
> > > > - balance
> > > > - device remove/replace/resize
> > > > 
> > > > Don't allow those to happen on an active swap file. Balance and device
> > > > remove/replace/resize in particular are disallowed entirely; in the
> > > > future, we can relax this so that relocation skips/errors out only on
> > > > chunks containing an active swap file.
> > > 
> > > Hm, disabling the entire balance is too intrusive. It's clear that the
> > > swapfile causes a lot of trouble when it goes against the dynamic
> > > capabilities of btrfs (relocation and the functionality that builds on
> > > it).
> > > 
> > > Skipping the swapfile extents should be implemented at minimum.
> > 
> > Sure thing, this should definitely be possible. For balance, we can skip
> > them; for resize or delete, it of course has to fail if it encounters
> > swap extents. I'll take a stab at it.
> 
> We can detect if there's an active swap file on the filesystem before
> shrink, delete or replace is started so the user is not surprised if it
> fails in the end, or not start the operations at all and give some hints
> what to do.

I looked at this some more, it's not pretty... Basically, we need to

- Add a counter of active swap extents to struct btrfs_block_group_cache
- At activate time, map the file extent to a block group and increment the counter
- At relocation time, check the counter, and either error out or skip as
  appropriate
- At deactivate time, decrement all of the block group counters. Easier
  said than done because the file could in theory have new extents
  allocated beyond EOF

The last point is the tricky one. The straightforward way to implement
deactivate would be to walk all of the extents of the file in the same
way we did for activate, but the extents may have changed. So, we have
to remember which extents were activated (or get that information from
the generic swap code somehow). This seems fragile.

Does anyone see a better approach? Is it worth the trouble?
diff mbox

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6deb39d8415d..280d7f5e2fe4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1117,6 +1117,9 @@  struct btrfs_fs_info {
 	u32 sectorsize;
 	u32 stripesize;
 
+	/* Number of active swapfiles */
+	atomic_t nr_swapfiles;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
@@ -1282,6 +1285,9 @@  struct btrfs_root {
 	spinlock_t qgroup_meta_rsv_lock;
 	u64 qgroup_meta_rsv_pertrans;
 	u64 qgroup_meta_rsv_prealloc;
+
+	/* Number of active swapfiles */
+	atomic_t nr_swapfiles;
 };
 
 struct btrfs_file_private {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 22171bbe86e3..b42fd6b41b20 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1215,6 +1215,7 @@  static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	atomic_set(&root->log_batch, 0);
 	refcount_set(&root->refs, 1);
 	atomic_set(&root->will_be_snapshotted, 0);
+	atomic_set(&root->nr_swapfiles, 0);
 	root->log_transid = 0;
 	root->log_transid_committed = -1;
 	root->last_log_commit = 0;
@@ -2825,6 +2826,8 @@  int open_ctree(struct super_block *sb,
 	fs_info->sectorsize = 4096;
 	fs_info->stripesize = 4096;
 
+	atomic_set(&fs_info->nr_swapfiles, 0);
+
 	ret = btrfs_alloc_stripe_hash_table(fs_info);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 82be4a94334b..5f068aabe612 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -295,6 +295,11 @@  static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	} else if (fsflags & FS_COMPR_FL) {
 		const char *comp;
 
+		if (IS_SWAPFILE(inode)) {
+			ret = -ETXTBSY;
+			goto out_unlock;
+		}
+
 		binode->flags |= BTRFS_INODE_COMPRESS;
 		binode->flags &= ~BTRFS_INODE_NOCOMPRESS;
 
@@ -767,6 +772,12 @@  static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 	if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 		return -EINVAL;
 
+	if (atomic_read(&root->nr_swapfiles)) {
+		btrfs_info(fs_info,
+			   "cannot create snapshot with active swapfile");
+		return -ETXTBSY;
+	}
+
 	pending_snapshot = kzalloc(sizeof(*pending_snapshot), GFP_KERNEL);
 	if (!pending_snapshot)
 		return -ENOMEM;
@@ -1504,9 +1515,13 @@  int btrfs_defrag_file(struct inode *inode, struct file *file,
 		}
 
 		inode_lock(inode);
-		if (do_compress)
-			BTRFS_I(inode)->defrag_compress = compress_type;
-		ret = cluster_pages_for_defrag(inode, pages, i, cluster);
+		if (IS_SWAPFILE(inode)) {
+			ret = -ETXTBSY;
+		} else {
+			if (do_compress)
+				BTRFS_I(inode)->defrag_compress = compress_type;
+			ret = cluster_pages_for_defrag(inode, pages, i, cluster);
+		}
 		if (ret < 0) {
 			inode_unlock(inode);
 			goto out_ra;
@@ -1602,6 +1617,12 @@  static noinline int btrfs_ioctl_resize(struct file *file,
 		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 	}
 
+	if (atomic_read(&fs_info->nr_swapfiles)) {
+		btrfs_info(fs_info, "cannot resize with active swapfile");
+		ret = -ETXTBSY;
+		goto out;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
 		ret = PTR_ERR(vol_args);
@@ -3614,6 +3635,11 @@  static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 		goto out;
 	}
 
+	if (IS_SWAPFILE(src) || IS_SWAPFILE(dst)) {
+		ret = -ETXTBSY;
+		goto out;
+	}
+
 	for (i = 0; i < chunk_count; i++) {
 		ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN,
 					      dst, dst_loff, &cmp);
@@ -4286,6 +4312,11 @@  static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
 		goto out_unlock;
 	}
 
+	if (IS_SWAPFILE(src) || IS_SWAPFILE(inode)) {
+		ret = -ETXTBSY;
+		goto out_unlock;
+	}
+
 	/* determine range to clone */
 	ret = -EINVAL;
 	if (off + len > src->i_size || off + len < off)
@@ -4764,7 +4795,13 @@  static long btrfs_ioctl_dev_replace(struct btrfs_fs_info *fs_info,
 		if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
 			ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		} else {
-			ret = btrfs_dev_replace_by_ioctl(fs_info, p);
+			if (atomic_read(&fs_info->nr_swapfiles)) {
+				btrfs_info(fs_info,
+					   "cannot replace device with active swapfile");
+				ret = -ETXTBSY;
+			} else {
+				ret = btrfs_dev_replace_by_ioctl(fs_info, p);
+			}
 			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 		}
 		break;
@@ -5024,6 +5061,12 @@  static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 locked:
 	BUG_ON(!test_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
 
+	if (atomic_read(&fs_info->nr_swapfiles)) {
+		btrfs_info(fs_info, "cannot balance with active swapfile");
+		ret = -ETXTBSY;
+		goto out_unlock;
+	}
+
 	if (arg) {
 		bargs = memdup_user(arg, sizeof(*bargs));
 		if (IS_ERR(bargs)) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9cfac177214f..33b3d329ebb9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1954,6 +1954,13 @@  int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
 		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 
+	if (atomic_read(&fs_info->nr_swapfiles)) {
+		btrfs_info(fs_info,
+			   "cannot remove device with active swapfile");
+		ret = -ETXTBSY;
+		goto out_clear;
+	}
+
 	mutex_lock(&uuid_mutex);
 
 	num_devices = fs_devices->num_devices;
@@ -2072,6 +2079,7 @@  int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 
 out:
 	mutex_unlock(&uuid_mutex);
+out_clear:
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	return ret;