diff mbox series

[RFC,3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing

Message ID 20250326112220.1988619-4-mcgrof@kernel.org (mailing list archive)
State New
Headers show
Series fs: automatic kernel fs freeze / thaw | expand

Commit Message

Luis Chamberlain March 26, 2025, 11:22 a.m. UTC
Add support to automatically handle freezing and thawing filesystems
during the kernel's suspend/resume cycle.

This is needed so that we properly really stop IO in flight without
races after userspace has been frozen. Without this we rely on
kthread freezing and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) have also
been stopped as well.

A long term advantage of also adding filesystem freeze / thawing
supporting during suspend / hibernation is that long term we may
be able to eventually drop the kernel's thread freezing completely
as it was originally added to stop disk IO in flight as we hibernate
or suspend.

This does not remove the superfluous freezer calls on all filesystems.
Each filesystem must remove all the kthread freezer stuff and peg
the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
flag.

Subsequent patches remove the kthread freezer usage from each
filesystem, one at a time to make all this work bisectable.
Once all filesystems remove the usage of the kthread freezer we
can remove the FS_AUTOFREEZE flag.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/super.c             | 50 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     | 14 ++++++++++++
 kernel/power/process.c | 15 ++++++++++++-
 3 files changed, 78 insertions(+), 1 deletion(-)

Comments

James Bottomley March 26, 2025, 11:53 a.m. UTC | #1
On Wed, 2025-03-26 at 04:22 -0700, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
> 
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
> 
> A long term advantage of also adding filesystem freeze / thawing
> supporting during suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
> 
> This does not remove the superfluous freezer calls on all
> filesystems.
> Each filesystem must remove all the kthread freezer stuff and peg
> the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> flag.
> 
> Subsequent patches remove the kthread freezer usage from each
> filesystem, one at a time to make all this work bisectable.
> Once all filesystems remove the usage of the kthread freezer we
> can remove the FS_AUTOFREEZE flag.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  fs/super.c             | 50
> ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     | 14 ++++++++++++
>  kernel/power/process.c | 15 ++++++++++++-
>  3 files changed, 78 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 9995546cf159..7428f0b2251c 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block
> *sb)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(sb_init_dio_done_wq);
> +
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> +	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> +		return false;
> +	/*
> +	 * We don't freeze virtual filesystems, we skip those
> filesystems with
> +	 * no backing device.
> +	 */
> +	if (sb->s_bdi == &noop_backing_dev_info)
> +		return false;


This logic won't work for me because efivarfs is a pseudofilesystem and
will have a noop bdi (or simply a null s_bdev, which is easier to check
for).  I was thinking of allowing freeze/thaw to continue for a s_bdev
== NULL filesystem if it provided a freeze or thaw callback, which will
cover efivarfs.

> +
> +	return true;
> +}
> +
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> +	int error = 0;
> +
> +	if (!super_should_freeze(sb))
> +		goto out;
> +
> +	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> +
> +	error = freeze_super(sb, false);

This is actually not wholly correct now.  If the fs provides a sb-
>freeze() method, you should use that instead of freeze_super() ... see
how fs_bdev_freeze() is doing it.

Additionally, the first thing freeze_super() does is take the
superblock lock exclusively.  Since you've already taken it exclusively
in your iterate super, how does this not deadlock?

You also need to handle the hibernate deadlock I ran into where a
process (and some of the systemd processes are very fast at doing this)
touches the filesystem and gets blocked on uninterruptible wait before
the remainder of freeze_processes() runs.  Once a task is
uninterruptible hibernate fails.  I came up with a simplistic solution:

https://lore.kernel.org/linux-fsdevel/1af829aa7a65eb5ebc0614a00f7019615ed0f62b.camel@HansenPartnership.com/

But there should probably be a freezable percpu_rwsem that
sb_write_started() can use to get these semantics rather than making
every use of percpu_rwsem freezable.

Regards,

James
Christian Brauner March 26, 2025, 2:09 p.m. UTC | #2
On Wed, Mar 26, 2025 at 07:53:10AM -0400, James Bottomley wrote:
> On Wed, 2025-03-26 at 04:22 -0700, Luis Chamberlain wrote:
> > Add support to automatically handle freezing and thawing filesystems
> > during the kernel's suspend/resume cycle.
> > 
> > This is needed so that we properly really stop IO in flight without
> > races after userspace has been frozen. Without this we rely on
> > kthread freezing and its semantics are loose and error prone.
> > For instance, even though a kthread may use try_to_freeze() and end
> > up being frozen we have no way of being sure that everything that
> > has been spawned asynchronously from it (such as timers) have also
> > been stopped as well.
> > 
> > A long term advantage of also adding filesystem freeze / thawing
> > supporting during suspend / hibernation is that long term we may
> > be able to eventually drop the kernel's thread freezing completely
> > as it was originally added to stop disk IO in flight as we hibernate
> > or suspend.
> > 
> > This does not remove the superfluous freezer calls on all
> > filesystems.
> > Each filesystem must remove all the kthread freezer stuff and peg
> > the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> > flag.
> > 
> > Subsequent patches remove the kthread freezer usage from each
> > filesystem, one at a time to make all this work bisectable.
> > Once all filesystems remove the usage of the kthread freezer we
> > can remove the FS_AUTOFREEZE flag.
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  fs/super.c             | 50
> > ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h     | 14 ++++++++++++
> >  kernel/power/process.c | 15 ++++++++++++-
> >  3 files changed, 78 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/super.c b/fs/super.c
> > index 9995546cf159..7428f0b2251c 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block
> > *sb)
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(sb_init_dio_done_wq);
> > +
> > +#ifdef CONFIG_PM_SLEEP
> > +static bool super_should_freeze(struct super_block *sb)
> > +{
> > +	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> > +		return false;
> > +	/*
> > +	 * We don't freeze virtual filesystems, we skip those
> > filesystems with
> > +	 * no backing device.
> > +	 */
> > +	if (sb->s_bdi == &noop_backing_dev_info)
> > +		return false;
> 
> 
> This logic won't work for me because efivarfs is a pseudofilesystem and
> will have a noop bdi (or simply a null s_bdev, which is easier to check
> for).  I was thinking of allowing freeze/thaw to continue for a s_bdev
> == NULL filesystem if it provided a freeze or thaw callback, which will
> cover efivarfs.

Filesystem freezing isn't dependent on backing devices. I'm not sure
where that impression comes from. The FS_AUTOFREEZE shouldn't be
necessary once all filesystems have been fixed up (which I guess this is
about). The logic should just be similar to what we do for the freeze
ioctl.

IOW, we skip filesystems without any freeze method. That excludes any fs
that isn't prepared to be frozen:

The easiest way is very likely to give efivarfs a ->freeze_super() and
->thaw_super() method since it likely doesn't all of the fanciness that
freeze_super() adds.

Then we have two approaches:

(1) Change the iterator to take a reference while holding the super_lock() and
    then calling a helper to freeze the fs.
(2) Pass the information that s_umount is held down to the freeze methods.

For example (2) would be something like:

diff --git a/include/linux/fs.h b/include/linux/fs.h
index be3ad155ec9f..7ad515ad6934 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2272,6 +2272,7 @@ enum freeze_holder {
        FREEZE_HOLDER_KERNEL    = (1U << 0),
        FREEZE_HOLDER_USERSPACE = (1U << 1),
        FREEZE_MAY_NEST         = (1U << 2),
+       FREEZE_SUPER_LOCKED     = (1U << 3),
 };

 struct super_operations {

static int freeze_super_locked(struct file *filp)
{
	/* If filesystem doesn't support freeze feature, return. */
	if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
		return 0;

	if (sb->s_op->freeze_super)
		return sb->s_op->freeze_super(sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED);
	return freeze_super(sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED);
}

Why do you care about efivarfs taking part in system suspend though?

> 
> > +
> > +	return true;
> > +}
> > +
> > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> > +{
> > +	int error = 0;
> > +
> > +	if (!super_should_freeze(sb))
> > +		goto out;
> > +
> > +	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> > +
> > +	error = freeze_super(sb, false);
> 
> This is actually not wholly correct now.  If the fs provides a sb-
> >freeze() method, you should use that instead of freeze_super() ... see
> how fs_bdev_freeze() is doing it.

> 
> Additionally, the first thing freeze_super() does is take the
> superblock lock exclusively.  Since you've already taken it exclusively
> in your iterate super, how does this not deadlock?

It will deadlock.
James Bottomley March 26, 2025, 2:37 p.m. UTC | #3
On Wed, 2025-03-26 at 15:09 +0100, Christian Brauner wrote:
[...]
> Why do you care about efivarfs taking part in system suspend though?

I don't, I only care about intercepting thaw.  If ->thaw_super can get
called on resume without me having to provide a freeze_super, then I'm
happy.

I'd still like to know whether the thaw is for suspend or hibernate,
though.

Regards,

James
diff mbox series

Patch

diff --git a/fs/super.c b/fs/super.c
index 9995546cf159..7428f0b2251c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -2279,3 +2279,53 @@  int sb_init_dio_done_wq(struct super_block *sb)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sb_init_dio_done_wq);
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_should_freeze(struct super_block *sb)
+{
+	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
+		return false;
+	/*
+	 * We don't freeze virtual filesystems, we skip those filesystems with
+	 * no backing device.
+	 */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return false;
+
+	return true;
+}
+
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	if (!super_should_freeze(sb))
+		goto out;
+
+	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+
+	error = freeze_super(sb, false);
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to freeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+out:
+	return error;
+}
+
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	if (!super_should_freeze(sb))
+		goto out;
+
+	pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+
+	error = thaw_super(sb, false);
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to unfreeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+out:
+	return error;
+}
+#endif /* CONFIG_PM_SLEEP */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index da17fd74961c..e0614c3d376e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2638,6 +2638,7 @@  struct file_system_type {
 #define FS_MGTIME		64	/* FS uses multigrain timestamps */
 #define FS_LBS			128	/* FS supports LBS */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
+#define FS_AUTOFREEZE           (1<<16)	/*  temporary as we phase kthread freezer out */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
 	struct dentry *(*mount) (struct file_system_type *, int,
@@ -2729,6 +2730,19 @@  extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
 int freeze_super(struct super_block *super, enum freeze_holder who);
 int thaw_super(struct super_block *super, enum freeze_holder who);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
+#else
+static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	return 0;
+}
+static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+	return 0;
+}
+#endif
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 66ac067d9ae6..d0f540a89c39 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -140,6 +140,16 @@  int freeze_processes(void)
 
 	BUG_ON(in_atomic());
 
+	pr_info("Freezing filesystems ... ");
+	error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
+	if (error) {
+		pr_cont("failed\n");
+		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
+		thaw_processes();
+		return error;
+	}
+	pr_cont("done.\n");
+
 	/*
 	 * Now that the whole userspace is frozen we need to disable
 	 * the OOM killer to disallow any further interference with
@@ -149,8 +159,10 @@  int freeze_processes(void)
 	if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
 		error = -EBUSY;
 
-	if (error)
+	if (error) {
+		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
 		thaw_processes();
+	}
 	return error;
 }
 
@@ -188,6 +200,7 @@  void thaw_processes(void)
 	pm_nosig_freezing = false;
 
 	oom_killer_enable();
+	iterate_supers_excl(fs_suspend_thaw_sb, NULL);
 
 	pr_info("Restarting tasks ... ");