Message ID | 20250326112220.1988619-4-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fs: automatic kernel fs freeze / thaw | expand |
On Wed, 2025-03-26 at 04:22 -0700, Luis Chamberlain wrote: > Add support to automatically handle freezing and thawing filesystems > during the kernel's suspend/resume cycle. > > This is needed so that we properly really stop IO in flight without > races after userspace has been frozen. Without this we rely on > kthread freezing and its semantics are loose and error prone. > For instance, even though a kthread may use try_to_freeze() and end > up being frozen we have no way of being sure that everything that > has been spawned asynchronously from it (such as timers) have also > been stopped as well. > > A long term advantage of also adding filesystem freeze / thawing > supporting during suspend / hibernation is that long term we may > be able to eventually drop the kernel's thread freezing completely > as it was originally added to stop disk IO in flight as we hibernate > or suspend. > > This does not remove the superfluous freezer calls on all > filesystems. > Each filesystem must remove all the kthread freezer stuff and peg > the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE > flag. > > Subsequent patches remove the kthread freezer usage from each > filesystem, one at a time to make all this work bisectable. > Once all filesystems remove the usage of the kthread freezer we > can remove the FS_AUTOFREEZE flag. > > Reviewed-by: Jan Kara <jack@suse.cz> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > --- > fs/super.c | 50 > ++++++++++++++++++++++++++++++++++++++++++ > include/linux/fs.h | 14 ++++++++++++ > kernel/power/process.c | 15 ++++++++++++- > 3 files changed, 78 insertions(+), 1 deletion(-) > > diff --git a/fs/super.c b/fs/super.c > index 9995546cf159..7428f0b2251c 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block > *sb) > return 0; > } > EXPORT_SYMBOL_GPL(sb_init_dio_done_wq); > + > +#ifdef CONFIG_PM_SLEEP > +static bool super_should_freeze(struct super_block *sb) > +{ > + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE)) > + return false; > + /* > + * We don't freeze virtual filesystems, we skip those > filesystems with > + * no backing device. > + */ > + if (sb->s_bdi == &noop_backing_dev_info) > + return false; This logic won't work for me because efivarfs is a pseudofilesystem and will have a noop bdi (or simply a null s_bdev, which is easier to check for). I was thinking of allowing freeze/thaw to continue for a s_bdev == NULL filesystem if it provided a freeze or thaw callback, which will cover efivarfs. > + > + return true; > +} > + > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv) > +{ > + int error = 0; > + > + if (!super_should_freeze(sb)) > + goto out; > + > + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id); > + > + error = freeze_super(sb, false); This is actually not wholly correct now. If the fs provides a sb- >freeze() method, you should use that instead of freeze_super() ... see how fs_bdev_freeze() is doing it. Additionally, the first thing freeze_super() does is take the superblock lock exclusively. Since you've already taken it exclusively in your iterate super, how does this not deadlock? You also need to handle the hibernate deadlock I ran into where a process (and some of the systemd processes are very fast at doing this) touches the filesystem and gets blocked on uninterruptible wait before the remainder of freeze_processes() runs. Once a task is uninterruptible hibernate fails. I came up with a simplistic solution: https://lore.kernel.org/linux-fsdevel/1af829aa7a65eb5ebc0614a00f7019615ed0f62b.camel@HansenPartnership.com/ But there should probably be a freezable percpu_rwsem that sb_write_started() can use to get these semantics rather than making every use of percpu_rwsem freezable. Regards, James
On Wed, Mar 26, 2025 at 07:53:10AM -0400, James Bottomley wrote: > On Wed, 2025-03-26 at 04:22 -0700, Luis Chamberlain wrote: > > Add support to automatically handle freezing and thawing filesystems > > during the kernel's suspend/resume cycle. > > > > This is needed so that we properly really stop IO in flight without > > races after userspace has been frozen. Without this we rely on > > kthread freezing and its semantics are loose and error prone. > > For instance, even though a kthread may use try_to_freeze() and end > > up being frozen we have no way of being sure that everything that > > has been spawned asynchronously from it (such as timers) have also > > been stopped as well. > > > > A long term advantage of also adding filesystem freeze / thawing > > supporting during suspend / hibernation is that long term we may > > be able to eventually drop the kernel's thread freezing completely > > as it was originally added to stop disk IO in flight as we hibernate > > or suspend. > > > > This does not remove the superfluous freezer calls on all > > filesystems. > > Each filesystem must remove all the kthread freezer stuff and peg > > the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE > > flag. > > > > Subsequent patches remove the kthread freezer usage from each > > filesystem, one at a time to make all this work bisectable. > > Once all filesystems remove the usage of the kthread freezer we > > can remove the FS_AUTOFREEZE flag. > > > > Reviewed-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > > --- > > fs/super.c | 50 > > ++++++++++++++++++++++++++++++++++++++++++ > > include/linux/fs.h | 14 ++++++++++++ > > kernel/power/process.c | 15 ++++++++++++- > > 3 files changed, 78 insertions(+), 1 deletion(-) > > > > diff --git a/fs/super.c b/fs/super.c > > index 9995546cf159..7428f0b2251c 100644 > > --- a/fs/super.c > > +++ b/fs/super.c > > @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block > > *sb) > > return 0; > > } > > EXPORT_SYMBOL_GPL(sb_init_dio_done_wq); > > + > > +#ifdef CONFIG_PM_SLEEP > > +static bool super_should_freeze(struct super_block *sb) > > +{ > > + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE)) > > + return false; > > + /* > > + * We don't freeze virtual filesystems, we skip those > > filesystems with > > + * no backing device. > > + */ > > + if (sb->s_bdi == &noop_backing_dev_info) > > + return false; > > > This logic won't work for me because efivarfs is a pseudofilesystem and > will have a noop bdi (or simply a null s_bdev, which is easier to check > for). I was thinking of allowing freeze/thaw to continue for a s_bdev > == NULL filesystem if it provided a freeze or thaw callback, which will > cover efivarfs. Filesystem freezing isn't dependent on backing devices. I'm not sure where that impression comes from. The FS_AUTOFREEZE shouldn't be necessary once all filesystems have been fixed up (which I guess this is about). The logic should just be similar to what we do for the freeze ioctl. IOW, we skip filesystems without any freeze method. That excludes any fs that isn't prepared to be frozen: The easiest way is very likely to give efivarfs a ->freeze_super() and ->thaw_super() method since it likely doesn't all of the fanciness that freeze_super() adds. Then we have two approaches: (1) Change the iterator to take a reference while holding the super_lock() and then calling a helper to freeze the fs. (2) Pass the information that s_umount is held down to the freeze methods. For example (2) would be something like: diff --git a/include/linux/fs.h b/include/linux/fs.h index be3ad155ec9f..7ad515ad6934 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2272,6 +2272,7 @@ enum freeze_holder { FREEZE_HOLDER_KERNEL = (1U << 0), FREEZE_HOLDER_USERSPACE = (1U << 1), FREEZE_MAY_NEST = (1U << 2), + FREEZE_SUPER_LOCKED = (1U << 3), }; struct super_operations { static int freeze_super_locked(struct file *filp) { /* If filesystem doesn't support freeze feature, return. */ if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL) return 0; if (sb->s_op->freeze_super) return sb->s_op->freeze_super(sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED); return freeze_super(sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED); } Why do you care about efivarfs taking part in system suspend though? > > > + > > + return true; > > +} > > + > > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv) > > +{ > > + int error = 0; > > + > > + if (!super_should_freeze(sb)) > > + goto out; > > + > > + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id); > > + > > + error = freeze_super(sb, false); > > This is actually not wholly correct now. If the fs provides a sb- > >freeze() method, you should use that instead of freeze_super() ... see > how fs_bdev_freeze() is doing it. > > Additionally, the first thing freeze_super() does is take the > superblock lock exclusively. Since you've already taken it exclusively > in your iterate super, how does this not deadlock? It will deadlock.
On Wed, 2025-03-26 at 15:09 +0100, Christian Brauner wrote:
[...]
> Why do you care about efivarfs taking part in system suspend though?
I don't, I only care about intercepting thaw. If ->thaw_super can get
called on resume without me having to provide a freeze_super, then I'm
happy.
I'd still like to know whether the thaw is for suspend or hibernate,
though.
Regards,
James
diff --git a/fs/super.c b/fs/super.c index 9995546cf159..7428f0b2251c 100644 --- a/fs/super.c +++ b/fs/super.c @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block *sb) return 0; } EXPORT_SYMBOL_GPL(sb_init_dio_done_wq); + +#ifdef CONFIG_PM_SLEEP +static bool super_should_freeze(struct super_block *sb) +{ + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE)) + return false; + /* + * We don't freeze virtual filesystems, we skip those filesystems with + * no backing device. + */ + if (sb->s_bdi == &noop_backing_dev_info) + return false; + + return true; +} + +int fs_suspend_freeze_sb(struct super_block *sb, void *priv) +{ + int error = 0; + + if (!super_should_freeze(sb)) + goto out; + + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id); + + error = freeze_super(sb, false); + if (error && error != -EBUSY) + pr_notice("%s (%s): Unable to freeze, error=%d", + sb->s_type->name, sb->s_id, error); +out: + return error; +} + +int fs_suspend_thaw_sb(struct super_block *sb, void *priv) +{ + int error = 0; + + if (!super_should_freeze(sb)) + goto out; + + pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id); + + error = thaw_super(sb, false); + if (error && error != -EBUSY) + pr_notice("%s (%s): Unable to unfreeze, error=%d", + sb->s_type->name, sb->s_id, error); +out: + return error; +} +#endif /* CONFIG_PM_SLEEP */ diff --git a/include/linux/fs.h b/include/linux/fs.h index da17fd74961c..e0614c3d376e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2638,6 +2638,7 @@ struct file_system_type { #define FS_MGTIME 64 /* FS uses multigrain timestamps */ #define FS_LBS 128 /* FS supports LBS */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ +#define FS_AUTOFREEZE (1<<16) /* temporary as we phase kthread freezer out */ int (*init_fs_context)(struct fs_context *); const struct fs_parameter_spec *parameters; struct dentry *(*mount) (struct file_system_type *, int, @@ -2729,6 +2730,19 @@ extern int user_statfs(const char __user *, struct kstatfs *); extern int fd_statfs(int, struct kstatfs *); int freeze_super(struct super_block *super, enum freeze_holder who); int thaw_super(struct super_block *super, enum freeze_holder who); +#ifdef CONFIG_PM_SLEEP +int fs_suspend_freeze_sb(struct super_block *sb, void *priv); +int fs_suspend_thaw_sb(struct super_block *sb, void *priv); +#else +static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv) +{ + return 0; +} +static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv) +{ + return 0; +} +#endif extern __printf(2, 3) int super_setup_bdi_name(struct super_block *sb, char *fmt, ...); extern int super_setup_bdi(struct super_block *sb); diff --git a/kernel/power/process.c b/kernel/power/process.c index 66ac067d9ae6..d0f540a89c39 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -140,6 +140,16 @@ int freeze_processes(void) BUG_ON(in_atomic()); + pr_info("Freezing filesystems ... "); + error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL); + if (error) { + pr_cont("failed\n"); + iterate_supers_excl(fs_suspend_thaw_sb, NULL); + thaw_processes(); + return error; + } + pr_cont("done.\n"); + /* * Now that the whole userspace is frozen we need to disable * the OOM killer to disallow any further interference with @@ -149,8 +159,10 @@ int freeze_processes(void) if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs))) error = -EBUSY; - if (error) + if (error) { + iterate_supers_excl(fs_suspend_thaw_sb, NULL); thaw_processes(); + } return error; } @@ -188,6 +200,7 @@ void thaw_processes(void) pm_nosig_freezing = false; oom_killer_enable(); + iterate_supers_excl(fs_suspend_thaw_sb, NULL); pr_info("Restarting tasks ... ");