diff mbox

[RFC,2/5] fs: freeze on suspend and thaw on resume

Message ID 20171003185313.1017-3-mcgrof@kernel.org
State New, archived
Headers show

Commit Message

Luis Chamberlain Oct. 3, 2017, 6:53 p.m. UTC
This uses the existing filesystem freeze and thaw callbacks to
freeze each filesystem on suspend/hibernation and thaw upon resume.

This is needed so that we properly really stop IO in flight without
races after userspace has been frozen. Without this we rely on
kthread freezing and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) have also
been stopped as well.

A long term advantage of also adding filesystem freeze / thawing
supporting durign suspend / hibernation is that long term we may
be able to eventually drop the kernel's thread freezing completely
as it was originally added to stop disk IO in flight as we hibernate
or suspend.

This also implies that many kthread users exist which have been
adding freezer semantics onto its kthreads without need. These also
will need to be reviewed later.

This is based on prior work originally by Rafael Wysocki and later by
Jiri Kosina.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 fs/super.c             | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     | 13 +++++++++
 kernel/power/process.c | 14 ++++++++-
 3 files changed, 105 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Oct. 3, 2017, 8:02 p.m. UTC | #1
On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote:
> +static bool super_allows_freeze(struct super_block *sb)

> +{

> +	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);

> +}


A minor comment: if "!!" would be left out the compiler will perform the
conversion from int to bool implicitly so I propose to leave out the "!!"
and parentheses. Anyway, I agree with the approach of this patch and I think
that freezing filesystems before processes are frozen would be a big step
forward.

Bart.
Jiri Kosina Oct. 3, 2017, 8:06 p.m. UTC | #2
On Tue, 3 Oct 2017, Luis R. Rodriguez wrote:

> This uses the existing filesystem freeze and thaw callbacks to
> freeze each filesystem on suspend/hibernation and thaw upon resume.
> 
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
> 
> A long term advantage of also adding filesystem freeze / thawing
> supporting durign suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
> 
> This also implies that many kthread users exist which have been
> adding freezer semantics onto its kthreads without need. These also
> will need to be reviewed later.
> 
> This is based on prior work originally by Rafael Wysocki and later by
> Jiri Kosina.
> 
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>

Thanks a lot for picking this up; I never found time to actually finalize 
it.

	Acked-by: Jiri Kosina <jkosina@suse.cz>

for patches 2 and 5 (the fs agnostic code), which were in the core of my 
original work.
Luis Chamberlain Oct. 3, 2017, 8:23 p.m. UTC | #3
On Tue, Oct 03, 2017 at 08:02:22PM +0000, Bart Van Assche wrote:
> On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote:
> > +static bool super_allows_freeze(struct super_block *sb)
> > +{
> > +	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);
> > +}
> 
> A minor comment: if "!!" would be left out the compiler will perform the
> conversion from int to bool implicitly

For all compilers?

> so I propose to leave out the "!!" and parentheses.

OK!

> Anyway, I agree with the approach of this patch and I think
> that freezing filesystems before processes are frozen would be a big step
> forward.

Great! But please note, the current implementation calls fs_suspend_freeze()
*after* try_to_freeze_tasks(), ie: this implementation freezes userspace and
only after then filesystems.

Order will be *critical* here to get right, so we should definitely figure
out if this is definitely the right place (TM) to call fs_suspend_freeze().

Lastly, a final minor implementation note:

I think using a PM notifier would have been much cleaner, in fact it was my the
way I originally implemented this orthogonally to Jiri's work, however to get
this right the semantics of __pm_notifier_call_chain() would need to be
expanded with another state, perhaps PM_USERSPACE_FROZEN, for example. I
decided in the end a new state was not worth it give we would just have one
user: fs freezing. So to be clear using a notifier to wrap this code into
the fs code and not touching kernel/power/process.c is not possible with
today's semantics nor do I think its worth it to expand on these semantics.

This approach is explicit about order and requirements for those that should
care: those that will maintain kernel/power/process.c and friends. Having
this in a notifier would shift this implicitly.

  Luis
Bart Van Assche Oct. 3, 2017, 8:32 p.m. UTC | #4
On Tue, 2017-10-03 at 22:23 +0200, Luis R. Rodriguez wrote:
> On Tue, Oct 03, 2017 at 08:02:22PM +0000, Bart Van Assche wrote:

> > On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote:

> > > +static bool super_allows_freeze(struct super_block *sb)

> > > +{

> > > +	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);

> > > +}

> > 

> > A minor comment: if "!!" would be left out the compiler will perform the

> > conversion from int to bool implicitly

> 

> For all compilers?


Let's have a look at the output of the following commands:

$ PAGER= git grep 'typedef.*[[:blank:]]bool;' include
include/linux/types.h:typedef _Bool                     bool;
$ PAGER= git grep std= Makefile
Makefile:               -fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS)
Makefile:                  -std=gnu89 $(call cc-option,-fno-PIE)

From https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/C-Dialect-Options.html#C-Dialect-Options:
‘gnu89’
GNU dialect of ISO C90 (including some C99 features).

I think this means that the Linux kernel tree can only be compiled correctly
by compilers that support the C11 type _Bool.

> > Anyway, I agree with the approach of this patch and I think

> > that freezing filesystems before processes are frozen would be a big step

> > forward.

> 

> Great! But please note, the current implementation calls fs_suspend_freeze()

> *after* try_to_freeze_tasks(), ie: this implementation freezes userspace and

> only after then filesystems.


What will the impact be of that choice on filesystems implemented in userspace?

Thanks,

Bart.
Luis Chamberlain Oct. 3, 2017, 8:39 p.m. UTC | #5
On Tue, Oct 03, 2017 at 08:32:39PM +0000, Bart Van Assche wrote:
> On Tue, 2017-10-03 at 22:23 +0200, Luis R. Rodriguez wrote:
> > On Tue, Oct 03, 2017 at 08:02:22PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote:
> > > > +static bool super_allows_freeze(struct super_block *sb)
> > > > +{
> > > > +	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);
> > > > +}
> > > 
> > > A minor comment: if "!!" would be left out the compiler will perform the
> > > conversion from int to bool implicitly
> > 
> > For all compilers?
> 
> Let's have a look at the output of the following commands:
> 
> $ PAGER= git grep 'typedef.*[[:blank:]]bool;' include
> include/linux/types.h:typedef _Bool                     bool;
> $ PAGER= git grep std= Makefile
> Makefile:               -fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS)
> Makefile:                  -std=gnu89 $(call cc-option,-fno-PIE)
> 
> From https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/C-Dialect-Options.html#C-Dialect-Options:
> ‘gnu89’
> GNU dialect of ISO C90 (including some C99 features).
> 
> I think this means that the Linux kernel tree can only be compiled correctly
> by compilers that support the C11 type _Bool.

:*) beautiful, thanks.

> > > Anyway, I agree with the approach of this patch and I think
> > > that freezing filesystems before processes are frozen would be a big step
> > > forward.
> > 
> > Great! But please note, the current implementation calls fs_suspend_freeze()
> > *after* try_to_freeze_tasks(), ie: this implementation freezes userspace and
> > only after then filesystems.
> 
> What will the impact be of that choice on filesystems implemented in userspace?

Depends on what kernel hooks those use? Also now is a good time for those working
on userspace filesystems to chime in :) Its why I am stating -- I am not saying
I have found the right order, I have find the right order that works for me, and
we need consensus on what the right order should be.

  Luis
Dave Chinner Oct. 3, 2017, 8:58 p.m. UTC | #6
On Tue, Oct 03, 2017 at 11:53:10AM -0700, Luis R. Rodriguez wrote:
> This uses the existing filesystem freeze and thaw callbacks to
> freeze each filesystem on suspend/hibernation and thaw upon resume.
> 
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
> 
> A long term advantage of also adding filesystem freeze / thawing
> supporting durign suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
> 
> This also implies that many kthread users exist which have been
> adding freezer semantics onto its kthreads without need. These also
> will need to be reviewed later.
> 
> This is based on prior work originally by Rafael Wysocki and later by
> Jiri Kosina.
> 
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
>  fs/super.c             | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     | 13 +++++++++
>  kernel/power/process.c | 14 ++++++++-
>  3 files changed, 105 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index d45e92d9a38f..ce8da8b187b1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1572,3 +1572,82 @@ int thaw_super(struct super_block *sb)
>  	return 0;
>  }
>  EXPORT_SYMBOL(thaw_super);
> +
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_allows_freeze(struct super_block *sb)
> +{
> +	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);
> +}

That's a completely misleading function name. All superblocks can be
frozen - freeze_super() is filesystem independent. And given that, I
don't see why these super_should_freeze() hoops need to be jumped
through...

> +
> +static bool super_should_freeze(struct super_block *sb)
> +{
> +	if (!sb->s_root)
> +		return false;
> +	if (!(sb->s_flags & MS_BORN))
> +		return false;
> +	/*
> +	 * We don't freeze virtual filesystems, we skip those filesystems with
> +	 * no backing device.
> +	 */
> +	if (sb->s_bdi == &noop_backing_dev_info)
> +		return false;
> +	/* No need to freeze read-only filesystems */
> +	if (sb->s_flags & MS_RDONLY)
> +		return false;
> +	if (!super_allows_freeze(sb))
> +		return false;
> +
> +	return true;
> +}

> +
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> +	int error = 0;
> +
> +	spin_lock(&sb_lock);
> +	if (!super_should_freeze(sb))
> +		goto out;
> +
> +	up_read(&sb->s_umount);
> +	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> +	error = freeze_super(sb);
> +	down_read(&sb->s_umount);
> +out:
> +	if (error && error != -EBUSY)
> +		pr_notice("%s (%s): Unable to freeze, error=%d",
> +			  sb->s_type->name, sb->s_id, error);
> +	spin_unlock(&sb_lock);
> +	return error;
> +}

I don't think this was ever tested.  Calling freeze_super() with a
spinlock held with through "sleeping in atomic" errors all over the
place.

Also, the s_umount lock juggling is nasty. Your new copy+pasted
iterate_supers_reverse() takes the lock in read mode, yet all the
freeze/thaw callers here want to take it in write mode. So, really,
iterate_supers_reverse() needs to be iterate_supers_reverse_excl()
and take the write lock, and freeze_super/thaw_super need to be
factored into locked and unlocked versions.

In which case, we end up with:

int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
{
	return freeze_locked_super(sb);
}

int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
{
	return thaw_locked_super(sb);
}

Cheers,

Dave.
Luis Chamberlain Oct. 3, 2017, 9:16 p.m. UTC | #7
On Wed, Oct 04, 2017 at 07:58:41AM +1100, Dave Chinner wrote:
> On Tue, Oct 03, 2017 at 11:53:10AM -0700, Luis R. Rodriguez wrote:
> > diff --git a/fs/super.c b/fs/super.c
> > index d45e92d9a38f..ce8da8b187b1 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -1572,3 +1572,82 @@ int thaw_super(struct super_block *sb)
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(thaw_super);
> > +
> > +#ifdef CONFIG_PM_SLEEP
> > +static bool super_allows_freeze(struct super_block *sb)
> > +{
> > +	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);
> > +}
> 
> That's a completely misleading function name. All superblocks can be
> frozen - freeze_super() is filesystem independent. And given that, I
> don't see why these super_should_freeze() hoops need to be jumped
> through...

I added this given ext4's current thaw implementation stalls on resume 
as it implicates a bio_submit() and this never completes. So I refactored
ext4 thaw skip a sync on thaw. This requires more eyeballs. This may be an
underlying issue elsewhere.  If its not an bug elsewhere or on ordering, then
there may be some restrictions on thaw when used on resume. Refer to my notes
to Ted on patch #4 for ext4.

> > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> > +{
> > +	int error = 0;
> > +
> > +	spin_lock(&sb_lock);
> > +	if (!super_should_freeze(sb))
> > +		goto out;
> > +
> > +	up_read(&sb->s_umount);
> > +	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> > +	error = freeze_super(sb);
> > +	down_read(&sb->s_umount);
> > +out:
> > +	if (error && error != -EBUSY)
> > +		pr_notice("%s (%s): Unable to freeze, error=%d",
> > +			  sb->s_type->name, sb->s_id, error);
> > +	spin_unlock(&sb_lock);
> > +	return error;
> > +}
> 
> I don't think this was ever tested.  Calling freeze_super() with a
> spinlock held with through "sleeping in atomic" errors all over the
> place.

No, I run time tested it with a rootfs with ext4 and xfs with my
development tree and hammering on both while I loop on suspend
and resume.

> Also, the s_umount lock juggling is nasty. Your new copy+pasted
> iterate_supers_reverse() takes the lock in read mode, yet all the
> freeze/thaw callers here want to take it in write mode.

Yeap, yuk!

> So, really,
> iterate_supers_reverse() needs to be iterate_supers_reverse_excl()
> and take the write lock, and freeze_super/thaw_super need to be
> factored into locked and unlocked versions.
> 
> In which case, we end up with:
> 
> int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> {
> 	return freeze_locked_super(sb);
> }
> 
> int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
> {
> 	return thaw_locked_super(sb);
> }

Groovy, thanks, I suspected that locking was convoluted and
we could come up with something better. Will do it this way.

Its really what I hoped we could do :) I just needed to get
it in writing.

  Luis
diff mbox

Patch

diff --git a/fs/super.c b/fs/super.c
index d45e92d9a38f..ce8da8b187b1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1572,3 +1572,82 @@  int thaw_super(struct super_block *sb)
 	return 0;
 }
 EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_allows_freeze(struct super_block *sb)
+{
+	return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND);
+}
+
+static bool super_should_freeze(struct super_block *sb)
+{
+	if (!sb->s_root)
+		return false;
+	if (!(sb->s_flags & MS_BORN))
+		return false;
+	/*
+	 * We don't freeze virtual filesystems, we skip those filesystems with
+	 * no backing device.
+	 */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return false;
+	/* No need to freeze read-only filesystems */
+	if (sb->s_flags & MS_RDONLY)
+		return false;
+
+	if (!super_allows_freeze(sb))
+		return false;
+
+	return true;
+}
+
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	if (!super_should_freeze(sb))
+		goto out;
+
+	up_read(&sb->s_umount);
+	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+	error = freeze_super(sb);
+	down_read(&sb->s_umount);
+out:
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to freeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+	spin_unlock(&sb_lock);
+	return error;
+}
+
+int fs_suspend_freeze(void)
+{
+	return iterate_supers_reverse(fs_suspend_freeze_sb, NULL);
+}
+
+void fs_resume_unfreeze_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	if (!super_should_freeze(sb))
+		goto out;
+
+	up_read(&sb->s_umount);
+	pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+	error = thaw_super(sb);
+	down_read(&sb->s_umount);
+out:
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to unfreeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+	spin_unlock(&sb_lock);
+}
+
+void fs_resume_unfreeze(void)
+{
+	iterate_supers(fs_resume_unfreeze_sb, NULL);
+}
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cd084792cf39..7754230d6afc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2071,6 +2071,7 @@  struct file_system_type {
 #define FS_BINARY_MOUNTDATA	2
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
+#define FS_FREEZE_ON_SUSPEND   16      /* should filesystem freeze on suspend? */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
@@ -2172,6 +2173,18 @@  extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze(void);
+void fs_resume_unfreeze(void);
+#else
+static inline int fs_suspend_freeze(void)
+{
+	return 0;
+}
+static inline void fs_resume_unfreeze(void)
+{
+}
+#endif
 extern bool our_mnt(struct vfsmount *mnt);
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 50f25cb370c6..9d1277768312 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -144,6 +144,15 @@  int freeze_processes(void)
 	pr_cont("\n");
 	BUG_ON(in_atomic());
 
+	pr_info("Freezing filesystems ... ");
+	error = fs_suspend_freeze();
+	if (error) {
+		pr_cont("failed\n");
+		fs_resume_unfreeze();
+		return error;
+	}
+	pr_cont("done.\n");
+
 	/*
 	 * Now that the whole userspace is frozen we need to disbale
 	 * the OOM killer to disallow any further interference with
@@ -153,8 +162,10 @@  int freeze_processes(void)
 	if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
 		error = -EBUSY;
 
-	if (error)
+	if (error) {
 		thaw_processes();
+		fs_resume_unfreeze();
+	}
 	return error;
 }
 
@@ -197,6 +208,7 @@  void thaw_processes(void)
 	pm_nosig_freezing = false;
 
 	oom_killer_enable();
+	fs_resume_unfreeze();
 
 	pr_info("Restarting tasks ... ");