mbox series

[0/6] power: wire-up filesystem freeze/thaw with suspend/resume

Message ID 20250401-work-freeze-v1-0-d000611d4ab0@kernel.org (mailing list archive)
Headers show
Series power: wire-up filesystem freeze/thaw with suspend/resume | expand

Message

Christian Brauner April 1, 2025, 12:32 a.m. UTC
The whole shebang can also be found at:
https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze

I know nothing about power or hibernation. I've tested it as best as I
could. Works for me (TM).

I need to catch some actual sleep now...

---

Now all the pieces are in place to actually allow the power subsystem to
freeze/thaw filesystems during suspend/resume. Filesystems are only
frozen and thawed if the power subsystem does actually own the freeze.

Othwerwise it risks thawing filesystems it didn't own. This could be
done differently be e.g., keeping the filesystems that were actually
frozen on a list and then unfreezing them from that list. This is
disgustingly unclean though and reeks of an ugly hack.

If the filesystem is already frozen by the time we've frozen all
userspace processes we don't care to freeze it again. That's userspace's
job once the process resumes. We only actually freeze filesystems if we
absolutely have to and we ignore other failures to freeze.

We could bubble up errors and fail suspend/resume if the error isn't
EBUSY (aka it's already frozen) but I don't think that this is worth it.
Filesystem freezing during suspend/resume is best-effort. If the user
has 500 ext4 filesystems mounted and 4 fail to freeze for whatever
reason then we simply skip them.

What we have now is already a big improvement and let's see how we fare
with it before making our lives even harder (and uglier) than we have
to.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (3):
      fs: add owner of freeze/thaw
      fs: allow pagefault based writers to be frozen
      power: freeze filesystems during suspend/resume

Luis Chamberlain (3):
      ext4: replace kthread freezing with auto fs freezing
      btrfs: replace kthread freezing with auto fs freezing
      xfs: replace kthread freezing with auto fs freezing

 fs/btrfs/disk-io.c          |  4 +--
 fs/btrfs/scrub.c            |  2 +-
 fs/ext4/mballoc.c           |  2 +-
 fs/ext4/super.c             |  3 --
 fs/f2fs/gc.c                |  6 ++--
 fs/gfs2/super.c             | 20 ++++++-----
 fs/gfs2/sys.c               |  4 +--
 fs/ioctl.c                  |  8 ++---
 fs/super.c                  | 82 ++++++++++++++++++++++++++++++++++++---------
 fs/xfs/scrub/fscounters.c   |  4 +--
 fs/xfs/xfs_discard.c        |  2 +-
 fs/xfs/xfs_log.c            |  3 +-
 fs/xfs/xfs_log_cil.c        |  2 +-
 fs/xfs/xfs_mru_cache.c      |  2 +-
 fs/xfs/xfs_notify_failure.c |  6 ++--
 fs/xfs/xfs_pwork.c          |  2 +-
 fs/xfs/xfs_super.c          | 14 ++++----
 fs/xfs/xfs_trans_ail.c      |  3 --
 fs/xfs/xfs_zone_gc.c        |  2 --
 include/linux/fs.h          | 16 ++++++---
 kernel/power/hibernate.c    | 13 ++++++-
 kernel/power/suspend.c      |  8 +++++
 22 files changed, 139 insertions(+), 69 deletions(-)
---
base-commit: a68c99192db8060f383a2680333866c0be688ece
change-id: 20250401-work-freeze-693b5b5a78e0

Comments

Christian Brauner April 1, 2025, 8:16 a.m. UTC | #1
On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote:
> The whole shebang can also be found at:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> 
> I know nothing about power or hibernation.

I would like to place this behind a Kconfig option and add a
/sys/power/freeze_on_suspend option as these changes are pretty
sensitive and to give userspace the ability to experiment with this for
a while until we remove it. That means we should skip the removal of all
the freezer changes in the filesystems until we're happy enough that
this works reliable enough.
Jan Kara April 1, 2025, 9:32 a.m. UTC | #2
On Tue 01-04-25 02:32:45, Christian Brauner wrote:
> The whole shebang can also be found at:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> 
> I know nothing about power or hibernation. I've tested it as best as I
> could. Works for me (TM).
> 
> I need to catch some actual sleep now...
> 
> ---
> 
> Now all the pieces are in place to actually allow the power subsystem to
> freeze/thaw filesystems during suspend/resume. Filesystems are only
> frozen and thawed if the power subsystem does actually own the freeze.
> 
> Othwerwise it risks thawing filesystems it didn't own. This could be
> done differently be e.g., keeping the filesystems that were actually
> frozen on a list and then unfreezing them from that list. This is
> disgustingly unclean though and reeks of an ugly hack.
> 
> If the filesystem is already frozen by the time we've frozen all
> userspace processes we don't care to freeze it again. That's userspace's
> job once the process resumes. We only actually freeze filesystems if we
> absolutely have to and we ignore other failures to freeze.

Hum, I don't follow here. I supposed we'll use FREEZE_MAY_NEST |
FREEZE_HOLDER_KERNEL for freezing from power subsystem. As far as I
remember we have specifically designed nesting of freeze counters so that
this way power subsystem can be sure freezing succeeds even if the
filesystem is already frozen (by userspace or the kernel) and similarly
power subsystem cannot thaw a filesystem frozen by somebody else. It will
just drop its freeze refcount... What am I missing?

								Honza
Christian Brauner April 1, 2025, 1:03 p.m. UTC | #3
On Tue, Apr 01, 2025 at 11:32:49AM +0200, Jan Kara wrote:
> On Tue 01-04-25 02:32:45, Christian Brauner wrote:
> > The whole shebang can also be found at:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> > 
> > I know nothing about power or hibernation. I've tested it as best as I
> > could. Works for me (TM).
> > 
> > I need to catch some actual sleep now...
> > 
> > ---
> > 
> > Now all the pieces are in place to actually allow the power subsystem to
> > freeze/thaw filesystems during suspend/resume. Filesystems are only
> > frozen and thawed if the power subsystem does actually own the freeze.
> > 
> > Othwerwise it risks thawing filesystems it didn't own. This could be
> > done differently be e.g., keeping the filesystems that were actually
> > frozen on a list and then unfreezing them from that list. This is
> > disgustingly unclean though and reeks of an ugly hack.
> > 
> > If the filesystem is already frozen by the time we've frozen all
> > userspace processes we don't care to freeze it again. That's userspace's
> > job once the process resumes. We only actually freeze filesystems if we
> > absolutely have to and we ignore other failures to freeze.
> 
> Hum, I don't follow here. I supposed we'll use FREEZE_MAY_NEST |
> FREEZE_HOLDER_KERNEL for freezing from power subsystem. As far as I
> remember we have specifically designed nesting of freeze counters so that
> this way power subsystem can be sure freezing succeeds even if the
> filesystem is already frozen (by userspace or the kernel) and similarly
> power subsystem cannot thaw a filesystem frozen by somebody else. It will
> just drop its freeze refcount... What am I missing?

If we have 10 filesystems and suspend/hibernate manges to freeze 5 and
then fails on the 6th for whatever odd reason (current or future) then
power needs to undo the freeze of the first 5 filesystems. We can't just
walk the list again because while it's unlikely that a new filesystem
got added in the meantime we still cannot tell what filesystems the
power subsystem actually managed to get a freeze reference count on that
we need to drop during thaw.

There's various ways out of this ugliness. Either we record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing or we make sure that the power subsystem just actually
exclusively freezes things it can freeze and marking such filesystems as
being owned by power for the duration of the suspend or resume cycle. I
opted for the latter as that seemed the clean thing to do even if it
means more code changes. What are your thoughts on this?
Peter Zijlstra April 1, 2025, 2:14 p.m. UTC | #4
On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote:
> The whole shebang can also be found at:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> 
> I know nothing about power or hibernation. I've tested it as best as I
> could. Works for me (TM).
> 
> I need to catch some actual sleep now...
> 
> ---
> 
> Now all the pieces are in place to actually allow the power subsystem to
> freeze/thaw filesystems during suspend/resume. Filesystems are only
> frozen and thawed if the power subsystem does actually own the freeze.

Urgh, I was relying on all kthreads to be freezable for live-patching:

  https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net

So I understand the problem with freezing filesystems, but can't we
leave the TASK_FREEZABLE in the kthreads? The way I understand it, the
power subsystem will first freeze the filesystems before it goes freeze
threads anyway. So them remaining freezable should not affect anything,
right?
Christian Brauner April 1, 2025, 2:40 p.m. UTC | #5
On Tue, Apr 01, 2025 at 04:14:07PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote:
> > The whole shebang can also be found at:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> > 
> > I know nothing about power or hibernation. I've tested it as best as I
> > could. Works for me (TM).
> > 
> > I need to catch some actual sleep now...
> > 
> > ---
> > 
> > Now all the pieces are in place to actually allow the power subsystem to
> > freeze/thaw filesystems during suspend/resume. Filesystems are only
> > frozen and thawed if the power subsystem does actually own the freeze.
> 
> Urgh, I was relying on all kthreads to be freezable for live-patching:
> 
>   https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net
> 
> So I understand the problem with freezing filesystems, but can't we
> leave the TASK_FREEZABLE in the kthreads? The way I understand it, the

Yeah, we can.

> power subsystem will first freeze the filesystems before it goes freeze
> threads anyway. So them remaining freezable should not affect anything,
> right?

Yes. I've dropped the other patches. I've discussed this later
downthread with Jan.
>
Peter Zijlstra April 1, 2025, 2:59 p.m. UTC | #6
On Tue, Apr 01, 2025 at 04:40:33PM +0200, Christian Brauner wrote:
> On Tue, Apr 01, 2025 at 04:14:07PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote:
> > > The whole shebang can also be found at:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> > > 
> > > I know nothing about power or hibernation. I've tested it as best as I
> > > could. Works for me (TM).
> > > 
> > > I need to catch some actual sleep now...
> > > 
> > > ---
> > > 
> > > Now all the pieces are in place to actually allow the power subsystem to
> > > freeze/thaw filesystems during suspend/resume. Filesystems are only
> > > frozen and thawed if the power subsystem does actually own the freeze.
> > 
> > Urgh, I was relying on all kthreads to be freezable for live-patching:
> > 
> >   https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net
> > 
> > So I understand the problem with freezing filesystems, but can't we
> > leave the TASK_FREEZABLE in the kthreads? The way I understand it, the
> 
> Yeah, we can.
> 
> > power subsystem will first freeze the filesystems before it goes freeze
> > threads anyway. So them remaining freezable should not affect anything,
> > right?
> 
> Yes. I've dropped the other patches. I've discussed this later
> downthread with Jan.

Thanks!
Jan Kara April 1, 2025, 4:57 p.m. UTC | #7
On Tue 01-04-25 15:03:33, Christian Brauner wrote:
> On Tue, Apr 01, 2025 at 11:32:49AM +0200, Jan Kara wrote:
> > On Tue 01-04-25 02:32:45, Christian Brauner wrote:
> > > The whole shebang can also be found at:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> > > 
> > > I know nothing about power or hibernation. I've tested it as best as I
> > > could. Works for me (TM).
> > > 
> > > I need to catch some actual sleep now...
> > > 
> > > ---
> > > 
> > > Now all the pieces are in place to actually allow the power subsystem to
> > > freeze/thaw filesystems during suspend/resume. Filesystems are only
> > > frozen and thawed if the power subsystem does actually own the freeze.
> > > 
> > > Othwerwise it risks thawing filesystems it didn't own. This could be
> > > done differently be e.g., keeping the filesystems that were actually
> > > frozen on a list and then unfreezing them from that list. This is
> > > disgustingly unclean though and reeks of an ugly hack.
> > > 
> > > If the filesystem is already frozen by the time we've frozen all
> > > userspace processes we don't care to freeze it again. That's userspace's
> > > job once the process resumes. We only actually freeze filesystems if we
> > > absolutely have to and we ignore other failures to freeze.
> > 
> > Hum, I don't follow here. I supposed we'll use FREEZE_MAY_NEST |
> > FREEZE_HOLDER_KERNEL for freezing from power subsystem. As far as I
> > remember we have specifically designed nesting of freeze counters so that
> > this way power subsystem can be sure freezing succeeds even if the
> > filesystem is already frozen (by userspace or the kernel) and similarly
> > power subsystem cannot thaw a filesystem frozen by somebody else. It will
> > just drop its freeze refcount... What am I missing?
> 
> If we have 10 filesystems and suspend/hibernate manges to freeze 5 and
> then fails on the 6th for whatever odd reason (current or future) then
> power needs to undo the freeze of the first 5 filesystems. We can't just
> walk the list again because while it's unlikely that a new filesystem
> got added in the meantime we still cannot tell what filesystems the
> power subsystem actually managed to get a freeze reference count on that
> we need to drop during thaw.
> 
> There's various ways out of this ugliness. Either we record the
> filesystems the power subsystem managed to freeze on a temporary list in
> the callbacks and then walk that list backwards during thaw to undo the
> freezing or we make sure that the power subsystem just actually
> exclusively freezes things it can freeze and marking such filesystems as
> being owned by power for the duration of the suspend or resume cycle. I
> opted for the latter as that seemed the clean thing to do even if it
> means more code changes. What are your thoughts on this?

Ah, I see. Thanks for explanation. So failure to freeze filesystem should
be rare (mostly only due to IO errors or similar serious issues) hence
I'd consider failing hibernation in case we fail to freeze some filesystem
appropriate. The function that's walking all superblocks and freezing them
could just walk from the superblock where freezing failed towards the end
and thaw all filesystems. That way the function also has the nice property
that it either freezes everything or keeps things as they were.

But you've touched on an interesting case I didn't consider: New
superblocks can be added to the end of the list while we are walking it.
These superblocks will not be frozen and on resume (or error recovery) this
will confuse things. Your "freeze owner" stuff deals with this problem
nicely. Somewhat lighter fix for this may be to provide the superblock to
start from / end with to these loops iterating and freezing / thawing
superblocks. It doesn't seem too hacky but if you prefer your freeze owner
approach I won't object.

								Honza
James Bottomley April 1, 2025, 5:02 p.m. UTC | #8
On Tue, 2025-04-01 at 02:32 +0200, Christian Brauner wrote:
> The whole shebang can also be found at:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> 
> I know nothing about power or hibernation. I've tested it as best as
> I could. Works for me (TM).

I'm testing the latest you have in work.freeze and it doesn't currently
work for me.  Patch 7b315c39b67d ("power: freeze filesystems during
suspend/resume") doesn't set filesystems_freeze_ptr so it ends up being
NULL and tripping over this check 

+static inline bool may_unfreeze(struct super_block *sb, enum
freeze_holder who,
+                               const void *freeze_owner)
+{
+       WARN_ON_ONCE((who & ~FREEZE_FLAGS));
+       WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
+
+       if (who & FREEZE_EXCL) {
+               if (WARN_ON_ONCE(sb->s_writers.freeze_owner == NULL))
+                       return false;


in f15a9ae05a71 ("fs: add owner of freeze/thaw") and failing to resume
from hibernate.  Setting it to __builtin_return_address(0) in
filesystems_freeze() makes everything work as expected, so that's what
I'm testing now.

I suppose one minor, minor nit is that the vagaries of English grammar
mean that the verbs fail and succeed don't take the same grammatical
construction, so failed can take the infinitive (failed to thaw)
perfectly well, but succeeded takes a prepositional gerund construction
instead: "succeeded at/in thawing" instead of the infinitive "succeeded
to thaw" ... I've no idea why, but I'd probably blame the Victorians
...

Regards,

James
Christian Brauner April 2, 2025, 7:46 a.m. UTC | #9
On Tue, Apr 01, 2025 at 01:02:07PM -0400, James Bottomley wrote:
> On Tue, 2025-04-01 at 02:32 +0200, Christian Brauner wrote:
> > The whole shebang can also be found at:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze
> > 
> > I know nothing about power or hibernation. I've tested it as best as
> > I could. Works for me (TM).
> 
> I'm testing the latest you have in work.freeze and it doesn't currently
> work for me.  Patch 7b315c39b67d ("power: freeze filesystems during
> suspend/resume") doesn't set filesystems_freeze_ptr so it ends up being
> NULL and tripping over this check 

I haven't pushed the new version there. Sorry about that. I only have it
locally.

> 
> +static inline bool may_unfreeze(struct super_block *sb, enum
> freeze_holder who,
> +                               const void *freeze_owner)
> +{
> +       WARN_ON_ONCE((who & ~FREEZE_FLAGS));
> +       WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
> +
> +       if (who & FREEZE_EXCL) {
> +               if (WARN_ON_ONCE(sb->s_writers.freeze_owner == NULL))
> +                       return false;
> 
> 
> in f15a9ae05a71 ("fs: add owner of freeze/thaw") and failing to resume
> from hibernate.  Setting it to __builtin_return_address(0) in
> filesystems_freeze() makes everything work as expected, so that's what
> I'm testing now.

+1

I'll send the final version out in a bit.