Message ID | 20250401-work-freeze-v1-0-d000611d4ab0@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | power: wire-up filesystem freeze/thaw with suspend/resume | expand |
On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote: > The whole shebang can also be found at: > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > I know nothing about power or hibernation. I would like to place this behind a Kconfig option and add a /sys/power/freeze_on_suspend option as these changes are pretty sensitive and to give userspace the ability to experiment with this for a while until we remove it. That means we should skip the removal of all the freezer changes in the filesystems until we're happy enough that this works reliable enough.
On Tue 01-04-25 02:32:45, Christian Brauner wrote: > The whole shebang can also be found at: > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > I know nothing about power or hibernation. I've tested it as best as I > could. Works for me (TM). > > I need to catch some actual sleep now... > > --- > > Now all the pieces are in place to actually allow the power subsystem to > freeze/thaw filesystems during suspend/resume. Filesystems are only > frozen and thawed if the power subsystem does actually own the freeze. > > Othwerwise it risks thawing filesystems it didn't own. This could be > done differently be e.g., keeping the filesystems that were actually > frozen on a list and then unfreezing them from that list. This is > disgustingly unclean though and reeks of an ugly hack. > > If the filesystem is already frozen by the time we've frozen all > userspace processes we don't care to freeze it again. That's userspace's > job once the process resumes. We only actually freeze filesystems if we > absolutely have to and we ignore other failures to freeze. Hum, I don't follow here. I supposed we'll use FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL for freezing from power subsystem. As far as I remember we have specifically designed nesting of freeze counters so that this way power subsystem can be sure freezing succeeds even if the filesystem is already frozen (by userspace or the kernel) and similarly power subsystem cannot thaw a filesystem frozen by somebody else. It will just drop its freeze refcount... What am I missing? Honza
On Tue, Apr 01, 2025 at 11:32:49AM +0200, Jan Kara wrote: > On Tue 01-04-25 02:32:45, Christian Brauner wrote: > > The whole shebang can also be found at: > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > > > I know nothing about power or hibernation. I've tested it as best as I > > could. Works for me (TM). > > > > I need to catch some actual sleep now... > > > > --- > > > > Now all the pieces are in place to actually allow the power subsystem to > > freeze/thaw filesystems during suspend/resume. Filesystems are only > > frozen and thawed if the power subsystem does actually own the freeze. > > > > Othwerwise it risks thawing filesystems it didn't own. This could be > > done differently be e.g., keeping the filesystems that were actually > > frozen on a list and then unfreezing them from that list. This is > > disgustingly unclean though and reeks of an ugly hack. > > > > If the filesystem is already frozen by the time we've frozen all > > userspace processes we don't care to freeze it again. That's userspace's > > job once the process resumes. We only actually freeze filesystems if we > > absolutely have to and we ignore other failures to freeze. > > Hum, I don't follow here. I supposed we'll use FREEZE_MAY_NEST | > FREEZE_HOLDER_KERNEL for freezing from power subsystem. As far as I > remember we have specifically designed nesting of freeze counters so that > this way power subsystem can be sure freezing succeeds even if the > filesystem is already frozen (by userspace or the kernel) and similarly > power subsystem cannot thaw a filesystem frozen by somebody else. It will > just drop its freeze refcount... What am I missing? If we have 10 filesystems and suspend/hibernate manges to freeze 5 and then fails on the 6th for whatever odd reason (current or future) then power needs to undo the freeze of the first 5 filesystems. We can't just walk the list again because while it's unlikely that a new filesystem got added in the meantime we still cannot tell what filesystems the power subsystem actually managed to get a freeze reference count on that we need to drop during thaw. There's various ways out of this ugliness. Either we record the filesystems the power subsystem managed to freeze on a temporary list in the callbacks and then walk that list backwards during thaw to undo the freezing or we make sure that the power subsystem just actually exclusively freezes things it can freeze and marking such filesystems as being owned by power for the duration of the suspend or resume cycle. I opted for the latter as that seemed the clean thing to do even if it means more code changes. What are your thoughts on this?
On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote: > The whole shebang can also be found at: > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > I know nothing about power or hibernation. I've tested it as best as I > could. Works for me (TM). > > I need to catch some actual sleep now... > > --- > > Now all the pieces are in place to actually allow the power subsystem to > freeze/thaw filesystems during suspend/resume. Filesystems are only > frozen and thawed if the power subsystem does actually own the freeze. Urgh, I was relying on all kthreads to be freezable for live-patching: https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net So I understand the problem with freezing filesystems, but can't we leave the TASK_FREEZABLE in the kthreads? The way I understand it, the power subsystem will first freeze the filesystems before it goes freeze threads anyway. So them remaining freezable should not affect anything, right?
On Tue, Apr 01, 2025 at 04:14:07PM +0200, Peter Zijlstra wrote: > On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote: > > The whole shebang can also be found at: > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > > > I know nothing about power or hibernation. I've tested it as best as I > > could. Works for me (TM). > > > > I need to catch some actual sleep now... > > > > --- > > > > Now all the pieces are in place to actually allow the power subsystem to > > freeze/thaw filesystems during suspend/resume. Filesystems are only > > frozen and thawed if the power subsystem does actually own the freeze. > > Urgh, I was relying on all kthreads to be freezable for live-patching: > > https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net > > So I understand the problem with freezing filesystems, but can't we > leave the TASK_FREEZABLE in the kthreads? The way I understand it, the Yeah, we can. > power subsystem will first freeze the filesystems before it goes freeze > threads anyway. So them remaining freezable should not affect anything, > right? Yes. I've dropped the other patches. I've discussed this later downthread with Jan. >
On Tue, Apr 01, 2025 at 04:40:33PM +0200, Christian Brauner wrote: > On Tue, Apr 01, 2025 at 04:14:07PM +0200, Peter Zijlstra wrote: > > On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote: > > > The whole shebang can also be found at: > > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > > > > > I know nothing about power or hibernation. I've tested it as best as I > > > could. Works for me (TM). > > > > > > I need to catch some actual sleep now... > > > > > > --- > > > > > > Now all the pieces are in place to actually allow the power subsystem to > > > freeze/thaw filesystems during suspend/resume. Filesystems are only > > > frozen and thawed if the power subsystem does actually own the freeze. > > > > Urgh, I was relying on all kthreads to be freezable for live-patching: > > > > https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net > > > > So I understand the problem with freezing filesystems, but can't we > > leave the TASK_FREEZABLE in the kthreads? The way I understand it, the > > Yeah, we can. > > > power subsystem will first freeze the filesystems before it goes freeze > > threads anyway. So them remaining freezable should not affect anything, > > right? > > Yes. I've dropped the other patches. I've discussed this later > downthread with Jan. Thanks!
On Tue 01-04-25 15:03:33, Christian Brauner wrote: > On Tue, Apr 01, 2025 at 11:32:49AM +0200, Jan Kara wrote: > > On Tue 01-04-25 02:32:45, Christian Brauner wrote: > > > The whole shebang can also be found at: > > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > > > > > I know nothing about power or hibernation. I've tested it as best as I > > > could. Works for me (TM). > > > > > > I need to catch some actual sleep now... > > > > > > --- > > > > > > Now all the pieces are in place to actually allow the power subsystem to > > > freeze/thaw filesystems during suspend/resume. Filesystems are only > > > frozen and thawed if the power subsystem does actually own the freeze. > > > > > > Othwerwise it risks thawing filesystems it didn't own. This could be > > > done differently be e.g., keeping the filesystems that were actually > > > frozen on a list and then unfreezing them from that list. This is > > > disgustingly unclean though and reeks of an ugly hack. > > > > > > If the filesystem is already frozen by the time we've frozen all > > > userspace processes we don't care to freeze it again. That's userspace's > > > job once the process resumes. We only actually freeze filesystems if we > > > absolutely have to and we ignore other failures to freeze. > > > > Hum, I don't follow here. I supposed we'll use FREEZE_MAY_NEST | > > FREEZE_HOLDER_KERNEL for freezing from power subsystem. As far as I > > remember we have specifically designed nesting of freeze counters so that > > this way power subsystem can be sure freezing succeeds even if the > > filesystem is already frozen (by userspace or the kernel) and similarly > > power subsystem cannot thaw a filesystem frozen by somebody else. It will > > just drop its freeze refcount... What am I missing? > > If we have 10 filesystems and suspend/hibernate manges to freeze 5 and > then fails on the 6th for whatever odd reason (current or future) then > power needs to undo the freeze of the first 5 filesystems. We can't just > walk the list again because while it's unlikely that a new filesystem > got added in the meantime we still cannot tell what filesystems the > power subsystem actually managed to get a freeze reference count on that > we need to drop during thaw. > > There's various ways out of this ugliness. Either we record the > filesystems the power subsystem managed to freeze on a temporary list in > the callbacks and then walk that list backwards during thaw to undo the > freezing or we make sure that the power subsystem just actually > exclusively freezes things it can freeze and marking such filesystems as > being owned by power for the duration of the suspend or resume cycle. I > opted for the latter as that seemed the clean thing to do even if it > means more code changes. What are your thoughts on this? Ah, I see. Thanks for explanation. So failure to freeze filesystem should be rare (mostly only due to IO errors or similar serious issues) hence I'd consider failing hibernation in case we fail to freeze some filesystem appropriate. The function that's walking all superblocks and freezing them could just walk from the superblock where freezing failed towards the end and thaw all filesystems. That way the function also has the nice property that it either freezes everything or keeps things as they were. But you've touched on an interesting case I didn't consider: New superblocks can be added to the end of the list while we are walking it. These superblocks will not be frozen and on resume (or error recovery) this will confuse things. Your "freeze owner" stuff deals with this problem nicely. Somewhat lighter fix for this may be to provide the superblock to start from / end with to these loops iterating and freezing / thawing superblocks. It doesn't seem too hacky but if you prefer your freeze owner approach I won't object. Honza
On Tue, 2025-04-01 at 02:32 +0200, Christian Brauner wrote: > The whole shebang can also be found at: > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > I know nothing about power or hibernation. I've tested it as best as > I could. Works for me (TM). I'm testing the latest you have in work.freeze and it doesn't currently work for me. Patch 7b315c39b67d ("power: freeze filesystems during suspend/resume") doesn't set filesystems_freeze_ptr so it ends up being NULL and tripping over this check +static inline bool may_unfreeze(struct super_block *sb, enum freeze_holder who, + const void *freeze_owner) +{ + WARN_ON_ONCE((who & ~FREEZE_FLAGS)); + WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1); + + if (who & FREEZE_EXCL) { + if (WARN_ON_ONCE(sb->s_writers.freeze_owner == NULL)) + return false; in f15a9ae05a71 ("fs: add owner of freeze/thaw") and failing to resume from hibernate. Setting it to __builtin_return_address(0) in filesystems_freeze() makes everything work as expected, so that's what I'm testing now. I suppose one minor, minor nit is that the vagaries of English grammar mean that the verbs fail and succeed don't take the same grammatical construction, so failed can take the infinitive (failed to thaw) perfectly well, but succeeded takes a prepositional gerund construction instead: "succeeded at/in thawing" instead of the infinitive "succeeded to thaw" ... I've no idea why, but I'd probably blame the Victorians ... Regards, James
On Tue, Apr 01, 2025 at 01:02:07PM -0400, James Bottomley wrote: > On Tue, 2025-04-01 at 02:32 +0200, Christian Brauner wrote: > > The whole shebang can also be found at: > > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > > > I know nothing about power or hibernation. I've tested it as best as > > I could. Works for me (TM). > > I'm testing the latest you have in work.freeze and it doesn't currently > work for me. Patch 7b315c39b67d ("power: freeze filesystems during > suspend/resume") doesn't set filesystems_freeze_ptr so it ends up being > NULL and tripping over this check I haven't pushed the new version there. Sorry about that. I only have it locally. > > +static inline bool may_unfreeze(struct super_block *sb, enum > freeze_holder who, > + const void *freeze_owner) > +{ > + WARN_ON_ONCE((who & ~FREEZE_FLAGS)); > + WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1); > + > + if (who & FREEZE_EXCL) { > + if (WARN_ON_ONCE(sb->s_writers.freeze_owner == NULL)) > + return false; > > > in f15a9ae05a71 ("fs: add owner of freeze/thaw") and failing to resume > from hibernate. Setting it to __builtin_return_address(0) in > filesystems_freeze() makes everything work as expected, so that's what > I'm testing now. +1 I'll send the final version out in a bit.
The whole shebang can also be found at: https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze I know nothing about power or hibernation. I've tested it as best as I could. Works for me (TM). I need to catch some actual sleep now... --- Now all the pieces are in place to actually allow the power subsystem to freeze/thaw filesystems during suspend/resume. Filesystems are only frozen and thawed if the power subsystem does actually own the freeze. Othwerwise it risks thawing filesystems it didn't own. This could be done differently be e.g., keeping the filesystems that were actually frozen on a list and then unfreezing them from that list. This is disgustingly unclean though and reeks of an ugly hack. If the filesystem is already frozen by the time we've frozen all userspace processes we don't care to freeze it again. That's userspace's job once the process resumes. We only actually freeze filesystems if we absolutely have to and we ignore other failures to freeze. We could bubble up errors and fail suspend/resume if the error isn't EBUSY (aka it's already frozen) but I don't think that this is worth it. Filesystem freezing during suspend/resume is best-effort. If the user has 500 ext4 filesystems mounted and 4 fail to freeze for whatever reason then we simply skip them. What we have now is already a big improvement and let's see how we fare with it before making our lives even harder (and uglier) than we have to. Signed-off-by: Christian Brauner <brauner@kernel.org> --- Christian Brauner (3): fs: add owner of freeze/thaw fs: allow pagefault based writers to be frozen power: freeze filesystems during suspend/resume Luis Chamberlain (3): ext4: replace kthread freezing with auto fs freezing btrfs: replace kthread freezing with auto fs freezing xfs: replace kthread freezing with auto fs freezing fs/btrfs/disk-io.c | 4 +-- fs/btrfs/scrub.c | 2 +- fs/ext4/mballoc.c | 2 +- fs/ext4/super.c | 3 -- fs/f2fs/gc.c | 6 ++-- fs/gfs2/super.c | 20 ++++++----- fs/gfs2/sys.c | 4 +-- fs/ioctl.c | 8 ++--- fs/super.c | 82 ++++++++++++++++++++++++++++++++++++--------- fs/xfs/scrub/fscounters.c | 4 +-- fs/xfs/xfs_discard.c | 2 +- fs/xfs/xfs_log.c | 3 +- fs/xfs/xfs_log_cil.c | 2 +- fs/xfs/xfs_mru_cache.c | 2 +- fs/xfs/xfs_notify_failure.c | 6 ++-- fs/xfs/xfs_pwork.c | 2 +- fs/xfs/xfs_super.c | 14 ++++---- fs/xfs/xfs_trans_ail.c | 3 -- fs/xfs/xfs_zone_gc.c | 2 -- include/linux/fs.h | 16 ++++++--- kernel/power/hibernate.c | 13 ++++++- kernel/power/suspend.c | 8 +++++ 22 files changed, 139 insertions(+), 69 deletions(-) --- base-commit: a68c99192db8060f383a2680333866c0be688ece change-id: 20250401-work-freeze-693b5b5a78e0