Message ID | bbcd9ebaeccb3a9e5a875a2ffc1afb498d6b75fe.1724889346.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: interrupt long running operations if the current process is freezing | expand |
On Thu, Aug 29, 2024 at 09:26:17AM +0930, Qu Wenruo wrote: > [BUG] > There is a bug report that running fstrim will prevent the system from > hibernation, result the following dmesg: > PM: suspend entry (deep) > Filesystems sync: 0.060 seconds > Freezing user space processes > Freezing user space processes failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0): > task:fstrim state:D stack:0 pid:15564 tgid:15564 ppid:1 flags:0x00004006 > Call Trace: > <TASK> > __schedule+0x381/0x1540 > schedule+0x24/0xb0 > schedule_timeout+0x1ea/0x2a0 > io_schedule_timeout+0x19/0x50 > wait_for_completion_io+0x78/0x140 > submit_bio_wait+0xaa/0xc0 > blkdev_issue_discard+0x65/0xb0 > btrfs_issue_discard+0xcf/0x160 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > btrfs_discard_extent+0x120/0x2a0 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > do_trimming+0xd4/0x220 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > trim_bitmaps+0x418/0x520 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > btrfs_trim_block_group+0xcb/0x130 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > btrfs_trim_fs+0x119/0x460 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > btrfs_ioctl_fitrim+0xfb/0x160 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > btrfs_ioctl+0x11cc/0x29f0 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] > __x64_sys_ioctl+0x92/0xd0 > do_syscall_64+0x5b/0x80 > entry_SYSCALL_64_after_hwframe+0x7c/0xe6 > RIP: 0033:0x7f5f3b529f9b > RSP: 002b:00007fff279ebc20 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fff279ebd60 RCX: 00007f5f3b529f9b > RDX: 00007fff279ebc90 RSI: 00000000c0185879 RDI: 0000000000000003 > RBP: 000055748718b2d0 R08: 00005574871899e8 R09: 00007fff279eb010 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > R13: 000055748718ac40 R14: 000055748718b290 R15: 000055748718b290 > </TASK> > OOM killer enabled. > Restarting tasks ... done. > random: crng reseeded on system resumption > PM: suspend exit > PM: suspend entry (s2idle) > Filesystems sync: 0.047 seconds > > [CAUSE] > PM code is freezing all user space processes before entering > hibernation/suspension, but if a user space process is trapping into the > kernel for a long running operation, it will not be frozen since it's > still inside kernel. > > Normally those long running operations check for fatal signals and exit > early, but freezing user space processes is not done by signals but a > different infrastructure. > > Unfortunately btrfs only checks fatal signals but not if the current > task is being frozen. > > [FIX] > Introduce a helper, btrfs_task_interrupted(), to check both fatal signals > and freezing status, and apply to all long running operations, with > dedicated error code: > > - reflink (-EINTR) > - fstrim (-ERESTARTSYS) > - relocation (-ECANCELD) > - llseek (-EINTR) > - defrag (-EAGAIN) > - fiemap (-EINTR) Is it correct to interrupt the operations? If there's a reflink in progress and system gets hibernated what's the reason to cancel it? It should be possible to just freeze the state and continue after thaw, no? Imagine a case when a long running file copy (reflink) is going on and the system gets frozen. This is wrong from user perspective.
在 2024/8/30 02:29, David Sterba 写道: > On Thu, Aug 29, 2024 at 09:26:17AM +0930, Qu Wenruo wrote: >> [BUG] >> There is a bug report that running fstrim will prevent the system from >> hibernation, result the following dmesg: >> PM: suspend entry (deep) >> Filesystems sync: 0.060 seconds >> Freezing user space processes >> Freezing user space processes failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0): >> task:fstrim state:D stack:0 pid:15564 tgid:15564 ppid:1 flags:0x00004006 >> Call Trace: >> <TASK> >> __schedule+0x381/0x1540 >> schedule+0x24/0xb0 >> schedule_timeout+0x1ea/0x2a0 >> io_schedule_timeout+0x19/0x50 >> wait_for_completion_io+0x78/0x140 >> submit_bio_wait+0xaa/0xc0 >> blkdev_issue_discard+0x65/0xb0 >> btrfs_issue_discard+0xcf/0x160 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> btrfs_discard_extent+0x120/0x2a0 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> do_trimming+0xd4/0x220 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> trim_bitmaps+0x418/0x520 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> btrfs_trim_block_group+0xcb/0x130 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> btrfs_trim_fs+0x119/0x460 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> btrfs_ioctl_fitrim+0xfb/0x160 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> btrfs_ioctl+0x11cc/0x29f0 [btrfs 7ab35b9b86062a46f6ff578bb32d55ecf8e6bf82] >> __x64_sys_ioctl+0x92/0xd0 >> do_syscall_64+0x5b/0x80 >> entry_SYSCALL_64_after_hwframe+0x7c/0xe6 >> RIP: 0033:0x7f5f3b529f9b >> RSP: 002b:00007fff279ebc20 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00007fff279ebd60 RCX: 00007f5f3b529f9b >> RDX: 00007fff279ebc90 RSI: 00000000c0185879 RDI: 0000000000000003 >> RBP: 000055748718b2d0 R08: 00005574871899e8 R09: 00007fff279eb010 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 >> R13: 000055748718ac40 R14: 000055748718b290 R15: 000055748718b290 >> </TASK> >> OOM killer enabled. >> Restarting tasks ... done. >> random: crng reseeded on system resumption >> PM: suspend exit >> PM: suspend entry (s2idle) >> Filesystems sync: 0.047 seconds >> >> [CAUSE] >> PM code is freezing all user space processes before entering >> hibernation/suspension, but if a user space process is trapping into the >> kernel for a long running operation, it will not be frozen since it's >> still inside kernel. >> >> Normally those long running operations check for fatal signals and exit >> early, but freezing user space processes is not done by signals but a >> different infrastructure. >> >> Unfortunately btrfs only checks fatal signals but not if the current >> task is being frozen. >> >> [FIX] >> Introduce a helper, btrfs_task_interrupted(), to check both fatal signals >> and freezing status, and apply to all long running operations, with >> dedicated error code: >> >> - reflink (-EINTR) >> - fstrim (-ERESTARTSYS) >> - relocation (-ECANCELD) >> - llseek (-EINTR) >> - defrag (-EAGAIN) >> - fiemap (-EINTR) > > Is it correct to interrupt the operations? If there's a reflink in > progress and system gets hibernated what's the reason to cancel it? It > should be possible to just freeze the state and continue after thaw, no? One process trapped inside kernel can not be frozen. And it's really user-space programs' job to determine if they can resume the work. Thanks, Qu > > Imagine a case when a long running file copy (reflink) is going on and > the system gets frozen. This is wrong from user perspective. >
On Fri, Aug 30, 2024 at 07:07:41AM +0930, Qu Wenruo wrote: > >> different infrastructure. > >> > >> Unfortunately btrfs only checks fatal signals but not if the current > >> task is being frozen. > >> > >> [FIX] > >> Introduce a helper, btrfs_task_interrupted(), to check both fatal signals > >> and freezing status, and apply to all long running operations, with > >> dedicated error code: > >> > >> - reflink (-EINTR) > >> - fstrim (-ERESTARTSYS) > >> - relocation (-ECANCELD) > >> - llseek (-EINTR) > >> - defrag (-EAGAIN) > >> - fiemap (-EINTR) > > > > Is it correct to interrupt the operations? If there's a reflink in > > progress and system gets hibernated what's the reason to cancel it? It > > should be possible to just freeze the state and continue after thaw, no? > > One process trapped inside kernel can not be frozen. I don't know how freezing works but know it's very complex and handles lots of special cases so the above does not say much. We have kernel threads that can do io, like the cleaner kthread and others. There's no explicit point where it would check for freezing request, like it used to be with try_to_freeze() that got removed in ce63f891e1a87ae79c4. There are flags for worker queues that mark them as freezable and that's enough. There were series to make freezable block devices, added in 6.8 and btrfs support was not added (but this may not be relevant for the fstirm problem, I don't know). > And it's really user-space programs' job to determine if they can resume > the work. Yes, but in this case this seems that it's suddenly changing behaviour depending on freeze. And with this patch you change 6 operations while the report was to fix fstrim. > > Imagine a case when a long running file copy (reflink) is going on and > > the system gets frozen. This is wrong from user perspective. So, if this patch changes how cp --reflink behaves I claim it's a usability regression. But we can skip discussing that, the fix needs to be done for fstrim first. Here it's either to end prematurely with ERESTARTSYS or to somehow make it work for freezing with some magic. The patch as it is now cannot be merged as it breaks more than it fixes.
在 2024/8/30 07:53, David Sterba 写道: > On Fri, Aug 30, 2024 at 07:07:41AM +0930, Qu Wenruo wrote: >>>> different infrastructure. >>>> >>>> Unfortunately btrfs only checks fatal signals but not if the current >>>> task is being frozen. >>>> >>>> [FIX] >>>> Introduce a helper, btrfs_task_interrupted(), to check both fatal signals >>>> and freezing status, and apply to all long running operations, with >>>> dedicated error code: >>>> >>>> - reflink (-EINTR) >>>> - fstrim (-ERESTARTSYS) >>>> - relocation (-ECANCELD) >>>> - llseek (-EINTR) >>>> - defrag (-EAGAIN) >>>> - fiemap (-EINTR) >>> >>> Is it correct to interrupt the operations? If there's a reflink in >>> progress and system gets hibernated what's the reason to cancel it? It >>> should be possible to just freeze the state and continue after thaw, no? >> >> One process trapped inside kernel can not be frozen. > > I don't know how freezing works but know it's very complex and handles > lots of special cases so the above does not say much. Well, so far it really looks like for long running operations we should abort the operation. At least for fstrim, and that's also what ext4 does. > > We have kernel threads that can do io, like the cleaner kthread and > others. There's no explicit point where it would check for freezing > request, like it used to be with try_to_freeze() that got removed in > ce63f891e1a87ae79c4. There are flags for worker queues that mark them as > freezable and that's enough. There were series to make freezable block > devices, added in 6.8 and btrfs support was not added (but this may not > be relevant for the fstirm problem, I don't know). Those kthreads are different because they can sleep by themselves, thus no need for special handling. But it's not the same case for ioctl or the other long running operations that trap into the kernel. > >> And it's really user-space programs' job to determine if they can resume >> the work. > > Yes, but in this case this seems that it's suddenly changing behaviour > depending on freeze. And with this patch you change 6 operations while > the report was to fix fstrim. If that's the concern I'm totally happy to only do it for fstrim. Thanks, Qu > >>> Imagine a case when a long running file copy (reflink) is going on and >>> the system gets frozen. This is wrong from user perspective. > > So, if this patch changes how cp --reflink behaves I claim it's a > usability regression. But we can skip discussing that, the fix needs to > be done for fstrim first. Here it's either to end prematurely with > ERESTARTSYS or to somehow make it work for freezing with some magic. > > The patch as it is now cannot be merged as it breaks more than it fixes.
On Fri, Aug 30, 2024 at 07:58:07AM +0930, Qu Wenruo wrote: > >> And it's really user-space programs' job to determine if they can resume > >> the work. > > > > Yes, but in this case this seems that it's suddenly changing behaviour > > depending on freeze. And with this patch you change 6 operations while > > the report was to fix fstrim. > > If that's the concern I'm totally happy to only do it for fstrim. Thanks. We can revisit adding more cancellation points to the other operations later. I think scrub also does not work well with freezing but at least scrub can be safely paused like we do in transaction commit.
diff --git a/fs/btrfs/defrag.h b/fs/btrfs/defrag.h index 878528e086fb..f5e84c6b11cd 100644 --- a/fs/btrfs/defrag.h +++ b/fs/btrfs/defrag.h @@ -5,6 +5,7 @@ #include <linux/types.h> #include <linux/compiler_types.h> +#include "misc.h" struct inode; struct file_ra_state; @@ -26,7 +27,7 @@ int btrfs_defrag_root(struct btrfs_root *root); static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info) { - return signal_pending(current); + return signal_pending(current) || btrfs_task_interrupted(); } #endif diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index feec49e6f9c8..81ed4d1359aa 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4,7 +4,6 @@ */ #include <linux/sched.h> -#include <linux/sched/signal.h> #include <linux/pagemap.h> #include <linux/writeback.h> #include <linux/blkdev.h> @@ -6459,7 +6458,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed) start += len; *trimmed += bytes; - if (fatal_signal_pending(current)) { + if (btrfs_task_interrupted()) { ret = -ERESTARTSYS; break; } diff --git a/fs/btrfs/fiemap.c b/fs/btrfs/fiemap.c index df7f09f3b02e..a3d9b8922bb9 100644 --- a/fs/btrfs/fiemap.c +++ b/fs/btrfs/fiemap.c @@ -794,7 +794,7 @@ static int extent_fiemap(struct btrfs_inode *inode, prev_extent_end = extent_end; next_item: - if (fatal_signal_pending(current)) { + if (btrfs_task_interrupted()) { ret = -EINTR; goto out_unlock; } diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 76f4cc686af9..088f06dca16e 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3676,7 +3676,7 @@ static loff_t find_desired_extent(struct file *file, loff_t offset, int whence) start = extent_end; last_extent_end = extent_end; path->slots[0]++; - if (fatal_signal_pending(current)) { + if (btrfs_task_interrupted()) { ret = -EINTR; goto out; } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index eaa1dbd31352..7c9506ee7be6 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -5,7 +5,6 @@ #include <linux/pagemap.h> #include <linux/sched.h> -#include <linux/sched/signal.h> #include <linux/slab.h> #include <linux/math64.h> #include <linux/ratelimit.h> @@ -3809,7 +3808,7 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group, if (async && *total_trimmed) break; - if (fatal_signal_pending(current)) { + if (btrfs_task_interrupted()) { ret = -ERESTARTSYS; break; } @@ -4000,7 +3999,7 @@ static int trim_bitmaps(struct btrfs_block_group *block_group, } block_group->discard_cursor = start; - if (fatal_signal_pending(current)) { + if (btrfs_task_interrupted()) { if (start != offset) reset_trimming_bitmap(ctl, offset); ret = -ERESTARTSYS; diff --git a/fs/btrfs/misc.h b/fs/btrfs/misc.h index 0d599fd847c9..fcc9715b55fe 100644 --- a/fs/btrfs/misc.h +++ b/fs/btrfs/misc.h @@ -9,6 +9,8 @@ #include <linux/wait.h> #include <linux/math64.h> #include <linux/rbtree.h> +#include <linux/sched/signal.h> +#include <linux/freezer.h> /* * Enumerate bits using enum autoincrement. Define the @name as the n-th bit. @@ -163,4 +165,9 @@ static inline bool bitmap_test_range_all_zero(const unsigned long *addr, return (found_set == start + nbits); } +static inline bool btrfs_task_interrupted(void) +{ + return fatal_signal_pending(current) || freezing(current); +} + #endif diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index df6b93b927cd..f2063d388553 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -564,7 +564,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode, btrfs_release_path(path); key.offset = prev_extent_end; - if (fatal_signal_pending(current)) { + if (btrfs_task_interrupted()) { ret = -EINTR; goto out; } diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index ea4ed85919ec..feafa46ef7a5 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2936,7 +2936,7 @@ noinline int btrfs_should_cancel_balance(const struct btrfs_fs_info *fs_info) { return atomic_read(&fs_info->balance_cancel_req) || atomic_read(&fs_info->reloc_cancel_req) || - fatal_signal_pending(current); + btrfs_task_interrupted(); } ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE);