Message ID | 20200819101451.24266-1-tian.xianting@h3c.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: prevent hung check firing during long sync IO |
On Wed, Aug 19, 2020 at 06:14:51PM +0800, Xianting Tian wrote:
> For sync and flush io, it may take long time to complete.
> So it's better to use wait_for_completion_io_timeout() in a
> while loop to avoid prevent hung check and crash(when set
> /proc/sys/kernel/hung_task_panic).

I wonder if long running IO should trigger the panic/kill of the task at all. A warning means that the system is under load, but as long as it's making some progress it should be ok, and that seems to be a separate case from a task that's not making any progress (and terminating it is probably the best option).

> This is similar to prevent hung task check in submit_bio_wait(),
> blk_execute_rq().

I see, adding that workaround to btrfs would be the 3rd occurrence and this should go into a wrapper, e.g. wait_for_completion_io_nowarn, with examples where this should be used.
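To make the wrapper idea concrete, here is a rough user-space sketch of what such a helper could look like. It is not kernel code and not part of the patch: `struct completion`, `HZ`, `sysctl_hung_task_timeout_secs` and the two wait functions are toy stand-ins that only simulate elapsed jiffies, and `wait_for_completion_io_nowarn` is just the name suggested above. The only point is to show that the loop keeps each uninterrupted sleep at half the hung task timeout.

```c
/* User-space model only: every kernel symbol below is a simplified stand-in. */
#define HZ 250UL /* assumed tick rate, for illustration */

/* Stand-in for the hung task sysctl; 0 means the check is disabled. */
static unsigned long sysctl_hung_task_timeout_secs = 120;

/* Toy completion that "finishes" after a fixed amount of simulated waiting. */
struct completion {
	unsigned long jiffies_left;  /* simulated time until the I/O completes */
	unsigned long longest_sleep; /* longest single uninterrupted wait seen */
	unsigned int wakeups;        /* how many times the waiter woke up */
};

/* Mimics the kernel contract: 0 on timeout, > 0 (jiffies left) on completion. */
static unsigned long wait_for_completion_io_timeout(struct completion *c,
						    unsigned long timeout)
{
	unsigned long slept = c->jiffies_left < timeout ? c->jiffies_left : timeout;

	c->jiffies_left -= slept;
	c->wakeups++;
	if (slept > c->longest_sleep)
		c->longest_sleep = slept;
	if (c->jiffies_left > 0)
		return 0; /* timed out, I/O still pending */
	return timeout - slept ? timeout - slept : 1;
}

/* Untimed wait: one sleep that lasts until the I/O is done. */
static void wait_for_completion_io(struct completion *c)
{
	wait_for_completion_io_timeout(c, c->jiffies_left);
}

/* Hypothetical wrapper: same loop as in the patch, factored out once. */
void wait_for_completion_io_nowarn(struct completion *done)
{
	unsigned long hang_check = sysctl_hung_task_timeout_secs;

	if (hang_check)
		/* Wake up every half timeout so the hung task check never fires. */
		while (!wait_for_completion_io_timeout(done, hang_check * (HZ / 2)))
			;
	else
		wait_for_completion_io(done);
}
```

With such a helper, the change in wait_dev_flush() would shrink to a single call, and the existing open-coded loops in submit_bio_wait() and blk_execute_rq() could presumably be converted the same way.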
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9ae25f632..1eb560de0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -17,6 +17,7 @@
 #include <linux/error-injection.h>
 #include <linux/crc32c.h>
 #include <linux/sched/mm.h>
+#include <linux/sched/sysctl.h>
 #include <asm/unaligned.h>
 #include <crypto/hash.h>
 #include "ctree.h"
@@ -3699,12 +3700,21 @@ static void write_dev_flush(struct btrfs_device *device)
 static blk_status_t wait_dev_flush(struct btrfs_device *device)
 {
 	struct bio *bio = device->flush_bio;
+	unsigned long hang_check;
 
 	if (!test_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state))
 		return BLK_STS_OK;
 
 	clear_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state);
-	wait_for_completion_io(&device->flush_wait);
+
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	hang_check = sysctl_hung_task_timeout_secs;
+	if (hang_check)
+		while (!wait_for_completion_io_timeout(&device->flush_wait,
+				hang_check * (HZ/2)))
+			;
+	else
+		wait_for_completion_io(&device->flush_wait);
 
 	return bio->bi_status;
 }
For sync and flush I/O, it may take a long time to complete. So it's better to use wait_for_completion_io_timeout() in a while loop to avoid triggering the hung task check and a crash (when /proc/sys/kernel/hung_task_panic is set).

This is similar to how the hung task check is avoided in submit_bio_wait() and blk_execute_rq().

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
---
 fs/btrfs/disk-io.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)