
btrfs: prevent hung check firing during long sync IO

Message ID 20200819101451.24266-1-tian.xianting@h3c.com (mailing list archive)
State New, archived

Commit Message

Tianxianting Aug. 19, 2020, 10:14 a.m. UTC
Sync and flush I/O may take a long time to complete.
So it's better to use wait_for_completion_io_timeout() in a
while loop to keep the hung task check from firing and crashing
the system (when /proc/sys/kernel/hung_task_panic is set).

This is similar to how submit_bio_wait() and blk_execute_rq()
avoid the hung task check.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
---
 fs/btrfs/disk-io.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

David Sterba Aug. 19, 2020, 1:28 p.m. UTC | #1
On Wed, Aug 19, 2020 at 06:14:51PM +0800, Xianting Tian wrote:
> Sync and flush I/O may take a long time to complete.
> So it's better to use wait_for_completion_io_timeout() in a
> while loop to keep the hung task check from firing and crashing
> the system (when /proc/sys/kernel/hung_task_panic is set).

I wonder if long running IO should trigger the panic/kill of the task at
all. A warning means that the system is under load but as long as it's
making some progress it should be ok, and that seems to be a separate
case from a task that's not making any progress (and terminating it is
probably the best option).

> This is similar to how submit_bio_wait() and blk_execute_rq()
> avoid the hung task check.

I see, adding that workaround to btrfs would be the 3rd occurrence, so
this should go into a wrapper, e.g. wait_for_completion_io_nowarn, with
examples where this should be used.
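
The wrapper idea from the reply above can be sketched as a small userspace model. Nothing here is kernel code: struct model_completion, model_wait_io_timeout() and model_wait_io_nowarn() are hypothetical names invented for illustration, and the hung_secs and hz parameters stand in for sysctl_hung_task_timeout_secs and HZ. The model only mirrors the return convention of wait_for_completion_io_timeout() (0 on timeout, remaining jiffies otherwise) and the loop shape used in the patch.

```c
/*
 * Userspace model only: stands in for the kernel's struct completion.
 * "remaining" is how many jiffies of I/O are still outstanding.
 */
struct model_completion {
	unsigned long remaining;
};

/*
 * Models the return convention of wait_for_completion_io_timeout():
 * 0 if the timeout expired first, otherwise the jiffies left (at least 1).
 */
static unsigned long model_wait_io_timeout(struct model_completion *done,
					   unsigned long timeout)
{
	unsigned long left;

	if (done->remaining > timeout) {
		done->remaining -= timeout;
		return 0;
	}
	left = timeout - done->remaining;
	done->remaining = 0;
	return left ? left : 1;
}

/*
 * Hypothetical wrapper in the spirit of wait_for_completion_io_nowarn.
 * hung_secs models sysctl_hung_task_timeout_secs, hz models HZ.
 * Returns the number of timed-out slices, purely so the model can be
 * inspected; a real helper would have no reason to report this.
 */
static unsigned long model_wait_io_nowarn(struct model_completion *done,
					  unsigned long hung_secs,
					  unsigned long hz)
{
	unsigned long timeouts = 0;

	if (!hung_secs) {
		/* Hung task check disabled: one unbounded wait is enough. */
		done->remaining = 0;
		return 0;
	}

	/* Wake up every half hung-task period until the I/O completes. */
	while (!model_wait_io_timeout(done, hung_secs * (hz / 2)))
		timeouts++;

	return timeouts;
}
```

With such a helper, each call site would shrink to a single call, and the hung-task handling would live in one place instead of being repeated in submit_bio_wait(), blk_execute_rq() and btrfs.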

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9ae25f632..1eb560de0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -17,6 +17,7 @@ 
 #include <linux/error-injection.h>
 #include <linux/crc32c.h>
 #include <linux/sched/mm.h>
+#include <linux/sched/sysctl.h>
 #include <asm/unaligned.h>
 #include <crypto/hash.h>
 #include "ctree.h"
@@ -3699,12 +3700,21 @@  static void write_dev_flush(struct btrfs_device *device)
 static blk_status_t wait_dev_flush(struct btrfs_device *device)
 {
 	struct bio *bio = device->flush_bio;
+	unsigned long hang_check;
 
 	if (!test_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state))
 		return BLK_STS_OK;
 
 	clear_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state);
-	wait_for_completion_io(&device->flush_wait);
+
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	hang_check = sysctl_hung_task_timeout_secs;
+	if (hang_check)
+		while (!wait_for_completion_io_timeout(&device->flush_wait,
+						hang_check * (HZ/2)))
+			;
+	else
+		wait_for_completion_io(&device->flush_wait);
 
 	return bio->bi_status;
 }
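
As a note on the timeout expression in the hunk above: hang_check is in seconds and HZ is ticks per second, so hang_check * (HZ/2) is half the hung task timeout expressed in jiffies. The sketch below works through that arithmetic in plain C. The helper names are hypothetical, and the values 120 s (the common default for hung_task_timeout_secs) and HZ = 250 are example inputs, not taken from the patch.

```c
/* Jiffies per wait slice, mirroring hang_check * (HZ/2) from the patch. */
static unsigned long hung_slice_jiffies(unsigned long hung_secs,
					unsigned long hz)
{
	return hung_secs * (hz / 2);
}

/* Convert a jiffies count back to whole seconds for a given tick rate. */
static unsigned long jiffies_to_whole_secs(unsigned long jiffies,
					   unsigned long hz)
{
	return jiffies / hz;
}
```

For hung_secs = 120 and hz = 250, the slice is 120 * 125 = 15000 jiffies, i.e. 60 seconds, so the waiting task wakes up twice per hung-task period.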