block: Prevent hung_check firing during long sync IO

Message ID	20200220120527.15082-1-ming.lei@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=lp7R=4I=vger.kernel.org=linux-block-owner@kernel.org> From: Ming Lei <ming.lei@redhat.com> To: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>, Salman Qazi <sqazi@google.com>, Jesse Barnes <jsbarnes@google.com>, Bart Van Assche <bvanassche@acm.org> Subject: [PATCH] block: Prevent hung_check firing during long sync IO Date: Thu, 20 Feb 2020 20:05:27 +0800 Message-Id: <20200220120527.15082-1-ming.lei@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: linux-block-owner@vger.kernel.org Precedence: bulk
Series	block: Prevent hung_check firing during long sync IO \| expand block: Prevent hung_check firing during long sync IO

Message ID

20200220120527.15082-1-ming.lei@redhat.com (mailing list archive)

State

New, archived

Headers

From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
        Salman Qazi <sqazi@google.com>,
        Jesse Barnes <jsbarnes@google.com>,
        Bart Van Assche <bvanassche@acm.org>
Subject: [PATCH] block: Prevent hung_check firing during long sync IO
Date: Thu, 20 Feb 2020 20:05:27 +0800
Message-Id: <20200220120527.15082-1-ming.lei@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk

Series

block: Prevent hung_check firing during long sync IO | expand

Commit Message

Ming Lei Feb. 20, 2020, 12:05 p.m. UTC

submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
takes up to 100 second on some devices. Also any block I/O operation
that occurs after the BLKSECDISCARD is submitted will also potentially
be affected by the hung task timeouts.

So prevent hung_check from firing by taking same approach used
in blk_execute_rq(), and the wake-up interval is set as half the
hung_check timer period, which keeps overhead low enough.

Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.org/lkml/2020/2/12/1193
Reported-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jesse Barnes Feb. 20, 2020, 2:43 p.m. UTC | #1

Yeah looks good.

Reviewed-by: Jesse Barnes <jsbarnes@google.com>

A further change (just for readability) might be to factor out these
"don't trigger hangcheck" waits from here and blk_execute_rq() into a
small helper with a descriptive name.

Thanks,
Jesse

On Thu, Feb 20, 2020 at 5:05 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
> may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
> takes up to 100 second on some devices. Also any block I/O operation
> that occurs after the BLKSECDISCARD is submitted will also potentially
> be affected by the hung task timeouts.
>
> So prevent hung_check from firing by taking same approach used
> in blk_execute_rq(), and the wake-up interval is set as half the
> hung_check timer period, which keeps overhead low enough.
>
> Cc: Salman Qazi <sqazi@google.com>
> Cc: Jesse Barnes <jsbarnes@google.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Link: https://lkml.org/lkml/2020/2/12/1193
> Reported-by: Salman Qazi <sqazi@google.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 94d697217887..c9ce19a86de7 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -17,6 +17,7 @@
>  #include <linux/cgroup.h>
>  #include <linux/blk-cgroup.h>
>  #include <linux/highmem.h>
> +#include <linux/sched/sysctl.h>
>
>  #include <trace/events/block.h>
>  #include "blk.h"
> @@ -1019,12 +1020,19 @@ static void submit_bio_wait_endio(struct bio *bio)
>  int submit_bio_wait(struct bio *bio)
>  {
>         DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
> +       unsigned long hang_check;
>
>         bio->bi_private = &done;
>         bio->bi_end_io = submit_bio_wait_endio;
>         bio->bi_opf |= REQ_SYNC;
>         submit_bio(bio);
> -       wait_for_completion_io(&done);
> +
> +       /* Prevent hang_check timer from firing at us during very long I/O */
> +       hang_check = sysctl_hung_task_timeout_secs;
> +       if (hang_check)
> +               while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
> +       else
> +               wait_for_completion_io(&done);
>
>         return blk_status_to_errno(bio->bi_status);
>  }
> --
> 2.20.1
>

Salman Qazi Feb. 20, 2020, 7:04 p.m. UTC | #2

Thank you for doing this.  The fix will help, but it is not a complete solution.

+1 to Jesse's comment about putting into a descriptive helper.  This
functionality might be useful elsewhere.

I'd like to reiterate that the problem is broader.  For instance, if I
hack the IOCTL path to be immune from the hung task watchdog from
firing there, the problem simply moves to another I/O that is behind
the IOCTL.  For instance if I have a dd command doing direct I/O
against the same device that has the IOCTL pending, the dd will hang
in uninterruptible sleep and eventually the hung task timer will fire.

[ 5645.712955] dd              D 634a749ef8dd8750     0  5126   4978
0x00000000 last_sleep: 5568344242277.  last_runnable: 5568342587878
[ 5645.712965]  ffff88005cfb2c00 ffff88007ab14940 ffff88007a6df080
ffff8800614a4b00
[ 5645.712973]  0000000000014940 ffff880076f77a00 ffffffffb1076a57
ffff880000000000
[ 5645.712981]  ffff8800614a4b00 7fffffffffffffff ffff880075402940
ffff880076f77a38
[ 5645.712988] Call Trace:
[ 5645.713000]  [<ffffffffb1076a57>] ? __schedule+0x37a/0x7a2
[ 5645.713005]  [<ffffffffb10766a7>] schedule+0x3f/0x75
[ 5645.713009]  [<ffffffffb1079410>] schedule_timeout+0x53/0x6af
[ 5645.713015]  [<ffffffffb1077425>] io_schedule_timeout+0x6b/0x91
[ 5645.713021]  [<ffffffffb09dd66a>] dio_await_one+0x94/0xdb
[ 5645.713025]  [<ffffffffb09dc668>] __blockdev_direct_IO+0x6d5/0x763
[ 5645.713029]  [<ffffffffb09dbde9>] ? blkdev_direct_IO+0x3b/0x3b
[ 5645.713034]  [<ffffffffb09dbde9>] ? blkdev_direct_IO+0x3b/0x3b
[ 5645.713039]  [<ffffffffb09dbde3>] blkdev_direct_IO+0x35/0x3b
[ 5645.713044]  [<ffffffffb095bf30>]
generic_file_direct_write+0xfb/0x16d
[ 5645.713050]  [<ffffffffb098c661>]
__generic_file_write_iter+0x2b0/0x3c1
[ 5645.713056]  [<ffffffffb0bffe9d>] ? write_port+0xdc/0xdc
[ 5645.713061]  [<ffffffffb0b506de>] ? __clear_user+0x41/0x66
[ 5645.713065]  [<ffffffffb09db78f>] blkdev_write_iter+0xd1/0x13e
[ 5645.713072]  [<ffffffffb0a66977>] SyS_write+0x22d/0x431
[ 5645.713077]  [<ffffffffb107b163>] entry_SYSCALL_64_fastpath+0x31/0xab

 Similarly, I am sure there are filesystem specific paths that can
potentially hang.  I suspect that the flushing code in ext4 might be
one such path.  These are just examples.
Also, my worry persists that we are creating more places for bugs to
hide by blanket immunizing I/O paths from the hung task timer.  Having
said all of that, I don't really have a significantly better in-kernel
solution.

Reviewed-by: Salman Qazi <sqazi@google.com>

On Thu, Feb 20, 2020 at 6:43 AM Jesse Barnes <jsbarnes@google.com> wrote:
>
> Yeah looks good.
>
> Reviewed-by: Jesse Barnes <jsbarnes@google.com>
>
> A further change (just for readability) might be to factor out these
> "don't trigger hangcheck" waits from here and blk_execute_rq() into a
> small helper with a descriptive name.
>
> Thanks,
> Jesse
>
> On Thu, Feb 20, 2020 at 5:05 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
> > may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
> > takes up to 100 second on some devices. Also any block I/O operation
> > that occurs after the BLKSECDISCARD is submitted will also potentially
> > be affected by the hung task timeouts.
> >
> > So prevent hung_check from firing by taking same approach used
> > in blk_execute_rq(), and the wake-up interval is set as half the
> > hung_check timer period, which keeps overhead low enough.
> >
> > Cc: Salman Qazi <sqazi@google.com>
> > Cc: Jesse Barnes <jsbarnes@google.com>
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Link: https://lkml.org/lkml/2020/2/12/1193
> > Reported-by: Salman Qazi <sqazi@google.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/bio.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/bio.c b/block/bio.c
> > index 94d697217887..c9ce19a86de7 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/cgroup.h>
> >  #include <linux/blk-cgroup.h>
> >  #include <linux/highmem.h>
> > +#include <linux/sched/sysctl.h>
> >
> >  #include <trace/events/block.h>
> >  #include "blk.h"
> > @@ -1019,12 +1020,19 @@ static void submit_bio_wait_endio(struct bio *bio)
> >  int submit_bio_wait(struct bio *bio)
> >  {
> >         DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
> > +       unsigned long hang_check;
> >
> >         bio->bi_private = &done;
> >         bio->bi_end_io = submit_bio_wait_endio;
> >         bio->bi_opf |= REQ_SYNC;
> >         submit_bio(bio);
> > -       wait_for_completion_io(&done);
> > +
> > +       /* Prevent hang_check timer from firing at us during very long I/O */
> > +       hang_check = sysctl_hung_task_timeout_secs;
> > +       if (hang_check)
> > +               while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
> > +       else
> > +               wait_for_completion_io(&done);
> >
> >         return blk_status_to_errno(bio->bi_status);
> >  }
> > --
> > 2.20.1
> >

Bart Van Assche Feb. 21, 2020, 2:49 a.m. UTC | #3

On 2020-02-20 04:05, Ming Lei wrote:
>  	submit_bio(bio);
> -	wait_for_completion_io(&done);
> +
> +	/* Prevent hang_check timer from firing at us during very long I/O */
> +	hang_check = sysctl_hung_task_timeout_secs;
> +	if (hang_check)
> +		while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
> +	else
> +		wait_for_completion_io(&done);
>  
>  	return blk_status_to_errno(bio->bi_status);
>  }

Please make this patch compliant with the kernel coding style.
Checkpatch reports the following about the above patch:

WARNING: line over 80 characters
#152: FILE: block/bio.c:1032:
+  while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));

ERROR: trailing statements should be on next line
#152: FILE: block/bio.c:1032:
+  while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));

Thanks,

Bart.

diff --git a/block/bio.c b/block/bio.c
index 94d697217887..c9ce19a86de7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -17,6 +17,7 @@ 
 #include <linux/cgroup.h>
 #include <linux/blk-cgroup.h>
 #include <linux/highmem.h>
+#include <linux/sched/sysctl.h>
 
 #include <trace/events/block.h>
 #include "blk.h"
@@ -1019,12 +1020,19 @@  static void submit_bio_wait_endio(struct bio *bio)
 int submit_bio_wait(struct bio *bio)
 {
 	DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
+	unsigned long hang_check;
 
 	bio->bi_private = &done;
 	bio->bi_end_io = submit_bio_wait_endio;
 	bio->bi_opf |= REQ_SYNC;
 	submit_bio(bio);
-	wait_for_completion_io(&done);
+
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	hang_check = sysctl_hung_task_timeout_secs;
+	if (hang_check)
+		while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
+	else
+		wait_for_completion_io(&done);
 
 	return blk_status_to_errno(bio->bi_status);
 }

block: Prevent hung_check firing during long sync IO

Commit Message

Comments

Patch