[1/2] completion: move blk_wait_io to kernel/sched/completion.c

Message ID 31b118f3-bc8d-b18b-c4b9-e57d74a73f@redhat.com (mailing list archive)
State New

Commit Message

Mikulas Patocka April 17, 2024, 5:49 p.m. UTC
The block layer has a function blk_wait_io - it works like
wait_for_completion_io, except that it doesn't warn if the wait takes too
long. This commit renames the function to wait_for_completion_long_io and
moves it to kernel/sched/completion.c so that other kernel subsystems can
use it. It will be needed by the dm-io subsystem.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 block/bio.c                |    2 +-
 block/blk-mq.c             |    2 +-
 block/blk.h                |   12 ------------
 include/linux/completion.h |    1 +
 kernel/sched/completion.c  |   20 ++++++++++++++++++++
 5 files changed, 23 insertions(+), 14 deletions(-)
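
The re-arming wait described in the commit message can be modelled in user space. The following is a minimal sketch, not the kernel implementation: the `sketch_*` names, the simulated tick counter, and the `SKETCH_HZ` value are all hypothetical stand-ins. It assumes that the hung-task detector only reports a task that has stayed asleep, without being scheduled, for the whole timeout, so waking up every half timeout should keep the warning from triggering.

```c
/* Assumed tick rate for illustration only; the kernel's HZ is a build option. */
#define SKETCH_HZ 250UL

/* Hypothetical stand-in for struct completion, driven by a simulated clock. */
struct sketch_completion {
	unsigned long done_at;	/* tick at which the I/O finishes */
	unsigned long now;	/* current simulated tick */
	unsigned long wakeups;	/* how many times the waiter woke up */
};

/* Same arithmetic as the patch: half of the hung-task timeout, in ticks. */
static unsigned long sketch_slice(unsigned long hung_task_timeout_secs)
{
	return hung_task_timeout_secs * SKETCH_HZ / 2;
}

/*
 * Models a timed wait: returns 0 if the timeout expired first,
 * otherwise the ticks left until the timeout (at least 1).
 */
static unsigned long sketch_wait_timeout(struct sketch_completion *c,
					 unsigned long timeout)
{
	unsigned long io_left = c->done_at > c->now ? c->done_at - c->now : 0;
	unsigned long left;

	c->wakeups++;
	if (io_left > timeout) {
		c->now += timeout;	/* slept the whole slice, I/O still pending */
		return 0;
	}
	c->now += io_left;		/* I/O finished within this slice */
	left = timeout - io_left;
	return left ? left : 1;
}

/* Models the loop in the patch: keep re-arming the timed wait until done. */
static void sketch_wait_long_io(struct sketch_completion *c,
				unsigned long hung_task_timeout_secs)
{
	unsigned long timeout = sketch_slice(hung_task_timeout_secs);

	if (timeout) {
		while (!sketch_wait_timeout(c, timeout))
			;
	} else {
		/* Timeout of 0: fall back to one untimed wait. */
		if (c->done_at > c->now)
			c->now = c->done_at;
		c->wakeups++;
	}
}
```

With a timeout of 120 seconds and the assumed 250 ticks per second, the slice is 15000 ticks, so an I/O that takes 40000 ticks is covered by three timed waits instead of one long sleep.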

Comments

Peter Zijlstra April 17, 2024, 5:55 p.m. UTC | #1
On Wed, Apr 17, 2024 at 07:49:17PM +0200, Mikulas Patocka wrote:
> Index: linux-2.6/kernel/sched/completion.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> +++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> @@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
>  EXPORT_SYMBOL(wait_for_completion_killable_timeout);
>  
>  /**
> + * wait_for_completion_long_io - waits for completion of a task
> + * @x:  holds the state of this particular completion
> + *
> + * This is like wait_for_completion_io, but it doesn't warn if the wait takes
> + * too long.
> + */
> +void wait_for_completion_long_io(struct completion *x)
> +{
> +	/* Prevent hang_check timer from firing at us during very long I/O */
> +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
> +
> +	if (timeout)
> +		while (!wait_for_completion_io_timeout(x, timeout))
> +			;
> +	else
> +		wait_for_completion_io(x);
> +}
> +EXPORT_SYMBOL(wait_for_completion_long_io);

Urgh, why is it a sane thing to circumvent the hang check timer?

Mikulas Patocka April 17, 2024, 6 p.m. UTC | #2
On Wed, 17 Apr 2024, Peter Zijlstra wrote:

> On Wed, Apr 17, 2024 at 07:49:17PM +0200, Mikulas Patocka wrote:
> > Index: linux-2.6/kernel/sched/completion.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > +++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > @@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
> >  EXPORT_SYMBOL(wait_for_completion_killable_timeout);
> >  
> >  /**
> > + * wait_for_completion_long_io - waits for completion of a task
> > + * @x:  holds the state of this particular completion
> > + *
> > + * This is like wait_for_completion_io, but it doesn't warn if the wait takes
> > + * too long.
> > + */
> > +void wait_for_completion_long_io(struct completion *x)
> > +{
> > +	/* Prevent hang_check timer from firing at us during very long I/O */
> > +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
> > +
> > +	if (timeout)
> > +		while (!wait_for_completion_io_timeout(x, timeout))
> > +			;
> > +	else
> > +		wait_for_completion_io(x);
> > +}
> > +EXPORT_SYMBOL(wait_for_completion_long_io);
> 
> Urgh, why is it a sane thing to circumvent the hang check timer? 

The block layer already does it - the bios can have arbitrary size, so 
waiting for them takes arbitrary time.

Mikulas

Christoph Hellwig April 18, 2024, 4:57 a.m. UTC | #3
On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > 
> > Urgh, why is it a sane thing to circumvent the hang check timer? 
> 
> The block layer already does it - the bios can have arbitrary size, so 
> waiting for them takes arbitrary time.

And as mentioned the last few times around, I think we want a task
state to say that task can sleep long or even forever and not propagate
this hack even further.

Jens Axboe April 18, 2024, 2:30 p.m. UTC | #4
On 4/17/24 10:57 PM, Christoph Hellwig wrote:
> On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
>>>> +EXPORT_SYMBOL(wait_for_completion_long_io);
>>>
>>> Urgh, why is it a sane thing to circumvent the hang check timer? 
>>
>> The block layer already does it - the bios can have arbitrary size, so 
>> waiting for them takes arbitrary time.
> 
> And as mentioned the last few times around, I think we want a task
> state to say that task can sleep long or even forever and not propagate
> this hack even further.

It certainly is a hack/work-around, but unless there are a lot more that
should be using something like this, I don't think adding extra core
complexity in terms of a special task state (or per-task flag, at least
that would be easier) is really warranted.

Christoph Hellwig April 18, 2024, 2:46 p.m. UTC | #5
On Thu, Apr 18, 2024 at 08:30:14AM -0600, Jens Axboe wrote:
> It certainly is a hack/work-around, but unless there are a lot more that
> should be using something like this, I don't think adding extra core
> complexity in terms of a special task state (or per-task flag, at least
> that would be easier) is really warranted.

Basically any kernel thread doing on-demand work has the same problem.
It just has an easier workaround hack, as the kernel threads can simply
claim to do an interruptible sleep to not trigger the softlockup
warnings.

Jens Axboe April 18, 2024, 3:09 p.m. UTC | #6
On 4/18/24 8:46 AM, Christoph Hellwig wrote:
> On Thu, Apr 18, 2024 at 08:30:14AM -0600, Jens Axboe wrote:
>> It certainly is a hack/work-around, but unless there are a lot more that
>> should be using something like this, I don't think adding extra core
>> complexity in terms of a special task state (or per-task flag, at least
>> that would be easier) is really warranted.
> 
> Basically any kernel thread doing on-demand work has the same problem.
> It just has an easier workaround hack, as the kernel threads can simply
> claim to do an interruptible sleep to not trigger the softlockup
> warnings.

A kernel thread can just use TASK_INTERRUPTIBLE, as it doesn't take
signals anyway. But yeah, I guess you could view that as a work-around
as well.

Outside of that, mostly only a block problem, where our sleep is always
uninterruptible. Unless there are similar hacks elsewhere in the kernel
that I'm not aware of?

Peter Zijlstra April 22, 2024, 10:54 a.m. UTC | #7
On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> 
> 
> On Wed, 17 Apr 2024, Peter Zijlstra wrote:
> 
> > On Wed, Apr 17, 2024 at 07:49:17PM +0200, Mikulas Patocka wrote:
> > > Index: linux-2.6/kernel/sched/completion.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > > +++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > > @@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
> > >  EXPORT_SYMBOL(wait_for_completion_killable_timeout);
> > >  
> > >  /**
> > > + * wait_for_completion_long_io - waits for completion of a task
> > > + * @x:  holds the state of this particular completion
> > > + *
> > > + * This is like wait_for_completion_io, but it doesn't warn if the wait takes
> > > + * too long.
> > > + */
> > > +void wait_for_completion_long_io(struct completion *x)
> > > +{
> > > +	/* Prevent hang_check timer from firing at us during very long I/O */
> > > +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
> > > +
> > > +	if (timeout)
> > > +		while (!wait_for_completion_io_timeout(x, timeout))
> > > +			;
> > > +	else
> > > +		wait_for_completion_io(x);
> > > +}
> > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > 
> > Urgh, why is it a sane thing to circumvent the hang check timer? 
> 
> The block layer already does it - the bios can have arbitrary size, so 
> waiting for them takes arbitrary time.

Yeah, but now you make it generic and your comment doesn't warn people
away, it makes them think this is a sane thing to do.

Peter Zijlstra April 22, 2024, 10:59 a.m. UTC | #8
On Wed, Apr 17, 2024 at 09:57:04PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> > > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > > 
> > > Urgh, why is it a sane thing to circumvent the hang check timer? 
> > 
> > The block layer already does it - the bios can have arbitrary size, so 
> > waiting for them takes arbitrary time.
> 
> And as mentioned the last few times around, I think we want a task
> state to say that task can sleep long or even forever and not propagate
> this hack even further.

A bit like TASK_NOLOAD (which is used to make TASK_IDLE work), but
different I suppose.

TASK_NOHUNG would be trivial to add ofc. But is it worth it?

Anyway, as per the other email, anything like this needs to come with a
big fat warning. You get to keep the pieces etc..

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3c2abbc587b4..83b25327c233 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -112,7 +112,8 @@ struct user_event_mm;
 #define TASK_FREEZABLE			0x00002000
 #define __TASK_FREEZABLE_UNSAFE	       (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP))
 #define TASK_FROZEN			0x00008000
-#define TASK_STATE_MAX			0x00010000
+#define TASK_NOHUNG			0x00010000
+#define TASK_STATE_MAX			0x00020000
 
 #define TASK_ANY			(TASK_STATE_MAX-1)
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index b2fc2727d654..126fac835e5e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -210,7 +210,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		state = READ_ONCE(t->__state);
 		if ((state & TASK_UNINTERRUPTIBLE) &&
 		    !(state & TASK_WAKEKILL) &&
-		    !(state & TASK_NOLOAD))
+		    !(state & TASK_NOLOAD) &&
+		    !(state & TASK_NOHUNG))
 			check_hung_task(t, timeout);
 	}
  unlock:

Mikulas Patocka April 23, 2024, 12:36 p.m. UTC | #9
On Mon, 22 Apr 2024, Peter Zijlstra wrote:

> On Wed, Apr 17, 2024 at 09:57:04PM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> > > > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > > > 
> > > > Urgh, why is it a sane thing to circumvent the hang check timer? 
> > > 
> > > The block layer already does it - the bios can have arbitrary size, so 
> > > waiting for them takes arbitrary time.
> > 
> > And as mentioned the last few times around, I think we want a task
> > state to say that task can sleep long or even forever and not propagate
> > this hack even further.
> 
> A bit like TASK_NOLOAD (which is used to make TASK_IDLE work), but
> different I suppose.
> 
> TASK_NOHUNG would be trivial to add ofc. But is it worth it?
> 
> Anyway, as per the other email, anything like this needs to come with a
> big fat warning. You get to keep the pieces etc..

This seems better than the blk_wait_io hack.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>

> ---
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3c2abbc587b4..83b25327c233 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -112,7 +112,8 @@ struct user_event_mm;
>  #define TASK_FREEZABLE			0x00002000
>  #define __TASK_FREEZABLE_UNSAFE	       (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP))
>  #define TASK_FROZEN			0x00008000
> -#define TASK_STATE_MAX			0x00010000
> +#define TASK_NOHUNG			0x00010000
> +#define TASK_STATE_MAX			0x00020000
>  
>  #define TASK_ANY			(TASK_STATE_MAX-1)
>  
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index b2fc2727d654..126fac835e5e 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -210,7 +210,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		state = READ_ONCE(t->__state);
>  		if ((state & TASK_UNINTERRUPTIBLE) &&
>  		    !(state & TASK_WAKEKILL) &&
> -		    !(state & TASK_NOLOAD))
> +		    !(state & TASK_NOLOAD) &&
> +		    !(state & TASK_NOHUNG))
>  			check_hung_task(t, timeout);
>  	}
>   unlock:
>

Christoph Hellwig April 26, 2024, 6:06 a.m. UTC | #10
On Mon, Apr 22, 2024 at 12:59:56PM +0200, Peter Zijlstra wrote:
> A bit like TASK_NOLOAD (which is used to make TASK_IDLE work), but
> different I suppose.
> 
> TASK_NOHUNG would be trivial to add ofc. But is it worth it?

Yes.  And it would allow us to kill the horrible existing block hack.
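
The state filter from the kernel/hung_task.c hunk in comment #8 can be tried out in user space. This is only a sketch under stated assumptions: the `SK_*` constants and `sketch_should_check_hung()` are hypothetical names, the values of the three existing bits are assumed to match include/linux/sched.h at the time of the thread, and the `SK_TASK_NOHUNG` value is the one proposed in the thread rather than merged code.

```c
#include <stdbool.h>

/* Assumed to mirror the existing task state bits; illustration only. */
#define SK_TASK_UNINTERRUPTIBLE	0x00000002u
#define SK_TASK_WAKEKILL	0x00000100u
#define SK_TASK_NOLOAD		0x00000400u
/* Value proposed in the thread; not an existing kernel flag. */
#define SK_TASK_NOHUNG		0x00010000u

/*
 * Same condition as the patched check: only plain uninterruptible
 * sleepers that did not opt out are handed to the hung-task check.
 */
static bool sketch_should_check_hung(unsigned int state)
{
	return (state & SK_TASK_UNINTERRUPTIBLE) &&
	       !(state & SK_TASK_WAKEKILL) &&
	       !(state & SK_TASK_NOLOAD) &&
	       !(state & SK_TASK_NOHUNG);
}
```

Under this filter a waiter would opt out by sleeping in a state such as `SK_TASK_UNINTERRUPTIBLE | SK_TASK_NOHUNG`, in the same way that killable and idle sleeps are already skipped through the WAKEKILL and NOLOAD bits.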
Patch

Index: linux-2.6/block/blk.h
===================================================================
--- linux-2.6.orig/block/blk.h	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/block/blk.h	2024-04-17 19:41:14.000000000 +0200
@@ -72,18 +72,6 @@  static inline int bio_queue_enter(struct
 	return __bio_queue_enter(q, bio);
 }
 
-static inline void blk_wait_io(struct completion *done)
-{
-	/* Prevent hang_check timer from firing at us during very long I/O */
-	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
-
-	if (timeout)
-		while (!wait_for_completion_io_timeout(done, timeout))
-			;
-	else
-		wait_for_completion_io(done);
-}
-
 #define BIO_INLINE_VECS 4
 struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
 		gfp_t gfp_mask);
Index: linux-2.6/include/linux/completion.h
===================================================================
--- linux-2.6.orig/include/linux/completion.h	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/include/linux/completion.h	2024-04-17 19:41:14.000000000 +0200
@@ -112,6 +112,7 @@  extern long wait_for_completion_interrup
 	struct completion *x, unsigned long timeout);
 extern long wait_for_completion_killable_timeout(
 	struct completion *x, unsigned long timeout);
+extern void wait_for_completion_long_io(struct completion *x);
 extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
Index: linux-2.6/block/bio.c
===================================================================
--- linux-2.6.orig/block/bio.c	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/block/bio.c	2024-04-17 19:41:14.000000000 +0200
@@ -1378,7 +1378,7 @@  int submit_bio_wait(struct bio *bio)
 	bio->bi_end_io = submit_bio_wait_endio;
 	bio->bi_opf |= REQ_SYNC;
 	submit_bio(bio);
-	blk_wait_io(&done);
+	wait_for_completion_long_io(&done);
 
 	return blk_status_to_errno(bio->bi_status);
 }
Index: linux-2.6/block/blk-mq.c
===================================================================
--- linux-2.6.orig/block/blk-mq.c	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/block/blk-mq.c	2024-04-17 19:41:14.000000000 +0200
@@ -1407,7 +1407,7 @@  blk_status_t blk_execute_rq(struct reque
 	if (blk_rq_is_poll(rq))
 		blk_rq_poll_completion(rq, &wait.done);
 	else
-		blk_wait_io(&wait.done);
+		wait_for_completion_long_io(&wait.done);
 
 	return wait.ret;
 }
Index: linux-2.6/kernel/sched/completion.c
===================================================================
--- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
@@ -290,6 +290,26 @@  wait_for_completion_killable_timeout(str
 EXPORT_SYMBOL(wait_for_completion_killable_timeout);
 
 /**
+ * wait_for_completion_long_io - waits for completion of a task
+ * @x:  holds the state of this particular completion
+ *
+ * This is like wait_for_completion_io, but it doesn't warn if the wait takes
+ * too long.
+ */
+void wait_for_completion_long_io(struct completion *x)
+{
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
+
+	if (timeout)
+		while (!wait_for_completion_io_timeout(x, timeout))
+			;
+	else
+		wait_for_completion_io(x);
+}
+EXPORT_SYMBOL(wait_for_completion_long_io);
+
+/**
  *	try_wait_for_completion - try to decrement a completion without blocking
  *	@x:	completion structure
  *