Message ID | d99ce2f7c98d4408aea50f515bbb6b89bc7850e8.1569139018.git.asml.silence@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Optimise io_uring completion waiting | expand |
On Sun, Sep 22, 2019 at 11:08:50AM +0300, Pavel Begunkov (Silence) wrote: > From: Pavel Begunkov <asml.silence@gmail.com> > > Add wait_threshold -- a custom wait_event derivative, that waits until > a value is equal to or greater than the specified threshold. This is quite insufficient justification for this monster... what exact semantics do you want? Why can't you do this exact same with a slightly more complicated @cond ?
Just in case duplicating a mail from the cover-letter thread: It could be done with @cond indeed, that's how it works for now. However, this addresses performance issues only. The problem with wait_event_*() is that, if we have a counter and are trying to wake up tasks after each increment, it would schedule each waiting task O(threshold) times just for it to spuriously check @cond and go back to sleep. All that overhead (memory barriers, registers save/load, accounting, etc) turned out to be enough for some workloads to slow down the system. With this specialisation it still traverses a wait list and makes indirect calls to the checker callback, but the list supposedly is fairly small, so performance there shouldn't be a problem, at least for now. Regarding semantics; It should wake a task when a value passed to wake_up_threshold() is greater or equal then a task's threshold, that is specified individually for each task in wait_threshold_*(). In pseudo code: ``` def wake_up_threshold(n, wait_queue): for waiter in wait_queue: waiter.wake_up_if(n >= waiter.threshold); ``` Any thoughts how to do it better? Ideas are very welcome. BTW, this monster is mostly a copy-paste from wait_event_*(), wait_bit_*(). We could try to extract some common parts from these three, but that's another topic. On 23/09/2019 10:19, Peter Zijlstra wrote: > On Sun, Sep 22, 2019 at 11:08:50AM +0300, Pavel Begunkov (Silence) wrote: >> From: Pavel Begunkov <asml.silence@gmail.com> >> >> Add wait_threshold -- a custom wait_event derivative, that waits until >> a value is equal to or greater than the specified threshold. > > This is quite insufficient justification for this monster... what exact > semantics do you want? > > Why can't you do this exact same with a slightly more complicated @cond > ? >
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? On Mon, Sep 23, 2019 at 07:37:46PM +0300, Pavel Begunkov wrote: > Just in case duplicating a mail from the cover-letter thread: Just because every patch should have a self contained and coherent Changelog. > It could be done with @cond indeed, that's how it works for now. > However, this addresses performance issues only. > > The problem with wait_event_*() is that, if we have a counter and are > trying to wake up tasks after each increment, it would schedule each > waiting task O(threshold) times just for it to spuriously check @cond > and go back to sleep. All that overhead (memory barriers, registers > save/load, accounting, etc) turned out to be enough for some workloads > to slow down the system. > > With this specialisation it still traverses a wait list and makes > indirect calls to the checker callback, but the list supposedly is > fairly small, so performance there shouldn't be a problem, at least for > now. > > Regarding semantics; It should wake a task when a value passed to > wake_up_threshold() is greater or equal then a task's threshold, that is > specified individually for each task in wait_threshold_*(). > > In pseudo code: > ``` > def wake_up_threshold(n, wait_queue): > for waiter in wait_queue: > waiter.wake_up_if(n >= waiter.threshold); > ``` > > Any thoughts how to do it better? Ideas are very welcome. > > BTW, this monster is mostly a copy-paste from wait_event_*(), > wait_bit_*(). We could try to extract some common parts from these > three, but that's another topic. I don't think that is another topic at all. It is a quality of implementation issue. We already have too many copies of all that (3). So basically you want to fudge the wake function to do the/a @cond test, not unlike what wait_bit already does, but differenly. I'm really rather annoyed with C for not having proper lambda functions; that would make all this so much easier. Anyway, let me have a poke at this in the morning, it's late already. Also, is anything actually using wait_queue_entry::private ? I'm not finding any in a hurry.
On Mon, Sep 23, 2019 at 09:27:42PM +0200, Peter Zijlstra wrote: > Also, is anything actually using wait_queue_entry::private ? I'm > not finding any in a hurry. That is; outside of default_wake_function(). I don't see a sensible reason for that field to be 'void *' nor for the name 'private'. But _that_ is something we can fix later.
On 23/09/2019 22:27, Peter Zijlstra wrote: > > A: Because it messes up the order in which people normally read text. > Q: Why is top-posting such a bad thing? > A: Top-posting. > Q: What is the most annoying thing in e-mail? > > On Mon, Sep 23, 2019 at 07:37:46PM +0300, Pavel Begunkov wrote: >> Just in case duplicating a mail from the cover-letter thread: > > Just because every patch should have a self contained and coherent > Changelog. Well, I will expand the patch description, if we agree on the implementation (at least conceptually). >> >> BTW, this monster is mostly a copy-paste from wait_event_*(), >> wait_bit_*(). We could try to extract some common parts from these >> three, but that's another topic. > > I don't think that is another topic at all. It is a quality of > implementation issue. We already have too many copies of all that (3). For example, ___wait_event() is copied in ___wait_var_event(). Instead it could accept a wait entry generator or just accept entry from above and be reused in both cases. I've had such a patch, but want to think what else could be done. e.g. ``` #define generic_wait_event(ENTRY_GEN, ...) ENTRY_GEN(wq_entry_name); do_wait_event(wq_entry_name); #define WBQ_ENTRY_GEN(name) struct wait_bit_queue_entry tmp = WBQ_INITIALIZER; struct wait_queue_entry name = &tmp->wq_entry; ``` > > So basically you want to fudge the wake function to do the/a @cond test, > not unlike what wait_bit already does, but differenly. > Yes > I'm really rather annoyed with C for not having proper lambda functions; > that would make all this so much easier. Anyway, let me have a poke at > this in the morning, it's late already. > > Also, is anything actually using wait_queue_entry::private ? I'm > not finding any in a hurry. > >
diff --git a/include/linux/wait_threshold.h b/include/linux/wait_threshold.h new file mode 100644 index 000000000000..d8b054504c26 --- /dev/null +++ b/include/linux/wait_threshold.h @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_WAIT_THRESHOLD_H +#define _LINUX_WAIT_THRESHOLD_H + +#include <linux/wait.h> + +#define WQ_THRESHOLD_WAKE_ALWAYS (~0ui) + +struct wait_threshold_queue_entry { + struct wait_queue_entry wq_entry; + unsigned int threshold; +}; + + +void init_wait_threshold_entry(struct wait_threshold_queue_entry *wtq_entry, + unsigned int threshold); + +static inline void wake_up_threshold(struct wait_queue_head *wq_head, + unsigned int val) +{ + void *arg = (void *)(unsigned long)val; + + __wake_up(wq_head, TASK_NORMAL, 1, arg); +} + +#define ___wait_threshold_event(q, thresh, condition, state, \ + exclusive, ret, cmd) \ +({ \ + __label__ __out; \ + struct wait_queue_head *__wq_head = &q; \ + struct wait_threshold_queue_entry __wtq_entry; \ + struct wait_queue_entry *__wq_entry = &__wtq_entry.wq_entry; \ + long __ret = ret; /* explicit shadow */ \ + \ + init_wait_threshold_entry(&__wtq_entry, thresh); \ + for (;;) { \ + long __int = prepare_to_wait_event(__wq_head, \ + __wq_entry, \ + state); \ + if (condition) \ + break; \ + \ + if (___wait_is_interruptible(state) && __int) { \ + __ret = __int; \ + goto __out; \ + } \ + \ + cmd; \ + } \ + finish_wait(__wq_head, __wq_entry); \ +__out: __ret; \ +}) + +#define __wait_threshold_interruptible(q, thresh, condition) \ + ___wait_threshold_event(q, thresh, condition, TASK_INTERRUPTIBLE, 0, 0,\ + schedule()) + +#define wait_threshold_interruptible(q, threshold, condition) \ +({ \ + int __ret = 0; \ + might_sleep(); \ + if (!(condition)) \ + __ret = __wait_threshold_interruptible(q, \ + threshold, condition); \ + __ret; \ +}) +#endif /* _LINUX_WAIT_THRESHOLD_H */ diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile index 21fb5a5662b5..bb895a3184f9 100644 --- a/kernel/sched/Makefile +++ b/kernel/sched/Makefile @@ -18,7 +18,7 @@ endif obj-y += core.o loadavg.o clock.o cputime.o obj-y += idle.o fair.o rt.o deadline.o -obj-y += wait.o wait_bit.o swait.o completion.o +obj-y += wait.o wait_bit.o wait_threshold.o swait.o completion.o obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o diff --git a/kernel/sched/wait_threshold.c b/kernel/sched/wait_threshold.c new file mode 100644 index 000000000000..80a027c02ff3 --- /dev/null +++ b/kernel/sched/wait_threshold.c @@ -0,0 +1,26 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include "sched.h" +#include <linux/wait_threshold.h> + +static int wake_threshold_function(struct wait_queue_entry *wq_entry, + unsigned int mode, int sync, void *arg) +{ + unsigned int val = (unsigned int)(unsigned long)arg; + struct wait_threshold_queue_entry *wtq_entry = + container_of(wq_entry, struct wait_threshold_queue_entry, + wq_entry); + + if (val < wtq_entry->threshold) + return 0; + + return default_wake_function(wq_entry, mode, sync, arg); +} + +void init_wait_threshold_entry(struct wait_threshold_queue_entry *wtq_entry, + unsigned int threshold) +{ + init_wait_entry(&wtq_entry->wq_entry, 0); + wtq_entry->wq_entry.func = wake_threshold_function; + wtq_entry->threshold = threshold; +} +EXPORT_SYMBOL(init_wait_threshold_entry);