diff mbox series

[1/2] wait: add wq_has_multiple_sleepers helper

Message ID 20190710195227.92322-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Series [1/2] wait: add wq_has_multiple_sleepers helper

Commit Message

Josef Bacik July 10, 2019, 7:52 p.m. UTC
rq-qos sits in the io path so we want to take locks as sparingly as
possible.  To accomplish this we try not to take the waitqueue head lock
unless we are sure we need to go to sleep, and we have an optimization
to make sure that we don't starve out existing waiters.  Since we check
if there are existing waiters locklessly we need to be able to update
our view of the waitqueue list after we've added ourselves to the
waitqueue.  Accomplish this by adding a helper to check whether there is
more than one waiter on the waitqueue.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 include/linux/wait.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
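
For reference, a minimal usage sketch of the pattern the commit message
describes (illustrative only, not the 2/2 patch itself; example_throttle()
and try_acquire() are placeholder names standing in for rq_qos_wait() and
its acquire_inflight_cb callback):

#include <linux/wait.h>
#include <linux/sched.h>

static void example_throttle(struct wait_queue_head *wq,
                             bool (*try_acquire)(void *data), void *data)
{
        DEFINE_WAIT(wq_entry);
        bool has_sleeper = wq_has_sleeper(wq);          /* lockless pre-check */

        if (!has_sleeper && try_acquire(data))
                return;                                 /* fast path, no wq lock taken */

        do {
                prepare_to_wait_exclusive(wq, &wq_entry, TASK_UNINTERRUPTIBLE);
                /*
                 * We are on the list now; refresh the lockless view so we
                 * only grab the resource directly if we are the sole waiter.
                 */
                if (!wq_has_multiple_sleepers(wq) && try_acquire(data))
                        break;
                io_schedule();
        } while (!try_acquire(data));
        finish_wait(wq, &wq_entry);
}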

Comments

Jens Axboe July 10, 2019, 8:23 p.m. UTC | #1
On 7/10/19 1:52 PM, Josef Bacik wrote:
> rq-qos sits in the io path so we want to take locks as sparingly as
> possible.  To accomplish this we try not to take the waitqueue head lock
> unless we are sure we need to go to sleep, and we have an optimization
> to make sure that we don't starve out existing waiters.  Since we check
> if there are existing waiters locklessly we need to be able to update
> our view of the waitqueue list after we've added ourselves to the
> waitqueue.  Accomplish this by adding a helper to check whether there is
> more than one waiter on the waitqueue.
> 
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   include/linux/wait.h | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
> 
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index b6f77cf60dd7..89c41a7b3046 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -126,6 +126,27 @@ static inline int waitqueue_active(struct wait_queue_head *wq_head)
>   	return !list_empty(&wq_head->head);
>   }
>   
> +/**
> + * wq_has_multiple_sleepers - check if there are multiple waiting processes
> + * @wq_head: wait queue head
> + *
> + * Returns true if wq_head has multiple waiting processes.
> + *
> + * Please refer to the comment for waitqueue_active.
> + */
> +static inline bool wq_has_multiple_sleepers(struct wait_queue_head *wq_head)
> +{
> +	/*
> +	 * We need to be sure we are in sync with the
> +	 * add_wait_queue modifications to the wait queue.
> +	 *
> +	 * This memory barrier should be paired with one on the
> +	 * waiting side.
> +	 */
> +	smp_mb();
> +	return !list_is_singular(&wq_head->head);
> +}
> +
>   /**
>    * wq_has_sleeper - check if there are any waiting processes
>    * @wq_head: wait queue head

This (and 2/2) looks good to me, better than v1 for sure. Peter/Ingo,
are you OK with adding this new helper? For reference, this (and the
next patch) replace the alternative, which is an open-coding of
prepare_to_wait():

https://lore.kernel.org/linux-block/20190710190514.86911-1-josef@toxicpanda.com/
Peter Zijlstra July 10, 2019, 8:35 p.m. UTC | #2
On Wed, Jul 10, 2019 at 02:23:23PM -0600, Jens Axboe wrote:
> On 7/10/19 1:52 PM, Josef Bacik wrote:
> > rq-qos sits in the io path so we want to take locks as sparingly as
> > possible.  To accomplish this we try not to take the waitqueue head lock
> > unless we are sure we need to go to sleep, and we have an optimization
> > to make sure that we don't starve out existing waiters.  Since we check
> > if there are existing waiters locklessly we need to be able to update
> > our view of the waitqueue list after we've added ourselves to the
> > waitqueue.  Accomplish this by adding a helper to check whether there is
> > more than one waiter on the waitqueue.
> > 
> > Suggested-by: Jens Axboe <axboe@kernel.dk>
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >   include/linux/wait.h | 21 +++++++++++++++++++++
> >   1 file changed, 21 insertions(+)
> > 
> > diff --git a/include/linux/wait.h b/include/linux/wait.h
> > index b6f77cf60dd7..89c41a7b3046 100644
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -126,6 +126,27 @@ static inline int waitqueue_active(struct wait_queue_head *wq_head)
> >   	return !list_empty(&wq_head->head);
> >   }
> >   
> > +/**
> > + * wq_has_multiple_sleepers - check if there are multiple waiting processes
> > + * @wq_head: wait queue head
> > + *
> > + * Returns true if wq_head has multiple waiting processes.
> > + *
> > + * Please refer to the comment for waitqueue_active.
> > + */
> > +static inline bool wq_has_multiple_sleepers(struct wait_queue_head *wq_head)
> > +{
> > +	/*
> > +	 * We need to be sure we are in sync with the
> > +	 * add_wait_queue modifications to the wait queue.
> > +	 *
> > +	 * This memory barrier should be paired with one on the
> > +	 * waiting side.
> > +	 */
> > +	smp_mb();
> > +	return !list_is_singular(&wq_head->head);
> > +}
> > +
> >   /**
> >    * wq_has_sleeper - check if there are any waiting processes
> >    * @wq_head: wait queue head
> 
> This (and 2/2) looks good to me, better than v1 for sure. Peter/Ingo,
> are you OK with adding this new helper? For reference, this (and the
> next patch) replace the alternative, which is an open-coding of
> prepare_to_wait():
> 
> https://lore.kernel.org/linux-block/20190710190514.86911-1-josef@toxicpanda.com/

Yet another approach would be to have prepare_to_wait*() return this
state, but I think this is ok.

The smp_mb() is superfluous -- in your specific case -- since
prepare_to_wait*() already does one through set_current_state().

So you could do without it, I think.
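
For context, the barrier Peter is pointing at comes from prepare_to_wait*()
itself; roughly (paraphrased from kernel/sched/wait.c, not copied verbatim):

void prepare_to_wait_exclusive(struct wait_queue_head *wq_head,
                               struct wait_queue_entry *wq_entry, int state)
{
        unsigned long flags;

        wq_entry->flags |= WQ_FLAG_EXCLUSIVE;
        spin_lock_irqsave(&wq_head->lock, flags);
        if (list_empty(&wq_entry->entry))
                __add_wait_queue_entry_tail(wq_head, wq_entry);
        set_current_state(state);       /* smp_store_mb(): implies a full barrier */
        spin_unlock_irqrestore(&wq_head->lock, flags);
}

The list insertion is therefore already followed by a full barrier on the
waiting side, so the smp_mb() in the new helper is redundant (though harmless)
for a caller that has just done prepare_to_wait*().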
Jens Axboe July 10, 2019, 8:39 p.m. UTC | #3
On 7/10/19 2:35 PM, Peter Zijlstra wrote:
> On Wed, Jul 10, 2019 at 02:23:23PM -0600, Jens Axboe wrote:
>> On 7/10/19 1:52 PM, Josef Bacik wrote:
>>> rq-qos sits in the io path so we want to take locks as sparingly as
>>> possible.  To accomplish this we try not to take the waitqueue head lock
>>> unless we are sure we need to go to sleep, and we have an optimization
>>> to make sure that we don't starve out existing waiters.  Since we check
>>> if there are existing waiters locklessly we need to be able to update
>>> our view of the waitqueue list after we've added ourselves to the
>>> waitqueue.  Accomplish this by adding a helper to check whether there is
>>> more than one waiter on the waitqueue.
>>>
>>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>    include/linux/wait.h | 21 +++++++++++++++++++++
>>>    1 file changed, 21 insertions(+)
>>>
>>> diff --git a/include/linux/wait.h b/include/linux/wait.h
>>> index b6f77cf60dd7..89c41a7b3046 100644
>>> --- a/include/linux/wait.h
>>> +++ b/include/linux/wait.h
>>> @@ -126,6 +126,27 @@ static inline int waitqueue_active(struct wait_queue_head *wq_head)
>>>    	return !list_empty(&wq_head->head);
>>>    }
>>>    
>>> +/**
>>> + * wq_has_multiple_sleepers - check if there are multiple waiting processes
>>> + * @wq_head: wait queue head
>>> + *
>>> + * Returns true if wq_head has multiple waiting processes.
>>> + *
>>> + * Please refer to the comment for waitqueue_active.
>>> + */
>>> +static inline bool wq_has_multiple_sleepers(struct wait_queue_head *wq_head)
>>> +{
>>> +	/*
>>> +	 * We need to be sure we are in sync with the
>>> +	 * add_wait_queue modifications to the wait queue.
>>> +	 *
>>> +	 * This memory barrier should be paired with one on the
>>> +	 * waiting side.
>>> +	 */
>>> +	smp_mb();
>>> +	return !list_is_singular(&wq_head->head);
>>> +}
>>> +
>>>    /**
>>>     * wq_has_sleeper - check if there are any waiting processes
>>>     * @wq_head: wait queue head
>>
>> This (and 2/2) looks good to me, better than v1 for sure. Peter/Ingo,
>> are you OK with adding this new helper? For reference, this (and the
>> next patch) replace the alternative, which is an open-coding of
>> prepare_to_wait():
>>
>> https://lore.kernel.org/linux-block/20190710190514.86911-1-josef@toxicpanda.com/
> 
> Yet another approach would be to have prepare_to_wait*() return this
> state, but I think this is ok.

We did discuss that case, but it seems somewhat random to have it
return that specific piece of info. But it'd work for this case.

> The smp_mb() is superfluous -- in your specific case -- since
> prepare_to_wait*() already does one through set_current_state().
> 
> So you could do without it, I think.

But that's specific to this use case. Maybe it's the only one we'll
have, and then it's fine, but as a generic helper it seems safer to
include the same ordering protection as wq_has_sleeper().
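
For reference, wq_has_sleeper() in include/linux/wait.h is (roughly):

static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
{
        /*
         * We need to be sure we are in sync with the
         * add_wait_queue modifications to the wait queue.
         *
         * This memory barrier should be paired with one on the
         * waiting side.
         */
        smp_mb();
        return waitqueue_active(wq_head);
}

The new helper deliberately mirrors this, only swapping the underlying
list_empty() check (via waitqueue_active()) for list_is_singular(), which is
the symmetry Jens is arguing for.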
Oleg Nesterov July 11, 2019, 11:45 a.m. UTC | #4
Jens,

I managed to convince myself I understand why 2/2 needs this change...
But rq_qos_wait() still looks suspicious to me. Why can't the main loop
"break" right after io_schedule()? rq_qos_wake_function() either sets
data->got_token = true or it doesn't wake up the waiter sleeping in
io_schedule().

This means that data.got_token = F at the 2nd iteration is only possible
after a spurious wakeup, right? But in this case we need to set state =
TASK_UNINTERRUPTIBLE again to avoid busy-wait looping?

Oleg.
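
For reference, the re-arm Oleg is describing, sketched as an untested change
to the rq_qos_wait() loop he quotes in the next message (only the
set_current_state() call and its comment are new):

        do {
                /*
                 * Re-arm the sleep state every iteration: after a spurious
                 * wakeup we are back in TASK_RUNNING, and io_schedule()
                 * would otherwise return immediately and busy-loop.
                 */
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (data.got_token)
                        break;
                if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
                        finish_wait(&rqw->wait, &data.wq);
                        if (data.got_token)
                                cleanup_cb(rqw, private_data);
                        break;
                }
                io_schedule();
                has_sleeper = false;
        } while (1);
        finish_wait(&rqw->wait, &data.wq);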
Oleg Nesterov July 11, 2019, 1:40 p.m. UTC | #5
On 07/11, Oleg Nesterov wrote:
>
> Jens,
>
> I managed to convince myself I understand why 2/2 needs this change...
> But rq_qos_wait() still looks suspicious to me. Why can't the main loop
> "break" right after io_schedule()? rq_qos_wake_function() either sets
> data->got_token = true or it doesn't wake up the waiter sleeping in
> io_schedule().
>
> This means that data.got_token = F at the 2nd iteration is only possible
> after a spurious wakeup, right? But in this case we need to set state =
> TASK_UNINTERRUPTIBLE again to avoid busy-wait looping?

Oh. I can be easily wrong, I never read this code before, but it seems to
me there is another unrelated race.

rq_qos_wait() can't rely on finish_wait() because it doesn't necessarily
take wq_head->lock.

rq_qos_wait() inside the main loop does

		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
			finish_wait(&rqw->wait, &data.wq);

			/*
			 * We raced with wbt_wake_function() getting a token,
			 * which means we now have two. Put our local token
			 * and wake anyone else potentially waiting for one.
			 */
			if (data.got_token)
				cleanup_cb(rqw, private_data);
			break;
		}

finish_wait() + "if (data.got_token)" can race with rq_qos_wake_function()
which does

	data->got_token = true;
	list_del_init(&curr->entry);

rq_qos_wait() can see these changes out-of-order: finish_wait() can see
list_empty_careful() == T and avoid wq_head->lock, and in this case the
code above can see data->got_token = false.

No?

and I don't really understand

	has_sleeper = false;

at the end of the main loop. I think it should do "has_sleeper = true",
we need to execute the code above only once, right after prepare_to_wait().
But this is harmless.

Oleg.
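
For reference, the finish_wait() fast path Oleg is pointing at looks roughly
like this (paraphrased from kernel/sched/wait.c):

void finish_wait(struct wait_queue_head *wq_head,
                 struct wait_queue_entry *wq_entry)
{
        unsigned long flags;

        __set_current_state(TASK_RUNNING);
        /*
         * If the waker already did list_del_init() on our entry,
         * list_empty_careful() is true, we skip wq_head->lock entirely,
         * and nothing below orders the waker's earlier data->got_token
         * store against the caller's subsequent read of it.
         */
        if (!list_empty_careful(&wq_entry->entry)) {
                spin_lock_irqsave(&wq_head->lock, flags);
                list_del_init(&wq_entry->entry);
                spin_unlock_irqrestore(&wq_head->lock, flags);
        }
}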
Josef Bacik July 11, 2019, 7:21 p.m. UTC | #6
On Thu, Jul 11, 2019 at 03:40:06PM +0200, Oleg Nesterov wrote:
> On 07/11, Oleg Nesterov wrote:
> >
> > Jens,
> >
> > I managed to convince myself I understand why 2/2 needs this change...
> > But rq_qos_wait() still looks suspicious to me. Why can't the main loop
> > "break" right after io_schedule()? rq_qos_wake_function() either sets
> > data->got_token = true or it doesn't wake up the waiter sleeping in
> > io_schedule().
> >
> > This means that data.got_token = F at the 2nd iteration is only possible
> > after a spurious wakeup, right? But in this case we need to set state =
> > TASK_UNINTERRUPTIBLE again to avoid busy-wait looping?
> 
> Oh. I can be easily wrong, I never read this code before, but it seems to
> me there is another unrelated race.
> 
> rq_qos_wait() can't rely on finish_wait() because it doesn't necessarily
> take wq_head->lock.
> 
> rq_qos_wait() inside the main loop does
> 
> 		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
> 			finish_wait(&rqw->wait, &data.wq);
> 
> 			/*
> 			 * We raced with wbt_wake_function() getting a token,
> 			 * which means we now have two. Put our local token
> 			 * and wake anyone else potentially waiting for one.
> 			 */
> 			if (data.got_token)
> 				cleanup_cb(rqw, private_data);
> 			break;
> 		}
> 
> finish_wait() + "if (data.got_token)" can race with rq_qos_wake_function()
> which does
> 
> 	data->got_token = true;
> 	list_del_init(&curr->entry);
> 

Argh finish_wait() does __set_current_state, well that's shitty.  I guess we
need to do

data->got_token = true;
smp_wmb()
list_del_init(&curr->entry);

and then do

smp_rmb();
if (data.got_token)
	cleanup_cb(rqw, private_data);

to be safe?

> rq_qos_wait() can see these changes out-of-order: finish_wait() can see
> list_empty_careful() == T and avoid wq_head->lock, and in this case the
> code above can see data->got_token = false.
> 
> No?
> 
> and I don't really understand
> 
> 	has_sleeper = false;
> 
> at the end of the main loop. I think it should do "has_sleeper = true",
> we need to execute the code above only once, right after prepare_to_wait().
> But this is harmless.

We want has_sleeper = false because the second time around we just want to grab
the inflight counter.  Yes, we should have been woken up by our special thing
and so should already have data.got_token, but that sort of thinking ends in
hung boxes and me having to try to mitigate thousands of boxes suddenly hitting
a case we didn't think was possible.  Thanks,

Josef
Oleg Nesterov July 12, 2019, 8:05 a.m. UTC | #7
On 07/11, Josef Bacik wrote:
>
> On Thu, Jul 11, 2019 at 03:40:06PM +0200, Oleg Nesterov wrote:
> > rq_qos_wait() inside the main loop does
> >
> > 		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
> > 			finish_wait(&rqw->wait, &data.wq);
> >
> > 			/*
> > 			 * We raced with wbt_wake_function() getting a token,
> > 			 * which means we now have two. Put our local token
> > 			 * and wake anyone else potentially waiting for one.
> > 			 */
> > 			if (data.got_token)
> > 				cleanup_cb(rqw, private_data);
> > 			break;
> > 		}
> >
> > finish_wait() + "if (data.got_token)" can race with rq_qos_wake_function()
> > which does
> >
> > 	data->got_token = true;
> > 	list_del_init(&curr->entry);
> >
>
> Argh finish_wait() does __set_current_state, well that's shitty.

Hmm. I think this is irrelevant,

> data->got_token = true;
> smp_wmb()
> list_del_init(&curr->entry);
>
> and then do
>
> smp_rmb();
> if (data.got_token)
> 	cleanup_cb(rqw, private_data);

Yes, this should work,

> > and I don't really understand
> >
> > 	has_sleeper = false;
> >
> > at the end of the main loop. I think it should do "has_sleeper = true",
> > we need to execute the code above only once, right after prepare_to_wait().
> > But this is harmless.
>
> We want has_sleeper = false because the second time around we just want to grab
> the inflight counter.

I don't think so.

> Yes, we should have been woken up by our special thing
> and so should already have data.got_token,

Yes. Again, unless the wakeup was spurious, and this needs another trivial fix.

If we can't rely on this then this code is simply broken?

> but that sort of thinking ends in
> hung boxes and me having to try to mitigate thousands of boxes suddenly hitting
> a case we didn't think was possible.  Thanks,

I can't understand this logic, but I can't argue. However, in this case I'd
suggest the patch below instead of this series.

If rq_qos_wait() does the unnecessary acquire_inflight_cb() because it can
hit a case we didn't think was possible, then why can't it do so on the first
iteration for the same reason? This should equally fix the problem and
simplify the code.

In case it is not clear: no, I don't like it. I just can't understand your
logic.

And btw... again, I won't argue, but wq_has_multiple_sleepers is badly named,
and the comments are simply wrong. It can return T if wq has no sleepers, iow
if list_empty(wq_head->head). 2/2 actually uses !wq_has_multiple_sleepers(),
which turns the condition back into list_is_singular(), but to me this all
looks very confusing.

Plus I too do not understand smp_mb() in this helper.

Oleg.

--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -247,7 +247,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 	do {
 		if (data.got_token)
 			break;
-		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
+		if (acquire_inflight_cb(rqw, private_data)) {
 			finish_wait(&rqw->wait, &data.wq);
 
 			/*
@@ -260,7 +260,6 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 			break;
 		}
 		io_schedule();
-		has_sleeper = false;
 	} while (1);
 	finish_wait(&rqw->wait, &data.wq);
 }
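
On the naming point above, for reference, list_is_singular() in
include/linux/list.h is:

static inline int list_is_singular(const struct list_head *head)
{
        return !list_empty(head) && (head->next == head->prev);
}

So !list_is_singular() is true both for an empty list and for a list with two
or more entries; wq_has_multiple_sleepers() only means "somebody besides me is
waiting" when the caller is already known to be on the queue, which is the
naming/kernel-doc problem Oleg describes.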
diff mbox series

Patch

diff --git a/include/linux/wait.h b/include/linux/wait.h
index b6f77cf60dd7..89c41a7b3046 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -126,6 +126,27 @@  static inline int waitqueue_active(struct wait_queue_head *wq_head)
 	return !list_empty(&wq_head->head);
 }
 
+/**
+ * wq_has_multiple_sleepers - check if there are multiple waiting processes
+ * @wq_head: wait queue head
+ *
+ * Returns true if wq_head has multiple waiting processes.
+ *
+ * Please refer to the comment for waitqueue_active.
+ */
+static inline bool wq_has_multiple_sleepers(struct wait_queue_head *wq_head)
+{
+	/*
+	 * We need to be sure we are in sync with the
+	 * add_wait_queue modifications to the wait queue.
+	 *
+	 * This memory barrier should be paired with one on the
+	 * waiting side.
+	 */
+	smp_mb();
+	return !list_is_singular(&wq_head->head);
+}
+
 /**
  * wq_has_sleeper - check if there are any waiting processes
  * @wq_head: wait queue head