
[RFC] sbitmap: fix batching wakeup

Message ID 20230721095715.232728-1-ming.lei@redhat.com (mailing list archive)
State New, archived

Commit Message

Ming Lei July 21, 2023, 9:57 a.m. UTC
From: David Jeffery <djeffery@redhat.com>

The current code assumes that waking up one wait queue after each
completion batch is enough to provide forward progress.

Unfortunately that isn't enough, because a new waiter can be added to
a wait queue just after one is woken up there.

Here is one example (64 tag depth, wake_batch of 8):

1) all 64 tags are active

2) each wait queue holds one single waiter

3) each completion batch (8 completions) wakes up just one waiter in
one wait queue, and immediately one new sleeper is added to that wait
queue

4) after 64 completions, 8 waiters have been woken up, yet 8 waiters
still sit in the wait queues

5) after another 8 active tags complete, only one waiter can be woken
up, and the other 7 can never be woken again.

It turns out this problem isn't easy to fix properly, so simply wake
up enough waiters for a whole batch.

Cc: David Jeffery <djeffery@redhat.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 lib/sbitmap.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
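
To make the failure mode concrete, the example above can be replayed
in user-space C with plain counters (illustrative only, not part of
the patch; the queue count, batch size, and the "stop after one
successful wake" rule come from the example, the rest is a simplifying
assumption):

/*
 * Old behaviour: 8 wait queues, one waiter each.  Every completion
 * batch wakes at most one waiter, and a new sleeper immediately takes
 * the woken waiter's place.
 */
#include <stdio.h>

#define NR_WQ 8

int main(void)
{
	int waiters[NR_WQ];
	int wake_index = 0;
	int batch, i, total = 0;

	for (i = 0; i < NR_WQ; i++)
		waiters[i] = 1;			/* step 2: one waiter per queue */

	for (batch = 0; batch < 8; batch++) {	/* steps 3-4: 64 completions */
		for (i = 0; i < NR_WQ; i++) {
			int idx = (wake_index + i) % NR_WQ;

			if (waiters[idx]) {
				waiters[idx]--;	/* wake one waiter...           */
				waiters[idx]++;	/* ...a new sleeper replaces it */
				wake_index = (idx + 1) % NR_WQ;
				break;		/* old code: stop scanning here */
			}
		}
	}

	for (i = 0; i < NR_WQ; i++)
		total += waiters[i];
	printf("%d waiters still queued after 64 completions\n", total);
	return 0;
}

Running it prints "8 waiters still queued after 64 completions": once
only those 8 tags remain active, their single completion batch can wake
just one waiter, stranding the other 7 exactly as in step 5.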

Comments

Keith Busch July 21, 2023, 10:40 a.m. UTC | #1
On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>

...
 
> Cc: David Jeffery <djeffery@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Shouldn't the author include their sign off? Or is this supposed to be
from you?
Ming Lei July 21, 2023, 10:50 a.m. UTC | #2
On Fri, Jul 21, 2023 at 12:40:19PM +0200, Keith Busch wrote:
> On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
> > From: David Jeffery <djeffery@redhat.com>
> 
> ...
>  
> > Cc: David Jeffery <djeffery@redhat.com>
> > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> Shouldn't the author include their sign off? Or is this supposed to be
> from you?

I understand the Signed-off-by needs to come explicitly from David;
let's focus on the patch itself for now. I believe David can reply
with his Signed-off-by when the patch is ready to go.


Thanks,
Ming
Keith Busch July 21, 2023, 11:51 a.m. UTC | #3
On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
>  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  {
> -	int i, wake_index;
> +	int i, wake_index, woken;
>  
>  	if (!atomic_read(&sbq->ws_active))
>  		return;
> @@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  		 */
>  		wake_index = sbq_index_inc(wake_index);
>  
> -		/*
> -		 * It is sufficient to wake up at least one waiter to
> -		 * guarantee forward progress.
> -		 */
> -		if (waitqueue_active(&ws->wait) &&
> -		    wake_up_nr(&ws->wait, nr))
> -			break;
> +		if (waitqueue_active(&ws->wait)) {
> +			woken = wake_up_nr(&ws->wait, nr);
> +			if (woken == nr)
> +				break;
> +			nr -= woken;
> +		}
>  	}

This looks good. I had something similar at one point, but after all
the churn this file had gone through, I somehow convinced myself it
wasn't necessary anymore. Your analysis and fix look correct to me!

Reviewed-by: Keith Busch <kbusch@kernel.org>
Gabriel Krisman Bertazi July 21, 2023, 4:35 p.m. UTC | #4
Ming Lei <ming.lei@redhat.com> writes:

> From: David Jeffery <djeffery@redhat.com>
>
> The current code assumes that waking up one wait queue after each
> completion batch is enough to provide forward progress.
>
> Unfortunately that isn't enough, because a new waiter can be added to
> a wait queue just after one is woken up there.
>
> Here is one example (64 tag depth, wake_batch of 8):
>
> 1) all 64 tags are active
>
> 2) each wait queue holds one single waiter
>
> 3) each completion batch (8 completions) wakes up just one waiter in
> one wait queue, and immediately one new sleeper is added to that wait
> queue
>
> 4) after 64 completions, 8 waiters have been woken up, yet 8 waiters
> still sit in the wait queues
>
> 5) after another 8 active tags complete, only one waiter can be woken
> up, and the other 7 can never be woken again.
>
> It turns out this problem isn't easy to fix properly, so simply wake
> up enough waiters for a whole batch.

Yes, I think this makes sense. When working on this algorithm I
remember considering it (hence wake_up_nr() being ready), but I ended
up believing it wasn't needed. Please take:

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>

I wonder how likely it is to reach it.  Did you get a bug report?

Thanks,
Jens Axboe July 21, 2023, 5:29 p.m. UTC | #5
On 7/21/23 3:57 AM, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> The current code assumes that waking up one wait queue after each
> completion batch is enough to provide forward progress.
>
> Unfortunately that isn't enough, because a new waiter can be added to
> a wait queue just after one is woken up there.
>
> Here is one example (64 tag depth, wake_batch of 8):
>
> 1) all 64 tags are active
>
> 2) each wait queue holds one single waiter
>
> 3) each completion batch (8 completions) wakes up just one waiter in
> one wait queue, and immediately one new sleeper is added to that wait
> queue
>
> 4) after 64 completions, 8 waiters have been woken up, yet 8 waiters
> still sit in the wait queues
>
> 5) after another 8 active tags complete, only one waiter can be woken
> up, and the other 7 can never be woken again.
>
> It turns out this problem isn't easy to fix properly, so simply wake
> up enough waiters for a whole batch.

Change looks good, but commit message needs:

1) Signed-off-by from author
2) To be wrapped at ~72 chars, it's got lines that are way too long.

I can edit the commit message, but David does need to reply with his SOB
before it can get applied.
David Jeffery July 21, 2023, 5:38 p.m. UTC | #6
On Fri, Jul 21, 2023 at 6:50 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, Jul 21, 2023 at 12:40:19PM +0200, Keith Busch wrote:
> > On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
> > > From: David Jeffery <djeffery@redhat.com>
> >
> > ...
> >
> > > Cc: David Jeffery <djeffery@redhat.com>
> > > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >
> > Shouldn't the author include their sign off? Or is this supposed to be
> > from you?
>
> I understand the Signed-off-by needs to come explicitly from David;
> let's focus on the patch itself for now. I believe David can reply
> with his Signed-off-by when the patch is ready to go.
>
>
> Thanks,
> Ming
>

Thank you for the initial checking of the patch, Ming. I appreciate
your patience and wisdom looking over my ideas. Please add my
signed-off-by to the patch.

Signed-off-by: David Jeffery <djeffery@redhat.com>

Regards,
David Jeffery
Jens Axboe July 21, 2023, 5:40 p.m. UTC | #7
On Fri, 21 Jul 2023 17:57:15 +0800, Ming Lei wrote:
> The current code assumes that waking up one wait queue after each
> completion batch is enough to provide forward progress.
>
> Unfortunately that isn't enough, because a new waiter can be added to
> a wait queue just after one is woken up there.
>
> Here is one example (64 tag depth, wake_batch of 8):
> 
> [...]

Applied, thanks!

[1/1] sbitmap: fix batching wakeup
      commit: 106397376c0369fcc01c58dd189ff925a2724a57

Best regards,
Ming Lei July 22, 2023, 2:42 a.m. UTC | #8
On Fri, Jul 21, 2023 at 12:35:43PM -0400, Gabriel Krisman Bertazi wrote:
> Ming Lei <ming.lei@redhat.com> writes:
> 
> > From: David Jeffery <djeffery@redhat.com>
> >
> > The current code assumes that waking up one wait queue after each
> > completion batch is enough to provide forward progress.
> >
> > Unfortunately that isn't enough, because a new waiter can be added
> > to a wait queue just after one is woken up there.
> >
> > Here is one example (64 tag depth, wake_batch of 8):
> >
> > 1) all 64 tags are active
> >
> > 2) each wait queue holds one single waiter
> >
> > 3) each completion batch (8 completions) wakes up just one waiter
> > in one wait queue, and immediately one new sleeper is added to that
> > wait queue
> >
> > 4) after 64 completions, 8 waiters have been woken up, yet 8
> > waiters still sit in the wait queues
> >
> > 5) after another 8 active tags complete, only one waiter can be
> > woken up, and the other 7 can never be woken again.
> >
> > It turns out this problem isn't easy to fix properly, so simply
> > wake up enough waiters for a whole batch.
> 
> Yes, I think this makes sense. When working on this algorithm I
> remember considering it (hence wake_up_nr() being ready), but I
> ended up believing it wasn't needed. Please take:
> 
> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> I wonder how likely it is to reach it.  Did you get a bug report?

It was reported by one RH customer, and I also hit it once when
running dbench on loop (bfq) over scsi_debug in my routine test.

David figured out the idea, and we discussed other solutions too, but
it turns out the others can't work, and the above (extreme) example
seems the easiest to follow, at least for me.

Per David's earlier analysis, it should have become easier to trigger
since commit 26edb30dd1c0 ("sbitmap: Try each queue to wake up at
least one waiter"), because 'wake_index' isn't updated before calling
wake_up_nr(); multiple completion batches may then wake up the same
wait queue while multiple sleepers are added to that same wait queue.


Thanks, 
Ming
Jan Kara Aug. 2, 2023, 4:05 p.m. UTC | #9
On Fri 21-07-23 17:57:15, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> The current code assumes that waking up one wait queue after each
> completion batch is enough to provide forward progress.
>
> Unfortunately that isn't enough, because a new waiter can be added to
> a wait queue just after one is woken up there.
>
> Here is one example (64 tag depth, wake_batch of 8):
>
> 1) all 64 tags are active
>
> 2) each wait queue holds one single waiter
>
> 3) each completion batch (8 completions) wakes up just one waiter in
> one wait queue, and immediately one new sleeper is added to that wait
> queue
>
> 4) after 64 completions, 8 waiters have been woken up, yet 8 waiters
> still sit in the wait queues
>
> 5) after another 8 active tags complete, only one waiter can be woken
> up, and the other 7 can never be woken again.
>
> It turns out this problem isn't easy to fix properly, so simply wake
> up enough waiters for a whole batch.
> 
> Cc: David Jeffery <djeffery@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

I'm sorry for the delay - I was on vacation. I can see the patch has
already been merged and I'm not strictly against that (although I think
Gabriel was experimenting with this exact wakeup scheme and, as far as
I remember, the more eager waking up was causing a performance decrease
for some configurations). But let me challenge the analysis above a
bit. For a sleeper to be added to a waitqueue in step 3),
blk_mq_mark_tag_wait() must fail the blk_mq_get_driver_tag() call,
which means that all tags were in use at that moment. To summarize,
anytime we add a new waiter to the waitqueue, all tags are in use, so
we should eventually receive enough wakeups to wake all of them. What
am I missing?

								Honza

> ---
>  lib/sbitmap.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index eff4e42c425a..d0a5081dfd12 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -550,7 +550,7 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
>  
>  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  {
> -	int i, wake_index;
> +	int i, wake_index, woken;
>  
>  	if (!atomic_read(&sbq->ws_active))
>  		return;
> @@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  		 */
>  		wake_index = sbq_index_inc(wake_index);
>  
> -		/*
> -		 * It is sufficient to wake up at least one waiter to
> -		 * guarantee forward progress.
> -		 */
> -		if (waitqueue_active(&ws->wait) &&
> -		    wake_up_nr(&ws->wait, nr))
> -			break;
> +		if (waitqueue_active(&ws->wait)) {
> +			woken = wake_up_nr(&ws->wait, nr);
> +			if (woken == nr)
> +				break;
> +			nr -= woken;
> +		}
>  	}
>  
>  	if (wake_index != atomic_read(&sbq->wake_index))
> -- 
> 2.40.1
>
Ming Lei Aug. 8, 2023, 8:18 a.m. UTC | #10
On Wed, Aug 02, 2023 at 06:05:53PM +0200, Jan Kara wrote:
> On Fri 21-07-23 17:57:15, Ming Lei wrote:
> > From: David Jeffery <djeffery@redhat.com>
> > 
> > The current code assumes that waking up one wait queue after each
> > completion batch is enough to provide forward progress.
> >
> > Unfortunately that isn't enough, because a new waiter can be added
> > to a wait queue just after one is woken up there.
> >
> > Here is one example (64 tag depth, wake_batch of 8):
> >
> > 1) all 64 tags are active
> >
> > 2) each wait queue holds one single waiter
> >
> > 3) each completion batch (8 completions) wakes up just one waiter
> > in one wait queue, and immediately one new sleeper is added to that
> > wait queue
> >
> > 4) after 64 completions, 8 waiters have been woken up, yet 8
> > waiters still sit in the wait queues
> >
> > 5) after another 8 active tags complete, only one waiter can be
> > woken up, and the other 7 can never be woken again.
> >
> > It turns out this problem isn't easy to fix properly, so simply
> > wake up enough waiters for a whole batch.
> > 
> > Cc: David Jeffery <djeffery@redhat.com>
> > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> I'm sorry for the delay - I was on vacation. I can see the patch has
> already been merged and I'm not strictly against that (although I
> think Gabriel was experimenting with this exact wakeup scheme and, as
> far as I remember, the more eager waking up was causing a performance
> decrease for some configurations). But let me challenge the analysis
> above a bit. For a sleeper to be added to a waitqueue in step 3),
> blk_mq_mark_tag_wait() must fail the blk_mq_get_driver_tag() call,
> which means that all tags were in use

Here only request allocation via blk_mq_get_tag() is involved;
getting a driver tag isn't.

> at that moment. To summarize, anytime we add a new waiter to the
> waitqueue, all tags are in use, so we should eventually receive
> enough wakeups to wake all of them. What am I missing?

When blk_mq_get_tag() runs the final retry (__blk_mq_get_tag) before
sleeping (io_schedule()), the sleeper has already been added to the
wait queue.

So when two completion batches come in, both may wake up the same wq:
the same ->wake_index can be observed from the two completion paths,
and both wake_up_nr() calls can return > 0 because adding sleepers to
the wq and wake_up_nr() can interleave, so 16 completions wake up just
2 sleepers, both added to the same wq.

If that story plays out on one wq with >= 8 sleepers while another two
wqs have pending waiters, an IO hang is triggered.
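
A minimal user-space sketch of that interleaving (illustrative only;
the three-queue setup and waiter counts are made-up numbers, only the
shared ->wake_index observation is from the description above):

#include <stdio.h>

int main(void)
{
	int waiters[3] = { 8, 8, 8 };	/* three wqs with pending sleepers */
	int wake_index = 0;

	int seen_by_a = wake_index;	/* completion batch A reads index 0 */
	int seen_by_b = wake_index;	/* batch B reads it before A bumps it */

	waiters[seen_by_a]--;		/* batch A wakes one wq[0] sleeper */
	waiters[seen_by_b]--;		/* batch B wakes another wq[0] one */
	wake_index = (wake_index + 1) % 3;

	printf("wq0=%d wq1=%d wq2=%d: 16 completions, wq1/wq2 untouched\n",
	       waiters[0], waiters[1], waiters[2]);
	return 0;
}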


Thanks, 
Ming
Jan Kara Aug. 8, 2023, 10:30 a.m. UTC | #11
On Tue 08-08-23 16:18:50, Ming Lei wrote:
> On Wed, Aug 02, 2023 at 06:05:53PM +0200, Jan Kara wrote:
> > On Fri 21-07-23 17:57:15, Ming Lei wrote:
> > > From: David Jeffery <djeffery@redhat.com>
> > > 
> > > The current code assumes that waking up one wait queue after each
> > > completion batch is enough to provide forward progress.
> > >
> > > Unfortunately that isn't enough, because a new waiter can be
> > > added to a wait queue just after one is woken up there.
> > >
> > > Here is one example (64 tag depth, wake_batch of 8):
> > >
> > > 1) all 64 tags are active
> > >
> > > 2) each wait queue holds one single waiter
> > >
> > > 3) each completion batch (8 completions) wakes up just one waiter
> > > in one wait queue, and immediately one new sleeper is added to
> > > that wait queue
> > >
> > > 4) after 64 completions, 8 waiters have been woken up, yet 8
> > > waiters still sit in the wait queues
> > >
> > > 5) after another 8 active tags complete, only one waiter can be
> > > woken up, and the other 7 can never be woken again.
> > >
> > > It turns out this problem isn't easy to fix properly, so simply
> > > wake up enough waiters for a whole batch.
> > > 
> > > Cc: David Jeffery <djeffery@redhat.com>
> > > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > 
> > I'm sorry for the delay - I was on vacation. I can see the patch
> > has already been merged and I'm not strictly against that (although
> > I think Gabriel was experimenting with this exact wakeup scheme
> > and, as far as I remember, the more eager waking up was causing a
> > performance decrease for some configurations). But let me challenge
> > the analysis above a bit. For a sleeper to be added to a waitqueue
> > in step 3), blk_mq_mark_tag_wait() must fail the
> > blk_mq_get_driver_tag() call, which means that all tags were in use
> 
> Here only request allocation via blk_mq_get_tag() is involved;
> getting a driver tag isn't.
> 
> > at that moment. To summarize, anytime we add a new waiter to the
> > waitqueue, all tags are in use, so we should eventually receive
> > enough wakeups to wake all of them. What am I missing?
> 
> When blk_mq_get_tag() runs the final retry (__blk_mq_get_tag) before
> sleeping (io_schedule()), the sleeper has already been added to the
> wait queue.
>
> So when two completion batches come in, both may wake up the same wq:
> the same ->wake_index can be observed from the two completion paths,
> and both wake_up_nr() calls can return > 0 because adding sleepers to
> the wq and wake_up_nr() can interleave, so 16 completions wake up
> just 2 sleepers, both added to the same wq.
>
> If that story plays out on one wq with >= 8 sleepers while another
> two wqs have pending waiters, an IO hang is triggered.

OK, thanks for the explanation! I think I see the problem now.

								Honza
Kemeng Shi Jan. 15, 2024, 9:51 a.m. UTC | #12
On 7/21/2023 5:57 PM, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> The current code assumes that waking up one wait queue after each
> completion batch is enough to provide forward progress.
>
> Unfortunately that isn't enough, because a new waiter can be added to
> a wait queue just after one is woken up there.
> 
Sorry for missing this. The change looks good and makes the code
simpler. But the issue is not supposed to happen in the real world.
> Here is one example (64 tag depth, wake_batch of 8):
>
> 1) all 64 tags are active
>
> 2) each wait queue holds one single waiter
>
> 3) each completion batch (8 completions) wakes up just one waiter in
> one wait queue, and immediately one new sleeper is added to that wait
> queue
>
> 4) after 64 completions, 8 waiters have been woken up, yet 8 waiters
> still sit in the wait queues
As we only wait once all tags are consumed (after
sbitmap_prepare_to_wait(), we try sbitmap_queue_get() again before
sleeping), there will be 64 active tags again whenever a new sleeper
is added, so the mentioned issue should not exist; a simplified sketch
of this retry pattern follows at the end of this message.
If there was no gain from the old way of waking up, this looks good;
otherwise we may need to reconsider it.
Hope this helps, thanks!
> 5) after another 8 active tags complete, only one waiter can be woken
> up, and the other 7 can never be woken again.
>
> It turns out this problem isn't easy to fix properly, so simply wake
> up enough waiters for a whole batch.
> 
> Cc: David Jeffery <djeffery@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  lib/sbitmap.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index eff4e42c425a..d0a5081dfd12 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -550,7 +550,7 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
>  
>  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  {
> -	int i, wake_index;
> +	int i, wake_index, woken;
>  
>  	if (!atomic_read(&sbq->ws_active))
>  		return;
> @@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  		 */
>  		wake_index = sbq_index_inc(wake_index);
>  
> -		/*
> -		 * It is sufficient to wake up at least one waiter to
> -		 * guarantee forward progress.
> -		 */
> -		if (waitqueue_active(&ws->wait) &&
> -		    wake_up_nr(&ws->wait, nr))
> -			break;
> +		if (waitqueue_active(&ws->wait)) {
> +			woken = wake_up_nr(&ws->wait, nr);
> +			if (woken == nr)
> +				break;
> +			nr -= woken;
> +		}
>  	}
>  
>  	if (wake_index != atomic_read(&sbq->wake_index))
>
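
As referenced above, here is a simplified user-space sketch of the
allocate-then-retry-before-sleep pattern Kemeng describes; the real
code is blk_mq_get_tag() with sbitmap_prepare_to_wait(), and
try_get_tag() below is a hypothetical stand-in, not a kernel API:

#include <stdio.h>

static int free_tags;			/* 0: all tags consumed */
static int waiters;

/* hypothetical stand-in for __blk_mq_get_tag()/sbitmap_queue_get() */
static int try_get_tag(void)
{
	if (free_tags > 0) {
		free_tags--;
		return 0;		/* any tag id would do here */
	}
	return -1;
}

int main(void)
{
	int tag = try_get_tag();	/* fast path fails: pool exhausted */

	if (tag < 0) {
		waiters++;		/* join a wait queue first...   */
		tag = try_get_tag();	/* ...and retry before sleeping */
		if (tag < 0)
			printf("sleep with %d waiter(s); all tags in use\n",
			       waiters);
	}
	return 0;
}

The retry after joining the wait queue is what guarantees a task only
sleeps while every tag is in use, which is the premise of Kemeng's
argument.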

Patch

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index eff4e42c425a..d0a5081dfd12 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -550,7 +550,7 @@  EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
 
 static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
 {
-	int i, wake_index;
+	int i, wake_index, woken;
 
 	if (!atomic_read(&sbq->ws_active))
 		return;
@@ -567,13 +567,12 @@  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
 		 */
 		wake_index = sbq_index_inc(wake_index);
 
-		/*
-		 * It is sufficient to wake up at least one waiter to
-		 * guarantee forward progress.
-		 */
-		if (waitqueue_active(&ws->wait) &&
-		    wake_up_nr(&ws->wait, nr))
-			break;
+		if (waitqueue_active(&ws->wait)) {
+			woken = wake_up_nr(&ws->wait, nr);
+			if (woken == nr)
+				break;
+			nr -= woken;
+		}
 	}
 
 	if (wake_index != atomic_read(&sbq->wake_index))
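
For reference, the patched loop can be exercised in user space with
plain counters standing in for the waitqueues (illustrative only; the
waiter counts are made up, and the wake_index write-back at the end of
the real function is simplified away):

#include <stdio.h>

#define NR_WQ 8

static int waiters[NR_WQ] = { 3, 0, 5, 0, 0, 1, 0, 0 };
static int wake_index;

static void wake_up_batch(int nr)
{
	int i;

	for (i = 0; i < NR_WQ && nr; i++) {
		int idx = wake_index;

		wake_index = (wake_index + 1) % NR_WQ;
		if (waiters[idx]) {
			/* stands in for wake_up_nr(&ws->wait, nr) */
			int woken = waiters[idx] < nr ? waiters[idx] : nr;

			waiters[idx] -= woken;
			nr -= woken;	/* carry the shortfall onward */
		}
	}
}

int main(void)
{
	int i;

	wake_up_batch(8);		/* one completion batch */
	for (i = 0; i < NR_WQ; i++)
		printf("wq[%d]: %d waiter(s) left\n", i, waiters[i]);
	return 0;
}

With the old break-after-one-wake rule, the same batch would have woken
only the 3 waiters on wq[0]; the patched loop carries the remaining 5
wakeups on to wq[2], so the whole batch of 8 is honoured.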