diff mbox

kyber: fix hang on domain token wait queue

Message ID 0c81254e266944c463e7dc12da15cc99edc466b8.1507743307.git.osandov@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Omar Sandoval Oct. 11, 2017, 5:39 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

When we're getting a domain token, if we fail to get a token on our
first attempt, we put the current hardware queue on a wait queue and
then try again just in case a token was freed after our initial attempt
but before we got on the wait queue. If this second attempt succeeds, we
currently leave the hardware queue on the wait queue. Usually this is
okay; we'll just run the hardware queue one extra time when another
token is freed. However, if the hardware queue doesn't have any other
requests waiting, then when it it gets the extra wakeup, it won't have
anything to free and therefore won't wake up any other hardware queues.
If tokens are limited, then we won't make forward progress and the
device will hang.

Reported-by: Bin Zha <zhabin.zb@alibaba-inc.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Based on Bin Zha's patch with an added comment and open-coded
remove_wait_queue() using list_del_init() instead of doing a
INIT_LIST_HEAD() after the wait queue lock has been dropped.

Based on v4.14-rc4.

 block/kyber-iosched.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jens Axboe Oct. 17, 2017, 10:18 p.m. UTC | #1
On 10/11/2017 11:39 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> When we're getting a domain token, if we fail to get a token on our
> first attempt, we put the current hardware queue on a wait queue and
> then try again just in case a token was freed after our initial attempt
> but before we got on the wait queue. If this second attempt succeeds, we
> currently leave the hardware queue on the wait queue. Usually this is
> okay; we'll just run the hardware queue one extra time when another
> token is freed. However, if the hardware queue doesn't have any other
> requests waiting, then when it it gets the extra wakeup, it won't have
> anything to free and therefore won't wake up any other hardware queues.
> If tokens are limited, then we won't make forward progress and the
> device will hang.

Applied for 4.15, thanks Omar.
diff mbox

Patch

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index f58cab82105b..db5bfc6342d3 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -541,9 +541,17 @@  static int kyber_get_domain_token(struct kyber_queue_data *kqd,
 
 		/*
 		 * Try again in case a token was freed before we got on the wait
-		 * queue.
+		 * queue. The waker may have already removed the entry from the
+		 * wait queue, but list_del_init() is okay with that.
 		 */
 		nr = __sbitmap_queue_get(domain_tokens);
+		if (nr >= 0) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&ws->wait.lock, flags);
+			list_del_init(&wait->entry);
+			spin_unlock_irqrestore(&ws->wait.lock, flags);
+		}
 	}
 	return nr;
 }