From patchwork Wed Oct 11 17:39:15 2017
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10000145
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: kernel-team@fb.com, Bin Zha
Subject: [PATCH] kyber: fix hang on domain token wait queue
Date: Wed, 11 Oct 2017 10:39:15 -0700
Message-Id: <0c81254e266944c463e7dc12da15cc99edc466b8.1507743307.git.osandov@fb.com>

From: Omar Sandoval

When we're getting a domain token, if we fail to get a token on our
first attempt, we put the current hardware queue on a wait queue and
then try again just in case a token was freed after our initial attempt
but before we got on the wait queue.

If this second attempt succeeds, we currently leave the hardware queue
on the wait queue. Usually this is okay; we'll just run the hardware
queue one extra time when another token is freed. However, if the
hardware queue doesn't have any other requests waiting, then when it
gets the extra wakeup, it won't have anything to free and therefore
won't wake up any other hardware queues. If tokens are limited, then we
won't make forward progress and the device will hang.

Reported-by: Bin Zha
Signed-off-by: Omar Sandoval
---
Based on Bin Zha's patch with an added comment and open-coded
remove_wait_queue() using list_del_init() instead of doing an
INIT_LIST_HEAD() after the wait queue lock has been dropped.

Based on v4.14-rc4.

 block/kyber-iosched.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index f58cab82105b..db5bfc6342d3 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -541,9 +541,17 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
 
 		/*
 		 * Try again in case a token was freed before we got on the wait
-		 * queue.
+		 * queue. The waker may have already removed the entry from the
+		 * wait queue, but list_del_init() is okay with that.
 		 */
 		nr = __sbitmap_queue_get(domain_tokens);
+		if (nr >= 0) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&ws->wait.lock, flags);
+			list_del_init(&wait->entry);
+			spin_unlock_irqrestore(&ws->wait.lock, flags);
+		}
 	}
 	return nr;
 }