From patchwork Fri Jun 22 22:51:26 2018
Subject: Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
From: Jens Axboe
To: Ming Lei
Cc: "linux-block@vger.kernel.org"
Date: Fri, 22 Jun 2018 16:51:26 -0600
Message-ID: <052ecf61-d38b-5750-0cb0-a40ad60c33ce@kernel.dk>
In-Reply-To: <20180622224301.GD2949@ming.t460p>
References: <0fa9cc9f-7d34-2490-cdaf-fe7a761e4790@kernel.dk> <20180622224301.GD2949@ming.t460p>
X-Mailing-List: linux-block@vger.kernel.org

On 6/22/18 4:43 PM, Ming Lei wrote:
> On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
>> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
>> task placement in its decision making on whether or not the current
>> task can proceed. This can cause issues for the lowest class of
>> writers, since they can get woken up, denied access, and then put
>> back to sleep at the end of the waitqueue.
>>
>> Fix this so that we only utilize the tail add for the initial sleep,
>> and we don't factor in the wait queue placement after we've slept
>> (and are now using the head addition).
>>
>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
>> Signed-off-by: Jens Axboe
>>
>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>> index 4f89b28fa652..7beeabd05f4a 100644
>> --- a/block/blk-wbt.c
>> +++ b/block/blk-wbt.c
>> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
>>  	 * If the waitqueue is already active and we are not the next
>>  	 * in line to be woken up, wait for our turn.
>>  	 */
>> -	if (waitqueue_active(&rqw->wait) &&
>> +	if (wait && waitqueue_active(&rqw->wait) &&
>>  	    rqw->wait.head.next != &wait->entry)
>>  		return false;
>>
>> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
>>  	__acquires(lock)
>>  {
>>  	struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
>> +	struct wait_queue_entry *waitptr = NULL;
>>  	DEFINE_WAIT(wait);
>>
>> -	if (may_queue(rwb, rqw, &wait, rw))
>> +	if (may_queue(rwb, rqw, waitptr, rw))
>>  		return;
>>
>> +	waitptr = &wait;
>>  	do {
>> -		prepare_to_wait_exclusive(&rqw->wait, &wait,
>> +		/*
>> +		 * Don't add ourselves to the wq tail if we've already
>> +		 * slept. Otherwise we can penalize background writes
>> +		 * indefinitely.
>> +		 */
>
> I saw this indefinite wbt_wait() in systemd-journal when running an
> aio-stress read test, but just once; I haven't figured out a way to
> reproduce it yet. I'm wondering if you have a quick test case for
> reproducing and verifying this issue.

I've seen it in production, but I'm currently relying on someone else
to reproduce it synthetically. I'm just providing the patches for
testing.

>> +		if (waitptr)
>> +			prepare_to_wait_exclusive(&rqw->wait, &wait,
>> +						  TASK_UNINTERRUPTIBLE);
>> +		else
>> +			prepare_to_wait(&rqw->wait, &wait,
>>  					TASK_UNINTERRUPTIBLE);
>
> Could you explain a bit why the 'wait_entry' order matters wrt. this
> issue? Other 'wait_entry' instances may still arrive at the head of
> the same wq in the meantime, before the check in may_queue().

Let's say we have 10 tasks queued up. Each one gets added to the tail,
so when it's our turn, we've now reached the head. We fail to get a
queue token, so we go back to sleep. At that point we should add back
to the head, not the tail, for fairness purposes.

> Can we remove 'rqw->wait.head.next != &wait->entry' from may_queue()?
> I guess that would be an optimization, but it seems quite dangerous,
> since 'rqw->wait.head.next' may point to a freed stack variable.
I actually changed it somewhat, since the initial check still wasn't
great. See below. It doesn't matter if the contents are stale, we
aren't going to dereference them anyway.

commit e68d8e22a8b9712c47ead489394f75e9df5a02d1
Author: Jens Axboe
Date:   Fri Jun 22 13:44:22 2018 -0600

    blk-wbt: fix indefinite background writeback sleep

    blk-wbt adds waiters to the tail of the waitqueue, and factors in
    the task placement in its decision making on whether or not the
    current task can proceed. This can cause issues for the lowest
    class of writers, since they can get woken up, denied access, and
    then put back to sleep at the end of the waitqueue.

    Fix this so that we only utilize the tail add for the initial
    sleep, and we don't factor in the wait queue placement after we've
    slept (and are now using the head addition).

    Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
    Signed-off-by: Jens Axboe

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 4f89b28fa652..41607c0bd849 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -550,8 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
 	 * If the waitqueue is already active and we are not the next
 	 * in line to be woken up, wait for our turn.
 	 */
-	if (waitqueue_active(&rqw->wait) &&
-	    rqw->wait.head.next != &wait->entry)
+	if (wait && rqw->wait.head.next != &wait->entry)
 		return false;
 
 	return atomic_inc_below(&rqw->inflight, get_limit(rwb, rw));
@@ -567,16 +566,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
 	__acquires(lock)
 {
 	struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
+	struct wait_queue_entry *waitptr = NULL;
 	DEFINE_WAIT(wait);
 
-	if (may_queue(rwb, rqw, &wait, rw))
+	if (!waitqueue_active(&rqw->wait) && may_queue(rwb, rqw, waitptr, rw))
 		return;
 
+	waitptr = &wait;
 	do {
-		prepare_to_wait_exclusive(&rqw->wait, &wait,
+		/*
+		 * Don't add ourselves to the wq tail if we've already
+		 * slept. Otherwise we can penalize background writes
+		 * indefinitely.
+		 */
+		if (waitptr)
+			prepare_to_wait_exclusive(&rqw->wait, &wait,
+						  TASK_UNINTERRUPTIBLE);
+		else
+			prepare_to_wait(&rqw->wait, &wait,
 					TASK_UNINTERRUPTIBLE);
 
-		if (may_queue(rwb, rqw, &wait, rw))
+		if (may_queue(rwb, rqw, waitptr, rw))
 			break;
 
 		if (lock) {
@@ -585,6 +595,12 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
 			spin_lock_irq(lock);
 		} else
 			io_schedule();
+
+		/*
+		 * After we've slept, we don't want to factor in wq head
+		 * placement anymore for may_queue().
+		 */
+		waitptr = NULL;
 	} while (1);
 
 	finish_wait(&rqw->wait, &wait);