From patchwork Tue Apr 30 07:34:19 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 10922937
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 7/7] blk-mq: Adjust hybrid poll sleep time
Date: Tue, 30 Apr 2019 10:34:19 +0300
Message-Id: <90ea71d810084eec70fb1632587b450b3037ce85.1556609582.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.21.0
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

From: Pavel Begunkov

Sleep for (mean / 2) in the adaptive polling is often too
pessimistic. Use a variation of the 3-sigma rule (mean - 4 * lmd) and tune
it at runtime using the percentage of missed (i.e. overslept) requests:
1. if more than ~3% of requests are missed, fall back to (mean / 2)
2. if more than ~0.4% are missed, scale down

Pitfalls:
1. any missed request increases the mean, synergistically increasing
   mean and sleep time, so scale down fast in that case
2. even if the sleep time is predicted well, the sleep loop could greatly
   oversleep by itself. Try to detect that and skip the miss accounting.

Tested on an NVMe SSD: {4K,8K} read-only workloads give a similar latency
distribution (up to 7 nines) and halve the CPU load (50% -> 25%). The new
method even outperforms the old one a bit (in terms of throughput and
latencies), presumably because it alleviates the 2nd pitfall. For a
write-only workload it falls back to (mean / 2).

Signed-off-by: Pavel Begunkov
---
 block/blk-mq.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec7cde754c2f..efa44a617bea 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3338,10 +3338,21 @@ static void blk_mq_poll_stats_start(struct request_queue *q)
 	blk_stat_activate_msecs(q->poll_cb, 100);
 }
 
+/*
+ * Thresholds are ilog2(nr_requests / nr_misses)
+ * To calculate tolerated miss ratio from it, use
+ * f(x) ~= 2 ^ -(x + 1)
+ *
+ * fallback ~ 3.1%
+ * throttle ~ 0.4%
+ */
+#define BLK_POLL_FALLBACK_THRESHOLD 4
+#define BLK_POLL_THROTTLE_THRESHOLD 7
+
 static void blk_mq_update_poll_info(struct poll_info *pi,
 				    struct blk_rq_stat *stat)
 {
-	u64 sleep_ns;
+	u64 half_mean, indent, sleep_ns;
 	u32 nr_misses, nr_samples;
 
 	nr_samples = stat->nr_samples;
@@ -3349,14 +3360,33 @@ static void blk_mq_update_poll_info(struct poll_info *pi,
 	if (nr_misses > nr_samples)
 		nr_misses = nr_samples;
 
-	if (!nr_samples)
+	half_mean = (stat->mean + 1) / 2;
+	indent = stat->lmd * 4;
+
+	if (!stat->nr_samples) {
 		sleep_ns = 0;
-	else
-		sleep_ns = (stat->mean + 1) / 2;
+	} else if (!stat->lmd || stat->mean <= indent) {
+		sleep_ns = half_mean;
+	} else {
+		int ratio = INT_MAX;
 
-	/*
-	 * Use miss ratio here to adjust sleep time
-	 */
+		sleep_ns = stat->mean - indent;
+
+		/*
+		 * If a completion is overslept, the observable time will
+		 * be greater than the actual, so increasing mean. It
+		 * also increases sleep time estimation, synergistically
+		 * backfiring on mean. Need to scale down / fallback early.
+		 */
+		if (nr_misses)
+			ratio = ilog2(nr_samples / nr_misses);
+		if (ratio <= BLK_POLL_FALLBACK_THRESHOLD)
+			sleep_ns = half_mean;
+		else if (ratio <= BLK_POLL_THROTTLE_THRESHOLD)
+			sleep_ns -= sleep_ns / 4;
+
+		sleep_ns = max(sleep_ns, half_mean);
+	}
 
 	pi->stat = *stat;
 	pi->sleep_ns = sleep_ns;
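
For anyone who wants to play with the heuristic outside the kernel, the
sleep-time selection from the hunk above can be sketched as a standalone
userspace function. This is only an illustration under assumptions:
poll_sleep_ns(), ilog2_u64(), poll_sleep_example() and the unprefixed
threshold names are hypothetical stand-ins rather than kernel API, and lmd
is treated as an opaque deviation estimate in nanoseconds, passed in by the
caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define FALLBACK_THRESHOLD 4	/* same value as in the patch, ~3.1% misses */
#define THROTTLE_THRESHOLD 7	/* same value as in the patch, ~0.4% misses */

/* floor(log2(v)) for v > 0; userspace stand-in for the kernel's ilog2(). */
static int ilog2_u64(uint64_t v)
{
	int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Hypothetical helper mirroring the sleep-time selection in the patch. */
static uint64_t poll_sleep_ns(uint64_t mean, uint64_t lmd,
			      uint32_t nr_samples, uint32_t nr_misses)
{
	uint64_t half_mean = (mean + 1) / 2;
	uint64_t indent = lmd * 4;
	uint64_t sleep_ns;
	int ratio = INT_MAX;

	if (!nr_samples)
		return 0;
	if (nr_misses > nr_samples)
		nr_misses = nr_samples;
	/* No deviation data, or the indent would swallow the whole mean. */
	if (!lmd || mean <= indent)
		return half_mean;

	sleep_ns = mean - indent;
	if (nr_misses)
		ratio = ilog2_u64(nr_samples / nr_misses);
	if (ratio <= FALLBACK_THRESHOLD)
		sleep_ns = half_mean;		/* too many misses: fall back */
	else if (ratio <= THROTTLE_THRESHOLD)
		sleep_ns -= sleep_ns / 4;	/* some misses: scale down by 25% */

	/* Never sleep less than the old (mean / 2) estimate. */
	return sleep_ns > half_mean ? sleep_ns : half_mean;
}

/* Usage example with made-up numbers: mean = 100us, lmd = 5us. */
static void poll_sleep_example(void)
{
	assert(poll_sleep_ns(100000, 5000, 1000, 0) == 80000);	 /* no misses */
	assert(poll_sleep_ns(100000, 5000, 1000, 10) == 60000);	 /* 1%: throttle */
	assert(poll_sleep_ns(100000, 5000, 1000, 100) == 50000); /* 10%: fallback */
}
```

With these made-up inputs, a 1% miss rate gives ilog2(1000 / 10) = 6, which
is above the fallback threshold but not above the throttle one, so the
80us estimate is cut by a quarter to 60us. A 10% miss rate gives
ilog2(10) = 3 and the result drops to (mean / 2).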