From patchwork Tue Apr 30 07:34:13 2019
X-Patchwork-Id: 10922947
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 1/7] blk-iolatency: Fix zero mean in previous stats
Date: Tue, 30 Apr 2019 10:34:13 +0300
Message-Id: <897b1aaf6749a707b2190e138d2bf7ba22920082.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

blk_rq_stat_sum() expects its src argument (struct blk_rq_stat) to have
a valid batch field, and it does not maintain batch for dst. Thus, a
former dst must not later be passed as a src argument.
iolatency_check_latencies() violates that, so iolat->cur_stat.rqs.mean
is always 0 for non-SSD devices.

Use two distinct functions instead: one to collect intermediate stats
(i.e. with a valid batch), and a second one to merge already-accumulated
stats (i.e. with a valid mean).

Signed-off-by: Pavel Begunkov
---
 block/blk-iolatency.c | 21 ++++++++++++++++-----
 block/blk-stat.c      | 20 ++++++++++++++++++--
 block/blk-stat.h      |  3 ++-
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 507212d75ee2..4010152ebeb2 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -198,7 +198,7 @@ static inline void latency_stat_init(struct iolatency_grp *iolat,
 	blk_rq_stat_init(&stat->rqs);
 }
 
-static inline void latency_stat_sum(struct iolatency_grp *iolat,
+static inline void latency_stat_merge(struct iolatency_grp *iolat,
 				    struct latency_stat *sum,
 				    struct latency_stat *stat)
 {
@@ -206,7 +206,18 @@ static inline void latency_stat_sum(struct iolatency_grp *iolat,
 		sum->ps.total += stat->ps.total;
 		sum->ps.missed += stat->ps.missed;
 	} else
-		blk_rq_stat_sum(&sum->rqs, &stat->rqs);
+		blk_rq_stat_merge(&sum->rqs, &stat->rqs);
+}
+
+static inline void latency_stat_collect(struct iolatency_grp *iolat,
+					struct latency_stat *sum,
+					struct latency_stat *stat)
+{
+	if (iolat->ssd) {
+		sum->ps.total += stat->ps.total;
+		sum->ps.missed += stat->ps.missed;
+	} else
+		blk_rq_stat_collect(&sum->rqs, &stat->rqs);
 }
 
 static inline void latency_stat_record_time(struct iolatency_grp *iolat,
@@ -530,7 +541,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 	for_each_online_cpu(cpu) {
 		struct latency_stat *s;
 		s = per_cpu_ptr(iolat->stats, cpu);
-		latency_stat_sum(iolat, &stat, s);
+		latency_stat_collect(iolat, &stat, s);
 		latency_stat_init(iolat, s);
 	}
 	preempt_enable();
@@ -551,7 +562,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 	/* Somebody beat us to the punch, just bail. */
 	spin_lock_irqsave(&lat_info->lock, flags);
 
-	latency_stat_sum(iolat, &iolat->cur_stat, &stat);
+	latency_stat_merge(iolat, &iolat->cur_stat, &stat);
 	lat_info->nr_samples -= iolat->nr_samples;
 	lat_info->nr_samples += latency_stat_samples(iolat, &iolat->cur_stat);
 	iolat->nr_samples = latency_stat_samples(iolat, &iolat->cur_stat);
@@ -912,7 +923,7 @@ static size_t iolatency_ssd_stat(struct iolatency_grp *iolat, char *buf,
 	for_each_online_cpu(cpu) {
 		struct latency_stat *s;
 		s = per_cpu_ptr(iolat->stats, cpu);
-		latency_stat_sum(iolat, &stat, s);
+		latency_stat_collect(iolat, &stat, s);
 	}
 	preempt_enable();
 
diff --git a/block/blk-stat.c b/block/blk-stat.c
index 696a04176e4d..a6da68af45db 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -25,7 +25,7 @@ void blk_rq_stat_init(struct blk_rq_stat *stat)
 }
 
 /* src is a per-cpu stat, mean isn't initialized */
-void blk_rq_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
+void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src)
 {
 	if (!src->nr_samples)
 		return;
@@ -39,6 +39,21 @@ void blk_rq_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
 	dst->nr_samples += src->nr_samples;
 }
 
+void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src)
+{
+	if (!src->nr_samples)
+		return;
+
+	dst->min = min(dst->min, src->min);
+	dst->max = max(dst->max, src->max);
+
+	dst->mean = div_u64(src->mean * src->nr_samples +
+				dst->mean * dst->nr_samples,
+				dst->nr_samples + src->nr_samples);
+
+	dst->nr_samples += src->nr_samples;
+}
+
 void blk_rq_stat_add(struct blk_rq_stat *stat, u64 value)
 {
 	stat->min = min(stat->min, value);
@@ -89,7 +104,8 @@ static void blk_stat_timer_fn(struct timer_list *t)
 		cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
 		for (bucket = 0; bucket < cb->buckets; bucket++) {
-			blk_rq_stat_sum(&cb->stat[bucket], &cpu_stat[bucket]);
+			blk_rq_stat_collect(&cb->stat[bucket],
+					    &cpu_stat[bucket]);
 			blk_rq_stat_init(&cpu_stat[bucket]);
 		}
 	}
 
diff --git a/block/blk-stat.h b/block/blk-stat.h
index 17b47a86eefb..5597ecc34ef5 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -165,7 +165,8 @@ static inline void blk_stat_activate_msecs(struct blk_stat_callback *cb,
 }
 
 void blk_rq_stat_add(struct blk_rq_stat *, u64);
-void blk_rq_stat_sum(struct blk_rq_stat *, struct blk_rq_stat *);
+void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src);
+void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src);
 void blk_rq_stat_init(struct blk_rq_stat *);
 
 #endif
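A note on why the two flavours cannot share one function: the
weighted-mean arithmetic reads a different source field depending on
the state of src. A condensed user-space sketch of the two operations
(illustrative only; field names follow struct blk_rq_stat, and the
kernel versions also track min/max and use div_u64()):

	#include <stdint.h>

	struct stat_sketch {
		uint64_t mean;	/* valid only after collect/merge */
		uint64_t batch;	/* sum of samples; valid only while staging */
		uint32_t nr_samples;
	};

	/* src is a staging (per-cpu) stat: trust batch, ignore mean */
	static void collect(struct stat_sketch *dst, struct stat_sketch *src)
	{
		if (!src->nr_samples)
			return;
		dst->mean = (src->batch + dst->mean * dst->nr_samples) /
			    (dst->nr_samples + src->nr_samples);
		dst->nr_samples += src->nr_samples;
	}

	/* src is an accumulated stat: trust mean, batch is stale */
	static void merge(struct stat_sketch *dst, struct stat_sketch *src)
	{
		if (!src->nr_samples)
			return;
		dst->mean = (src->mean * src->nr_samples +
			     dst->mean * dst->nr_samples) /
			    (dst->nr_samples + src->nr_samples);
		dst->nr_samples += src->nr_samples;
	}

Feeding an accumulated stat (whose batch was never updated) into the
collect flavour divides a stale batch, which is exactly how
iolat->cur_stat.rqs.mean ended up pinned at 0.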
From patchwork Tue Apr 30 07:34:14 2019
X-Patchwork-Id: 10922945
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 2/7] blk-stats: Introduce explicit stat staging buffers
Date: Tue, 30 Apr 2019 10:34:14 +0300
Message-Id: <702524f38b2705c98e16ada0db9cc6f7eff40fc2.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

struct blk_rq_stat can be in one of two implicit states, each using a
different set of fields:
1. per-cpu intermediate (i.e. staging): batch is kept up to date,
   mean is invalid
2. calculated stats (see blk_rq_stat_collect): no batch, valid mean

The blk_rq_stat_*() helpers expect their arguments to be in the right
state, but it is not documented which one. That's error prone.

Split struct blk_rq_stat into two structs, one per state. That requires
some code duplication, but
1. prevents misuse (a compile-time type check)
2. reduces the memory needed
3. makes it easier to extend the stats

Signed-off-by: Pavel Begunkov
---
 block/blk-iolatency.c     | 41 +++++++++++++++++++++++++++++----------
 block/blk-stat.c          | 30 +++++++++++++++++-----------
 block/blk-stat.h          |  8 +++++---
 include/linux/blk_types.h |  6 ++++++
 4 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 4010152ebeb2..df9d37398a0f 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -129,9 +129,16 @@ struct latency_stat {
 	};
 };
 
+struct latency_stat_staging {
+	union {
+		struct percentile_stats ps;
+		struct blk_rq_stat_staging rqs;
+	};
+};
+
 struct iolatency_grp {
 	struct blkg_policy_data pd;
-	struct latency_stat __percpu *stats;
+	struct latency_stat_staging __percpu *stats;
 	struct latency_stat cur_stat;
 	struct blk_iolatency *blkiolat;
 	struct rq_depth rq_depth;
@@ -198,6 +205,16 @@ static inline void latency_stat_init(struct iolatency_grp *iolat,
 	blk_rq_stat_init(&stat->rqs);
 }
 
+static inline void latency_stat_init_staging(struct iolatency_grp *iolat,
+					     struct latency_stat_staging *stat)
+{
+	if (iolat->ssd) {
+		stat->ps.total = 0;
+		stat->ps.missed = 0;
+	} else
+		blk_rq_stat_init_staging(&stat->rqs);
+}
+
 static inline void latency_stat_merge(struct iolatency_grp *iolat,
 				      struct latency_stat *sum,
 				      struct latency_stat *stat)
@@ -211,7 +228,7 @@ static inline void latency_stat_merge(struct iolatency_grp *iolat,
 
 static inline void latency_stat_collect(struct iolatency_grp *iolat,
 					struct latency_stat *sum,
-					struct latency_stat *stat)
+					struct latency_stat_staging *stat)
 {
 	if (iolat->ssd) {
 		sum->ps.total += stat->ps.total;
@@ -223,7 +240,8 @@ static inline void latency_stat_collect(struct iolatency_grp *iolat,
 static inline void latency_stat_record_time(struct iolatency_grp *iolat,
 					    u64 req_time)
 {
-	struct latency_stat *stat = get_cpu_ptr(iolat->stats);
+	struct latency_stat_staging *stat = get_cpu_ptr(iolat->stats);
+
 	if (iolat->ssd) {
 		if (req_time >= iolat->min_lat_nsec)
 			stat->ps.missed++;
@@ -539,10 +557,11 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 	latency_stat_init(iolat, &stat);
 	preempt_disable();
 	for_each_online_cpu(cpu) {
-		struct latency_stat *s;
+		struct latency_stat_staging *s;
+
 		s = per_cpu_ptr(iolat->stats, cpu);
 		latency_stat_collect(iolat, &stat, s);
-		latency_stat_init(iolat, s);
+		latency_stat_init_staging(iolat, s);
 	}
 	preempt_enable();
 
@@ -921,7 +940,8 @@ static size_t iolatency_ssd_stat(struct iolatency_grp *iolat, char *buf,
 	latency_stat_init(iolat, &stat);
 	preempt_disable();
 	for_each_online_cpu(cpu) {
-		struct latency_stat *s;
+		struct latency_stat_staging *s;
+
 		s = per_cpu_ptr(iolat->stats, cpu);
 		latency_stat_collect(iolat, &stat, s);
 	}
@@ -965,8 +985,8 @@ static struct blkg_policy_data *iolatency_pd_alloc(gfp_t gfp, int node)
 	iolat = kzalloc_node(sizeof(*iolat), gfp, node);
 	if (!iolat)
 		return NULL;
-	iolat->stats = __alloc_percpu_gfp(sizeof(struct latency_stat),
-				       __alignof__(struct latency_stat), gfp);
+	iolat->stats = __alloc_percpu_gfp(sizeof(struct latency_stat_staging),
+				__alignof__(struct latency_stat_staging), gfp);
 	if (!iolat->stats) {
 		kfree(iolat);
 		return NULL;
@@ -989,9 +1009,10 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
 	iolat->ssd = false;
 
 	for_each_possible_cpu(cpu) {
-		struct latency_stat *stat;
+		struct latency_stat_staging *stat;
+
 		stat = per_cpu_ptr(iolat->stats, cpu);
-		latency_stat_init(iolat, stat);
+		latency_stat_init_staging(iolat, stat);
 	}
 
 	latency_stat_init(iolat, &iolat->cur_stat);
diff --git a/block/blk-stat.c b/block/blk-stat.c
index a6da68af45db..13f93249fd5f 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -17,15 +17,22 @@ struct blk_queue_stats {
 	bool enable_accounting;
 };
 
+void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat)
+{
+	stat->min = -1ULL;
+	stat->max = 0;
+	stat->batch = 0;
+	stat->nr_samples = 0;
+}
+
 void blk_rq_stat_init(struct blk_rq_stat *stat)
 {
 	stat->min = -1ULL;
 	stat->max = stat->nr_samples = stat->mean = 0;
-	stat->batch = 0;
 }
 
-/* src is a per-cpu stat, mean isn't initialized */
-void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src)
+void blk_rq_stat_collect(struct blk_rq_stat *dst,
+			 struct blk_rq_stat_staging *src)
 {
 	if (!src->nr_samples)
 		return;
@@ -54,7 +61,7 @@ void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src)
 	dst->nr_samples += src->nr_samples;
 }
 
-void blk_rq_stat_add(struct blk_rq_stat *stat, u64 value)
+void blk_rq_stat_add(struct blk_rq_stat_staging *stat, u64 value)
 {
 	stat->min = min(stat->min, value);
 	stat->max = max(stat->max, value);
@@ -66,7 +73,7 @@ void blk_stat_add(struct request *rq, u64 now)
 {
 	struct request_queue *q = rq->q;
 	struct blk_stat_callback *cb;
-	struct blk_rq_stat *stat;
+	struct blk_rq_stat_staging *stat;
 	int bucket;
 	u64 value;
 
@@ -100,13 +107,13 @@ static void blk_stat_timer_fn(struct timer_list *t)
 		blk_rq_stat_init(&cb->stat[bucket]);
 
 	for_each_online_cpu(cpu) {
-		struct blk_rq_stat *cpu_stat;
+		struct blk_rq_stat_staging *cpu_stat;
 
 		cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
 		for (bucket = 0; bucket < cb->buckets; bucket++) {
 			blk_rq_stat_collect(&cb->stat[bucket],
 					    &cpu_stat[bucket]);
-			blk_rq_stat_init(&cpu_stat[bucket]);
+			blk_rq_stat_init_staging(&cpu_stat[bucket]);
 		}
 	}
 
@@ -130,8 +137,9 @@ blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
 		kfree(cb);
 		return NULL;
 	}
-	cb->cpu_stat = __alloc_percpu(buckets * sizeof(struct blk_rq_stat),
-				      __alignof__(struct blk_rq_stat));
+	cb->cpu_stat = __alloc_percpu(
+				buckets * sizeof(struct blk_rq_stat_staging),
+				__alignof__(struct blk_rq_stat_staging));
 	if (!cb->cpu_stat) {
 		kfree(cb->stat);
 		kfree(cb);
@@ -154,11 +162,11 @@ void blk_stat_add_callback(struct request_queue *q,
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct blk_rq_stat *cpu_stat;
+		struct blk_rq_stat_staging *cpu_stat;
 
 		cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
 		for (bucket = 0; bucket < cb->buckets; bucket++)
-			blk_rq_stat_init(&cpu_stat[bucket]);
+			blk_rq_stat_init_staging(&cpu_stat[bucket]);
 	}
 
 	spin_lock(&q->stats->lock);
diff --git a/block/blk-stat.h b/block/blk-stat.h
index 5597ecc34ef5..e5c753fbd6e6 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -30,7 +30,7 @@ struct blk_stat_callback {
 	/**
 	 * @cpu_stat: Per-cpu statistics buckets.
 	 */
-	struct blk_rq_stat __percpu *cpu_stat;
+	struct blk_rq_stat_staging __percpu *cpu_stat;
 
 	/**
 	 * @bucket_fn: Given a request, returns which statistics bucket it
@@ -164,9 +164,11 @@ static inline void blk_stat_activate_msecs(struct blk_stat_callback *cb,
 	mod_timer(&cb->timer, jiffies + msecs_to_jiffies(msecs));
 }
 
-void blk_rq_stat_add(struct blk_rq_stat *, u64);
-void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src);
+void blk_rq_stat_add(struct blk_rq_stat_staging *stat, u64);
+void blk_rq_stat_collect(struct blk_rq_stat *dst,
+			 struct blk_rq_stat_staging *src);
 void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src);
 void blk_rq_stat_init(struct blk_rq_stat *);
+void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat);
 
 #endif
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 791fee35df88..5718a4e2e731 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -446,7 +446,13 @@ struct blk_rq_stat {
 	u64 min;
 	u64 max;
 	u32 nr_samples;
+};
+
+struct blk_rq_stat_staging {
+	u64 min;
+	u64 max;
 	u64 batch;
+	u32 nr_samples;
 };
 
 #endif /* __LINUX_BLK_TYPES_H */
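With the state now encoded in the type, the misuse fixed by the
previous patch can no longer compile. A hypothetical caller, assuming
the declarations from block/blk-stat.h above:

	/* accumulated stats: have a valid mean, no batch field */
	static struct blk_rq_stat total, other;
	/* staging buffer: has batch, mean is not even stored */
	static struct blk_rq_stat_staging cpu_buf;

	static void roll_up(void)
	{
		blk_rq_stat_collect(&total, &cpu_buf);	/* staging -> accumulated: ok */
		blk_rq_stat_merge(&total, &other);	/* accumulated + accumulated: ok */
		/*
		 * blk_rq_stat_collect(&total, &other); would now be
		 * rejected by the compiler: struct blk_rq_stat * is not
		 * struct blk_rq_stat_staging *.
		 */
	}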
From patchwork Tue Apr 30 07:34:15 2019
X-Patchwork-Id: 10922943
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 3/7] blk-mq: Fix disabled hybrid polling
Date: Tue, 30 Apr 2019 10:34:15 +0300
Message-Id: <87e3f35a44cf987cc71a8dcc38238bc61164fb11.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

Commit 4bc6339a583cec650b05 ("block: move blk_stat_add() to
__blk_mq_end_request()") moved blk_stat_add(), so now it's called after
blk_update_request(), which zeroes rq->__data_len. Without the length,
blk_stat_add() can't calculate the stat bucket and returns an error,
effectively disabling hybrid polling.

Move it back to __blk_mq_complete_request().

Signed-off-by: Pavel Begunkov
---
 block/blk-mq.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fc60ed7e940e..cc3f73e4e01c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -535,11 +535,6 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 	if (blk_mq_need_time_stamp(rq))
 		now = ktime_get_ns();
 
-	if (rq->rq_flags & RQF_STATS) {
-		blk_mq_poll_stats_start(rq->q);
-		blk_stat_add(rq, now);
-	}
-
 	if (rq->internal_tag != -1)
 		blk_mq_sched_completed_request(rq, now);
 
@@ -578,6 +573,11 @@ static void __blk_mq_complete_request(struct request *rq)
 	int cpu;
 
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
+
+	if (rq->rq_flags & RQF_STATS) {
+		blk_mq_poll_stats_start(rq->q);
+		blk_stat_add(rq, ktime_get_ns());
+	}
 	/*
 	 * Most of single queue controllers, there is only one irq vector
 	 * for handling IO completion, and the only irq's affinity is set
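The failure mode is easiest to see in the bucket selection. A
simplified sketch of blk_mq_poll_stats_bkt()'s size dependency (an
approximation of the logic, not a verbatim copy of the kernel
function):

	/*
	 * Sketch: the poll-stats bucket is derived from request size
	 * and direction. blk_rq_bytes() reads rq->__data_len, so once
	 * blk_update_request() has zeroed it, no valid bucket can be
	 * computed, blk_stat_add() bails out, and the poll_stat
	 * buckets never accumulate samples -- hybrid polling always
	 * sees a sleep time of 0.
	 */
	static int poll_stats_bucket(unsigned int bytes, int ddir)
	{
		int bucket;

		if (!bytes)
			return -1;

		bucket = ddir + 2 * (ilog2(bytes) - 9);	/* 512B -> bucket 0/1 */
		if (bucket < 0 || bucket >= BLK_MQ_POLL_STATS_BKTS)
			return -1;

		return bucket;
	}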
From patchwork Tue Apr 30 07:34:16 2019
X-Patchwork-Id: 10922935
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 4/7] blk-stats: Add left mean deviation to blk_stats
Date: Tue, 30 Apr 2019 10:34:16 +0300
Message-Id: <243815abd0a89d660c56739172365556a8f94546.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

The basic idea is to use the 3-sigma rule to guess the adaptive polling
sleep time. An exact standard deviation calculation could easily
overflow u64, so the mean absolute deviation (MAD) is used as an
approximation. As only the left bound is needed, MAD is further
replaced by the left mean deviation (LMD) to increase accuracy.

Signed-off-by: Pavel Begunkov
---
 block/blk-mq-debugfs.c    | 10 ++++++----
 block/blk-stat.c          | 21 +++++++++++++++++++--
 block/blk-stat.h          |  6 ++++++
 include/linux/blk_types.h |  3 +++
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index ec1d18cb643c..b62bd4468db3 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -27,12 +27,14 @@
 
 static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
 {
-	if (stat->nr_samples) {
-		seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
-			   stat->nr_samples, stat->mean, stat->min, stat->max);
-	} else {
+	if (!stat->nr_samples) {
 		seq_puts(m, "samples=0");
+		return;
 	}
+
+	seq_printf(m, "samples=%d, mean=%llu, min=%llu, max=%llu, lmd=%llu",
+		   stat->nr_samples, stat->mean, stat->min, stat->max,
+		   stat->lmd);
 }
 
 static int queue_poll_stat_show(void *data, struct seq_file *m)
diff --git a/block/blk-stat.c b/block/blk-stat.c
index 13f93249fd5f..e1915a4e41b9 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -17,14 +17,21 @@ struct blk_queue_stats {
 	bool enable_accounting;
 };
 
-void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat)
+void blk_rq_stat_reset(struct blk_rq_stat_staging *stat)
 {
 	stat->min = -1ULL;
 	stat->max = 0;
 	stat->batch = 0;
+	stat->lmd_batch = 0;
 	stat->nr_samples = 0;
 }
 
+void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat)
+{
+	blk_rq_stat_reset(stat);
+	stat->mean_last = 0;
+}
+
 void blk_rq_stat_init(struct blk_rq_stat *stat)
 {
 	stat->min = -1ULL;
@@ -42,8 +49,12 @@ void blk_rq_stat_collect(struct blk_rq_stat *dst,
 
 	dst->mean = div_u64(src->batch + dst->mean * dst->nr_samples,
 				dst->nr_samples + src->nr_samples);
+	dst->lmd = div_u64(src->lmd_batch + dst->lmd * dst->nr_samples,
+				dst->nr_samples + src->nr_samples);
 
 	dst->nr_samples += src->nr_samples;
+	/* pass mean back for lmd computation */
+	src->mean_last = dst->mean;
 }
 
 void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src)
@@ -57,6 +68,9 @@ void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src)
 	dst->mean = div_u64(src->mean * src->nr_samples +
 				dst->mean * dst->nr_samples,
 				dst->nr_samples + src->nr_samples);
+	dst->lmd = div_u64(src->lmd * src->nr_samples +
+				dst->lmd * dst->nr_samples,
+				dst->nr_samples + src->nr_samples);
 
 	dst->nr_samples += src->nr_samples;
 }
@@ -67,6 +81,9 @@ void blk_rq_stat_add(struct blk_rq_stat_staging *stat, u64 value)
 	stat->max = max(stat->max, value);
 	stat->batch += value;
 	stat->nr_samples++;
+
+	if (value < stat->mean_last)
+		stat->lmd_batch += stat->mean_last - value;
 }
 
 void blk_stat_add(struct request *rq, u64 now)
@@ -113,7 +130,7 @@ static void blk_stat_timer_fn(struct timer_list *t)
 		for (bucket = 0; bucket < cb->buckets; bucket++) {
 			blk_rq_stat_collect(&cb->stat[bucket],
 					    &cpu_stat[bucket]);
-			blk_rq_stat_init_staging(&cpu_stat[bucket]);
+			blk_rq_stat_reset(&cpu_stat[bucket]);
 		}
 	}
diff --git a/block/blk-stat.h b/block/blk-stat.h
index e5c753fbd6e6..ad81b2ce58bf 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -170,5 +170,11 @@ void blk_rq_stat_collect(struct blk_rq_stat *dst,
 void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src);
 void blk_rq_stat_init(struct blk_rq_stat *);
 void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat);
+/*
+ * Prepare stat to the next statistics round. Similar to
+ * blk_rq_stat_init_staging, but retains some information
+ * about the previous round (see last_mean).
+ */
+void blk_rq_stat_reset(struct blk_rq_stat_staging *stat);
 
 #endif
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5718a4e2e731..fe0ad7b2e6ca 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -445,13 +445,16 @@ struct blk_rq_stat {
 	u64 mean;
 	u64 min;
 	u64 max;
+	u64 lmd; /* left mean deviation */
 	u32 nr_samples;
 };
 
 struct blk_rq_stat_staging {
+	u64 mean_last;
 	u64 min;
 	u64 max;
 	u64 batch;
+	u64 lmd_batch;
 	u32 nr_samples;
 };
 
 #endif /* __LINUX_BLK_TYPES_H */
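For intuition about LMD versus MAD, take invented samples {8, 10, 12,
30} with mean 15. MAD averages all absolute deviations:
(7 + 5 + 3 + 15) / 4 = 7.5, so the single right-tail outlier at 30
dominates. The LMD accumulated above sums only the deficits of samples
below the previous round's mean, (15-8) + (15-10) + (15-12) = 15, and
divides by the total sample count: 15 / 4 = 3.75. The left bound used
later (mean - 4 * lmd) therefore stays tight even when the right tail
is heavy, and both the batch sum and the final division fit comfortably
in u64, unlike a sum of squared deviations would.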
From patchwork Tue Apr 30 07:34:17 2019
X-Patchwork-Id: 10922941
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 5/7] blk-mq: Precalculate hybrid polling time
Date: Tue, 30 Apr 2019 10:34:17 +0300
Message-Id: <25f7593a0350e07997fc31d1317218ffeec4f6bf.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

Calculating the sleep time for adaptive hybrid polling on a per-request
basis could become time consuming in the future. Precalculate it once
per statistics gathering round.

Signed-off-by: Pavel Begunkov
---
 block/blk-core.c       |  5 ++++-
 block/blk-mq-debugfs.c |  4 ++--
 block/blk-mq.c         | 39 ++++++++++++++++++++++-----------------
 include/linux/blkdev.h |  8 +++++++-
 4 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a55389ba8779..daadce545e43 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -474,7 +474,7 @@ static void blk_timeout_work(struct work_struct *work)
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 {
 	struct request_queue *q;
-	int ret;
+	int ret, bucket;
 
 	q = kmem_cache_alloc_node(blk_requestq_cachep,
 				gfp_mask | __GFP_ZERO, node_id);
@@ -536,6 +536,9 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS; bucket++)
+		q->poll_info[bucket].sleep_ns = 0;
+
 	return q;
 
 fail_ref:
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b62bd4468db3..ab55446cb570 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -44,11 +44,11 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
 
 	for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS/2; bucket++) {
 		seq_printf(m, "read (%d Bytes): ", 1 << (9+bucket));
-		print_stat(m, &q->poll_stat[2*bucket]);
+		print_stat(m, &q->poll_info[2*bucket].stat);
 		seq_puts(m, "\n");
 
 		seq_printf(m, "write (%d Bytes): ", 1 << (9+bucket));
-		print_stat(m, &q->poll_stat[2*bucket+1]);
+		print_stat(m, &q->poll_info[2*bucket+1].stat);
 		seq_puts(m, "\n");
 	}
 	return 0;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cc3f73e4e01c..4e54a004e345 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3312,14 +3312,32 @@ static void blk_mq_poll_stats_start(struct request_queue *q)
 	blk_stat_activate_msecs(q->poll_cb, 100);
 }
 
+static void blk_mq_update_poll_info(struct poll_info *pi,
+				    struct blk_rq_stat *stat)
+{
+	u64 sleep_ns;
+
+	if (!stat->nr_samples)
+		sleep_ns = 0;
+	else
+		sleep_ns = (stat->mean + 1) / 2;
+
+	pi->stat = *stat;
+	pi->sleep_ns = sleep_ns;
+}
+
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
 {
 	struct request_queue *q = cb->data;
 	int bucket;
 
 	for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS; bucket++) {
-		if (cb->stat[bucket].nr_samples)
-			q->poll_stat[bucket] = cb->stat[bucket];
+		if (cb->stat[bucket].nr_samples) {
+			struct poll_info *pi = &q->poll_info[bucket];
+			struct blk_rq_stat *stat = &cb->stat[bucket];
+
+			blk_mq_update_poll_info(pi, stat);
+		}
 	}
 }
 
@@ -3327,7 +3345,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 				       struct blk_mq_hw_ctx *hctx,
 				       struct request *rq)
 {
-	unsigned long ret = 0;
 	int bucket;
 
 	/*
@@ -3337,23 +3354,11 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 	if (!blk_poll_stats_enable(q))
 		return 0;
 
-	/*
-	 * As an optimistic guess, use half of the mean service time
-	 * for this type of request. We can (and should) make this smarter.
-	 * For instance, if the completion latencies are tight, we can
-	 * get closer than just half the mean. This is especially
-	 * important on devices where the completion latencies are longer
-	 * than ~10 usec. We do use the stats for the relevant IO size
-	 * if available which does lead to better estimates.
-	 */
 	bucket = blk_mq_poll_stats_bkt(rq);
 	if (bucket < 0)
-		return ret;
-
-	if (q->poll_stat[bucket].nr_samples)
-		ret = (q->poll_stat[bucket].mean + 1) / 2;
+		return 0;
 
-	return ret;
+	return q->poll_info[bucket].sleep_ns;
 }
 
 static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 317ab30d2904..40c77935fd61 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -385,6 +385,12 @@ static inline int blkdev_reset_zones_ioctl(struct block_device *bdev,
 
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+struct poll_info
+{
+	struct blk_rq_stat stat;
+	u64 sleep_ns;
+};
+
 struct request_queue {
 	/*
	 * Together with queue_head for cacheline sharing
@@ -477,7 +483,7 @@ struct request_queue {
 	int			poll_nsec;
 
 	struct blk_stat_callback	*poll_cb;
-	struct blk_rq_stat	poll_stat[BLK_MQ_POLL_STATS_BKTS];
+	struct poll_info	poll_info[BLK_MQ_POLL_STATS_BKTS];
 
 	struct timer_list	timeout;
 	struct work_struct	timeout_work;
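After this patch the per-request fast path reduces to a single load of
poll_info[bucket].sleep_ns instead of re-deriving (mean + 1) / 2 from
poll_stat on every request. The debugfs view keeps its shape; with the
lmd field from the previous patch, a queue's poll statistics would
read, for example (hypothetical device name and numbers, format as
defined by print_stat() above):

	# cat /sys/kernel/debug/block/nvme0n1/poll_stat
	read (512 Bytes): samples=512, mean=8123, min=4022, max=20111, lmd=1500
	write (512 Bytes): samples=0
	read (1024 Bytes): samples=256, mean=9840, min=5310, max=23470, lmd=1710
	write (1024 Bytes): samples=0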
From patchwork Tue Apr 30 07:34:18 2019
X-Patchwork-Id: 10922939
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 6/7] blk-mq: Track num of overslept by hybrid poll rqs
Date: Tue, 30 Apr 2019 10:34:18 +0300
Message-Id: <0096e72ed1e1c94021a33cddeffee36abe78338b.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

To fine-tune the adaptive polling sleep time, we need to know how
accurate the current estimate is, which can be derived from the ratio
of missed (i.e., overslept) requests. The miss count is collected under
the assumption that a request woken up on time still needs to busy-poll
for a while before it completes; if it turns out to be already
completed by the first poll call after wakeup, that is counted as a
miss.

Signed-off-by: Pavel Begunkov
---
 block/blk-core.c       |  4 +-
 block/blk-mq.c         | 94 ++++++++++++++++++++++++++++++------------
 block/blk-stat.c       |  2 +-
 include/linux/blkdev.h |  9 ++++
 4 files changed, 81 insertions(+), 28 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index daadce545e43..88d8ec4268ca 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -536,8 +536,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
-	for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS; bucket++)
+	for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS; bucket++) {
 		q->poll_info[bucket].sleep_ns = 0;
+		atomic_set(&q->poll_info[bucket].nr_misses, 0);
+	}
 
 	return q;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e54a004e345..ec7cde754c2f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -528,6 +528,34 @@ void blk_mq_free_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_free_request);
 
+static inline void blk_mq_record_stats(struct request *rq, u64 now)
+{
+	int bucket = blk_mq_poll_stats_bkt(rq);
+
+	if (bucket >= 0 && !(rq->rq_flags & RQF_MQ_POLLED)) {
+		struct poll_info *pi;
+		u64 threshold;
+
+		pi = &rq->q->poll_info[bucket];
+		/*
+		 * Even if the time for hybrid polling predicted well, the
+		 * completion could oversleep because of a timer's lag. Try
+		 * to detect and skip accounting for such outliers.
+		 */
+		threshold = pi->stat.mean;
+
+		/*
+		 * Ideally, miss count should be close to 0,
+		 * so should not happen often.
+		 */
+		if (blk_rq_io_time(rq, now) < threshold)
+			atomic_inc(&pi->nr_misses);
+	}
+
+	blk_mq_poll_stats_start(rq->q);
+	blk_stat_add(rq, now);
+}
+
 inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 {
 	u64 now = 0;
@@ -574,10 +602,8 @@ static void __blk_mq_complete_request(struct request *rq)
 
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 
-	if (rq->rq_flags & RQF_STATS) {
-		blk_mq_poll_stats_start(rq->q);
-		blk_stat_add(rq, ktime_get_ns());
-	}
+	if (rq->rq_flags & RQF_STATS)
+		blk_mq_record_stats(rq, ktime_get_ns());
 	/*
 	 * Most of single queue controllers, there is only one irq vector
 	 * for handling IO completion, and the only irq's affinity is set
@@ -3316,14 +3342,25 @@ static void blk_mq_update_poll_info(struct poll_info *pi,
 				    struct blk_rq_stat *stat)
 {
 	u64 sleep_ns;
+	u32 nr_misses, nr_samples;
+
+	nr_samples = stat->nr_samples;
+	nr_misses = atomic_read(&pi->nr_misses);
+	if (nr_misses > nr_samples)
+		nr_misses = nr_samples;
 
-	if (!stat->nr_samples)
+	if (!nr_samples)
 		sleep_ns = 0;
 	else
 		sleep_ns = (stat->mean + 1) / 2;
 
+	/*
+	 * Use miss ratio here to adjust sleep time
+	 */
+
 	pi->stat = *stat;
 	pi->sleep_ns = sleep_ns;
+	atomic_set(&pi->nr_misses, 0);
 }
 
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
@@ -3389,10 +3426,6 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 
 	rq->rq_flags |= RQF_MQ_POLL_SLEPT;
 
-	/*
-	 * This will be replaced with the stats tracking code, using
-	 * 'avg_completion_time / 2' as the pre-sleep target.
-	 */
 	kt = nsecs;
 
 	mode = HRTIMER_MODE_REL;
@@ -3417,30 +3450,34 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 }
 
 static bool blk_mq_poll_hybrid(struct request_queue *q,
-			       struct blk_mq_hw_ctx *hctx, blk_qc_t cookie)
+			       struct blk_mq_hw_ctx *hctx,
+			       struct request *rq)
 {
-	struct request *rq;
-
 	if (q->poll_nsec == BLK_MQ_POLL_CLASSIC)
 		return false;
 
-	if (!blk_qc_t_is_internal(cookie))
-		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
-	else {
-		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
-		/*
-		 * With scheduling, if the request has completed, we'll
-		 * get a NULL return here, as we clear the sched tag when
-		 * that happens. The request still remains valid, like always,
-		 * so we should be safe with just the NULL check.
-		 */
-		if (!rq)
-			return false;
-	}
+	/*
+	 * With scheduling, if the request has completed, we'll
+	 * get a NULL request here, as we clear the sched tag when
+	 * that happens. The request still remains valid, like always,
+	 * so we should be safe with just the NULL check.
+	 */
+	if (!rq)
+		return false;
 
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
+static inline struct request *qc_t_to_request(struct blk_mq_hw_ctx *hctx,
+					      blk_qc_t cookie)
+{
+	struct blk_mq_tags *tags;
+
+	tags = blk_qc_t_is_internal(cookie) ?
+			hctx->sched_tags : hctx->tags;
+
+	return blk_mq_tag_to_rq(tags, blk_qc_t_to_tag(cookie));
+}
+
 /**
  * blk_poll - poll for IO completions
  * @q:  the queue
@@ -3456,6 +3493,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
+	struct request *rq;
 	long state;
 
 	if (!blk_qc_t_valid(cookie) ||
@@ -3466,6 +3504,7 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 		blk_flush_plug_list(current->plug, false);
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
+	rq = qc_t_to_request(hctx, cookie);
 
 	/*
	 * If we sleep, have the caller restart the poll loop to reset
	 * the state. Like for the other success return cases, the
	 * caller is responsible for checking if the IO completed. If
	 * the IO isn't complete, we'll get called again and will go
	 * straight to the busy poll loop.
	 */
-	if (blk_mq_poll_hybrid(q, hctx, cookie))
+	if (blk_mq_poll_hybrid(q, hctx, rq))
 		return 1;
 
 	hctx->poll_considered++;
@@ -3486,6 +3525,9 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 		hctx->poll_invoked++;
 
 		ret = q->mq_ops->poll(hctx);
+		if (rq)
+			rq->rq_flags |= RQF_MQ_POLLED;
+
 		if (ret > 0) {
 			hctx->poll_success++;
 			__set_current_state(TASK_RUNNING);
diff --git a/block/blk-stat.c b/block/blk-stat.c
index e1915a4e41b9..33b7b9c35791 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -94,7 +94,7 @@ void blk_stat_add(struct request *rq, u64 now)
 	int bucket;
 	u64 value;
 
-	value = (now >= rq->io_start_time_ns) ? now - rq->io_start_time_ns : 0;
+	value = blk_rq_io_time(rq, now);
 
 	blk_throtl_stat_add(rq, value);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 40c77935fd61..36f17ed1376a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -109,6 +109,9 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 20))
 /* ->timeout has been called, don't expire again */
 #define RQF_TIMED_OUT		((__force req_flags_t)(1 << 21))
+/* Request has been polled at least once */
+#define RQF_MQ_POLLED		((__force req_flags_t)(1 << 22))
+
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
@@ -389,6 +392,7 @@ struct poll_info
 {
 	struct blk_rq_stat stat;
 	u64 sleep_ns;
+	atomic_t nr_misses;
 };
 
 struct request_queue {
@@ -924,6 +928,11 @@ static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
 }
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+static inline u64 blk_rq_io_time(struct request *rq, u64 now)
+{
+	return (now >= rq->io_start_time_ns) ? now - rq->io_start_time_ns : 0;
+}
+
 /*
  * Some commands like WRITE SAME have a payload or data transfer size which
  * is different from the size of the request. Any driver that supports such
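The interplay of the two paths can be condensed into a predicate (a
restatement of blk_mq_record_stats() above, assuming the helpers added
in this patch):

	/*
	 * RQF_MQ_POLLED is set only after the first ->poll() call in
	 * blk_poll(), so a completion recorded while the flag is still
	 * clear was discovered by that very first poll after the hybrid
	 * sleep -- i.e. the request finished while we slept. The io-time
	 * check filters out completions that merely look overslept
	 * because the hrtimer itself fired late (io time above the mean).
	 */
	static bool completion_was_overslept(struct request *rq, u64 now,
					     struct poll_info *pi)
	{
		return !(rq->rq_flags & RQF_MQ_POLLED) &&
		       blk_rq_io_time(rq, now) < pi->stat.mean;
	}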
From patchwork Tue Apr 30 07:34:19 2019
X-Patchwork-Id: 10922937
From: "Pavel Begunkov (Silence)"
To: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Begunkov
Subject: [PATCH 7/7] blk-mq: Adjust hybrid poll sleep time
Date: Tue, 30 Apr 2019 10:34:19 +0300
Message-Id: <90ea71d810084eec70fb1632587b450b3037ce85.1556609582.git.asml.silence@gmail.com>

From: Pavel Begunkov

Sleeping for (mean / 2) in adaptive polling is often too pessimistic.
Use a variation of the 3-sigma rule, (mean - 4 * lmd), and tune it at
runtime using the percentage of missed (i.e. overslept) requests:
1. if more than ~3% of requests are missed, fall back to (mean / 2)
2. if more than ~0.4% are missed, scale the sleep time down

Pitfalls:
1. any missed request increases the mean, which in turn increases the
   sleep time, so mean and sleep time can grow synergistically; hence,
   scale down fast in that case
2. even if the sleep time is predicted well, the sleep loop itself can
   greatly oversleep; try to detect that and skip the miss accounting

Tested on an NVMe SSD: {4K, 8K} read-only workloads show a similar
latency distribution (up to 7 nines) and half the CPU load
(50% -> 25%). The new method even outperforms the old one a bit (in
terms of throughput and latencies), presumably because it alleviates
the 2nd pitfall. A write-only workload falls back to (mean / 2).

Signed-off-by: Pavel Begunkov
---
 block/blk-mq.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec7cde754c2f..efa44a617bea 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3338,10 +3338,21 @@ static void blk_mq_poll_stats_start(struct request_queue *q)
 	blk_stat_activate_msecs(q->poll_cb, 100);
 }
 
+/*
+ * Thresholds are ilog2(nr_requests / nr_misses)
+ * To calculate tolerated miss ratio from it, use
+ * f(x) ~= 2 ^ -(x + 1)
+ *
+ * fallback ~ 3.1%
+ * throttle ~ 0.4%
+ */
+#define BLK_POLL_FALLBACK_THRESHOLD	4
+#define BLK_POLL_THROTTLE_THRESHOLD	7
+
 static void blk_mq_update_poll_info(struct poll_info *pi,
 				    struct blk_rq_stat *stat)
 {
-	u64 sleep_ns;
+	u64 half_mean, indent, sleep_ns;
 	u32 nr_misses, nr_samples;
 
 	nr_samples = stat->nr_samples;
@@ -3349,14 +3360,33 @@ static void blk_mq_update_poll_info(struct poll_info *pi,
 	if (nr_misses > nr_samples)
 		nr_misses = nr_samples;
 
-	if (!nr_samples)
+	half_mean = (stat->mean + 1) / 2;
+	indent = stat->lmd * 4;
+
+	if (!stat->nr_samples) {
 		sleep_ns = 0;
-	else
-		sleep_ns = (stat->mean + 1) / 2;
+	} else if (!stat->lmd || stat->mean <= indent) {
+		sleep_ns = half_mean;
+	} else {
+		int ratio = INT_MAX;
 
-	/*
-	 * Use miss ratio here to adjust sleep time
-	 */
+		sleep_ns = stat->mean - indent;
+
+		/*
+		 * If a completion is overslept, the observable time will
+		 * be greater than the actual, so increasing mean. It
+		 * also increases sleep time estimation, synergistically
+		 * backfiring on mean. Need to scale down / fallback early.
+		 */
+		if (nr_misses)
+			ratio = ilog2(nr_samples / nr_misses);
+		if (ratio <= BLK_POLL_FALLBACK_THRESHOLD)
+			sleep_ns = half_mean;
+		else if (ratio <= BLK_POLL_THROTTLE_THRESHOLD)
+			sleep_ns -= sleep_ns / 4;
+
+		sleep_ns = max(sleep_ns, half_mean);
+	}
 
 	pi->stat = *stat;
 	pi->sleep_ns = sleep_ns;
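To unpack the threshold comment above: with
ratio = ilog2(nr_samples / nr_misses), exceeding a threshold x means
the miss ratio is above roughly 2^-(x+1). For
BLK_POLL_FALLBACK_THRESHOLD = 4 that is 2^-5 = 1/32 ~ 3.1%; for
BLK_POLL_THROTTLE_THRESHOLD = 7 it is 2^-8 = 1/256 ~ 0.4%. For example
(invented numbers), 1000 samples with 40 misses give ilog2(25) = 4,
which is <= 4, so the estimate falls back to mean / 2; with 5 misses,
ilog2(200) = 7, so sleep_ns is only scaled down by a quarter; with 2
misses, ilog2(500) = 8 clears both thresholds and the full
(mean - 4 * lmd) estimate is kept, clamped to at least mean / 2 by the
final max().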