From patchwork Thu Sep 27 22:55:51 2018
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [PATCH v2 1/5] block: move call of scheduler's ->completed_request() hook
Date: Thu, 27 Sep 2018 15:55:51 -0700
Message-Id: <0af9e13d8c4e092f4b407f83c39421be1b90b7f1.1538088475.git.osandov@fb.com>

From: Omar Sandoval

Commit 4bc6339a583c ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.

Signed-off-by: Omar Sandoval
---
 block/blk-mq-sched.h     | 4 ++--
 block/blk-mq.c           | 5 +++--
 block/kyber-iosched.c    | 5 ++---
 include/linux/elevator.h | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 4e028ee42430..8a9544203173 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -49,12 +49,12 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 	return true;
 }
 
-static inline void blk_mq_sched_completed_request(struct request *rq)
+static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 {
 	struct elevator_queue *e = rq->q->elevator;
 
 	if (e && e->type->ops.mq.completed_request)
-		e->type->ops.mq.completed_request(rq);
+		e->type->ops.mq.completed_request(rq, now);
 }
 
 static inline void blk_mq_sched_started_request(struct request *rq)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d384ab700afd..1e72d53e8f2d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -528,6 +528,9 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 		blk_stat_add(rq, now);
 	}
 
+	if (rq->internal_tag != -1)
+		blk_mq_sched_completed_request(rq, now);
+
 	blk_account_io_done(rq, now);
 
 	if (rq->end_io) {
@@ -564,8 +567,6 @@ static void __blk_mq_complete_request(struct request *rq)
 	if (!blk_mq_mark_complete(rq))
 		return;
 
-	if (rq->internal_tag != -1)
-		blk_mq_sched_completed_request(rq);
 
 	if (!test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags)) {
 		rq->q->softirq_done_fn(rq);
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index a1660bafc912..95d062c07c61 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -558,12 +558,12 @@ static void kyber_finish_request(struct request *rq)
 	rq_clear_domain_token(kqd, rq);
 }
 
-static void kyber_completed_request(struct request *rq)
+static void kyber_completed_request(struct request *rq, u64 now)
 {
 	struct request_queue *q = rq->q;
 	struct kyber_queue_data *kqd = q->elevator->elevator_data;
 	unsigned int sched_domain;
-	u64 now, latency, target;
+	u64 latency, target;
 
 	/*
 	 * Check if this request met our latency goal. If not, quickly gather
@@ -585,7 +585,6 @@ static void kyber_completed_request(struct request *rq)
 	if (blk_stat_is_active(kqd->cb))
 		return;
 
-	now = ktime_get_ns();
 	if (now < rq->io_start_time_ns)
 		return;
 
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index a02deea30185..015bb59c0331 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -111,7 +111,7 @@ struct elevator_mq_ops {
 	void (*insert_requests)(struct blk_mq_hw_ctx *, struct list_head *, bool);
 	struct request *(*dispatch_request)(struct blk_mq_hw_ctx *);
 	bool (*has_work)(struct blk_mq_hw_ctx *);
-	void (*completed_request)(struct request *);
+	void (*completed_request)(struct request *, u64);
 	void (*started_request)(struct request *);
 	void (*requeue_request)(struct request *);
 	struct request *(*former_request)(struct request_queue *, struct request *);
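
With this change, an elevator's ->completed_request() hook receives the
completion time sampled in __blk_mq_end_request() instead of reading the
clock itself. A minimal sketch of the new calling convention (a hypothetical
scheduler hook, not part of this series):

	static void example_completed_request(struct request *rq, u64 now)
	{
		/* 'now' was sampled exactly once in __blk_mq_end_request(). */
		u64 io_latency = now - rq->io_start_time_ns;

		/* ... feed io_latency into the scheduler's accounting ... */
	}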
From patchwork Thu Sep 27 22:55:52 2018
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [PATCH v2 2/5] block: export blk_stat_enable_accounting()
Date: Thu, 27 Sep 2018 15:55:52 -0700

From: Omar Sandoval

Kyber will need this in a future change if it is built as a module.

Signed-off-by: Omar Sandoval
---
 block/blk-stat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-stat.c b/block/blk-stat.c
index 7587b1c3caaf..90561af85a62 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -190,6 +190,7 @@ void blk_stat_enable_accounting(struct request_queue *q)
 	blk_queue_flag_set(QUEUE_FLAG_STATS, q);
 	spin_unlock(&q->stats->lock);
 }
+EXPORT_SYMBOL_GPL(blk_stat_enable_accounting);
 
 struct blk_queue_stats *blk_alloc_queue_stats(void)
 {
From patchwork Thu Sep 27 22:55:53 2018
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [PATCH v2 3/5] kyber: don't make domain token sbitmap larger than necessary
Date: Thu, 27 Sep 2018 15:55:53 -0700
Message-Id: <22c03a0be5e385516b9c2b2cb37c3baa54e82129.1538088475.git.osandov@fb.com>

From: Omar Sandoval

The domain token sbitmaps are currently initialized to the device queue
depth or 256, whichever is larger, and immediately resized to the
maximum depth for that domain (256, 128, or 64 for read, write, and
other, respectively). The sbitmap is never resized larger than that, so
it's unnecessary to allocate a bitmap larger than the maximum depth.
Let's just allocate it to the maximum depth to begin with. This will use
marginally less memory, and more importantly, give us a more appropriate
number of bits per sbitmap word.
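
The word-size point deserves a short illustration: when a sbitmap is
initialized with shift == -1, the number of bits per word is derived from the
initial depth, and a later sbitmap_queue_resize() only changes the usable
depth, not the word layout. Roughly (a simplified sketch of the lib/sbitmap.c
heuristic of this era, for illustration only):

	static unsigned int approx_sbitmap_shift(unsigned int depth)
	{
		unsigned int shift = 6;	/* ilog2(BITS_PER_LONG) on 64-bit */

		/*
		 * For small maps, use fewer bits per word so the bitmap
		 * still spans several words.
		 */
		if (depth >= 4) {
			while ((4U << shift) > depth)
				shift--;
		}
		return shift;
	}

Initializing at a large device queue depth (say, 4096) keeps shift at 6 (64
bits per word), while initializing directly at the 64-token maximum picks
shift 4 (16 bits per word), spreading the tokens across more words.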
Signed-off-by: Omar Sandoval
---
 block/kyber-iosched.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 95d062c07c61..08eb5295c18d 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -40,8 +40,6 @@ enum {
 };
 
 enum {
-	KYBER_MIN_DEPTH = 256,
-
 	/*
 	 * In order to prevent starvation of synchronous requests by a flood of
 	 * asynchronous requests, we reserve 25% of requests for synchronous
@@ -305,7 +303,6 @@ static int kyber_bucket_fn(const struct request *rq)
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 {
 	struct kyber_queue_data *kqd;
-	unsigned int max_tokens;
 	unsigned int shift;
 	int ret = -ENOMEM;
 	int i;
@@ -320,25 +317,17 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 	if (!kqd->cb)
 		goto err_kqd;
 
-	/*
-	 * The maximum number of tokens for any scheduling domain is at least
-	 * the queue depth of a single hardware queue. If the hardware doesn't
-	 * have many tags, still provide a reasonable number.
-	 */
-	max_tokens = max_t(unsigned int, q->tag_set->queue_depth,
-			   KYBER_MIN_DEPTH);
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
 		WARN_ON(!kyber_depth[i]);
 		WARN_ON(!kyber_batch_size[i]);
 		ret = sbitmap_queue_init_node(&kqd->domain_tokens[i],
-					      max_tokens, -1, false, GFP_KERNEL,
-					      q->node);
+					      kyber_depth[i], -1, false,
+					      GFP_KERNEL, q->node);
 		if (ret) {
 			while (--i >= 0)
 				sbitmap_queue_free(&kqd->domain_tokens[i]);
 			goto err_cb;
 		}
-		sbitmap_queue_resize(&kqd->domain_tokens[i], kyber_depth[i]);
 	}
 
 	shift = kyber_sched_tags_shift(kqd);
From patchwork Thu Sep 27 22:55:54 2018
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [PATCH v2 4/5] kyber: implement improved heuristics
Date: Thu, 27 Sep 2018 15:55:54 -0700
Message-Id: <37a6d5423a60ed64422121238bc6c3e6a71abee2.1538088475.git.osandov@fb.com>

From: Omar Sandoval

Kyber's current heuristics have a few flaws:

- It's based on the mean latency, but p99 latency tends to be more
  meaningful to anyone who cares about latency. The mean can also be
  skewed by rare outliers that the scheduler can't do anything about.

- The statistics calculations are purely time-based with a short window.
  This works for steady, high load, but is more sensitive to outliers
  with bursty workloads.

- It only considers the latency once an I/O has been submitted to the
  device, but the user cares about the time spent in the kernel, as
  well.

These are shortcomings of the generic blk-stat code which doesn't quite
fit the ideal use case for Kyber. So, this replaces the statistics with
a histogram used to calculate percentiles of total latency and I/O
latency, which we then use to adjust depths in a slightly more
intelligent manner:

- Sync and async writes are now the same domain.

- Discards are a separate domain.

- Domain queue depths are scaled by the ratio of the p99 total latency
  to the target latency (e.g., if the p99 latency is double the target
  latency, we will double the queue depth; if the p99 latency is half
  of the target latency, we can halve the queue depth).

- We use the I/O latency to determine whether we should scale queue
  depths down: we will only scale down if any domain's I/O latency
  exceeds the target latency, which is an indicator of congestion in
  the device.

These new heuristics are just as scalable as the heuristics they
replace.
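
Concretely, with histogram buckets each 1/4 of the target latency wide
(KYBER_LATENCY_SHIFT = 2 in the code below), the depth scaling reduces to the
following (pulled out of kyber_timer_fn() below, for illustration):

	static unsigned int scaled_depth(unsigned int cur_depth,
					 unsigned int p99_bucket)
	{
		/*
		 * p99_bucket indexes buckets that are each 1/4 of the
		 * target latency wide: bucket 3 means "at the target" and
		 * bucket 7 means "about twice the target".
		 */
		return (cur_depth * (p99_bucket + 1)) >> 2;
	}

For example, a p99 around 2x the target (bucket 7) yields cur_depth * 8 / 4,
doubling the depth, while a p99 around half the target (bucket 1) yields
cur_depth * 2 / 4, halving it, matching the description above.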
Signed-off-by: Omar Sandoval
---
 block/kyber-iosched.c | 497 ++++++++++++++++++++++++------------
 1 file changed, 279 insertions(+), 218 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 08eb5295c18d..adc8e6393829 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -29,13 +29,16 @@
 #include "blk-mq-debugfs.h"
 #include "blk-mq-sched.h"
 #include "blk-mq-tag.h"
-#include "blk-stat.h"
 
-/* Scheduling domains. */
+/*
+ * Scheduling domains: the device is divided into multiple domains based on the
+ * request type.
+ */
 enum {
 	KYBER_READ,
-	KYBER_SYNC_WRITE,
-	KYBER_OTHER, /* Async writes, discard, etc. */
+	KYBER_WRITE,
+	KYBER_DISCARD,
+	KYBER_OTHER,
 	KYBER_NUM_DOMAINS,
 };
 
@@ -49,25 +52,82 @@ enum {
 };
 
 /*
- * Initial device-wide depths for each scheduling domain.
+ * Maximum device-wide depth for each scheduling domain.
  *
- * Even for fast devices with lots of tags like NVMe, you can saturate
- * the device with only a fraction of the maximum possible queue depth.
- * So, we cap these to a reasonable value.
+ * Even for fast devices with lots of tags like NVMe, you can saturate the
+ * device with only a fraction of the maximum possible queue depth. So, we cap
+ * these to a reasonable value.
  */
 static const unsigned int kyber_depth[] = {
 	[KYBER_READ] = 256,
-	[KYBER_SYNC_WRITE] = 128,
-	[KYBER_OTHER] = 64,
+	[KYBER_WRITE] = 128,
+	[KYBER_DISCARD] = 64,
+	[KYBER_OTHER] = 16,
 };
 
 /*
- * Scheduling domain batch sizes. We favor reads.
+ * Default latency targets for each scheduling domain.
+ */
+static const u64 kyber_latency_targets[] = {
+	[KYBER_READ] = 2 * NSEC_PER_MSEC,
+	[KYBER_WRITE] = 10 * NSEC_PER_MSEC,
+	[KYBER_DISCARD] = 5 * NSEC_PER_SEC,
+};
+
+/*
+ * Batch size (number of requests we'll dispatch in a row) for each scheduling
+ * domain.
  */
 static const unsigned int kyber_batch_size[] = {
 	[KYBER_READ] = 16,
-	[KYBER_SYNC_WRITE] = 8,
-	[KYBER_OTHER] = 8,
+	[KYBER_WRITE] = 8,
+	[KYBER_DISCARD] = 1,
+	[KYBER_OTHER] = 1,
+};
+
+/*
+ * Requests latencies are recorded in a histogram with buckets defined relative
+ * to the target latency:
+ *
+ * <= 1/4 * target latency
+ * <= 1/2 * target latency
+ * <= 3/4 * target latency
+ * <= target latency
+ * <= 1 1/4 * target latency
+ * <= 1 1/2 * target latency
+ * <= 1 3/4 * target latency
+ * > 1 3/4 * target latency
+ */
+enum {
+	/*
+	 * The width of the latency histogram buckets is
+	 * 1 / (1 << KYBER_LATENCY_SHIFT) * target latency.
+	 */
+	KYBER_LATENCY_SHIFT = 2,
+	/*
+	 * The first (1 << KYBER_LATENCY_SHIFT) buckets are <= target latency,
+	 * thus, "good".
+	 */
+	KYBER_GOOD_BUCKETS = 1 << KYBER_LATENCY_SHIFT,
+	/* There are also (1 << KYBER_LATENCY_SHIFT) "bad" buckets. */
+	KYBER_LATENCY_BUCKETS = 2 << KYBER_LATENCY_SHIFT,
+};
+
+/*
+ * We measure both the total latency and the I/O latency (i.e., latency after
+ * submitting to the device).
+ */
+enum {
+	KYBER_TOTAL_LATENCY,
+	KYBER_IO_LATENCY,
+};
+
+/*
+ * Per-cpu latency histograms: total latency and I/O latency for each scheduling
+ * domain except for KYBER_OTHER.
+ */
+struct kyber_cpu_latency {
+	atomic_t buckets[KYBER_OTHER][2][KYBER_LATENCY_BUCKETS];
 };
 
 /*
@@ -84,14 +144,9 @@ struct kyber_ctx_queue {
 } ____cacheline_aligned_in_smp;
 
 struct kyber_queue_data {
-	struct request_queue *q;
-
-	struct blk_stat_callback *cb;
-
 	/*
-	 * The device is divided into multiple scheduling domains based on the
-	 * request type. Each domain has a fixed number of in-flight requests of
-	 * that type device-wide, limited by these tokens.
+	 * Each scheduling domain has a limited number of in-flight requests
+	 * device-wide, limited by these tokens.
 	 */
 	struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
 
@@ -101,8 +156,19 @@ struct kyber_queue_data {
 	 */
 	unsigned int async_depth;
 
+	struct kyber_cpu_latency __percpu *cpu_latency;
+
+	/* Timer for stats aggregation and adjusting domain tokens. */
+	struct timer_list timer;
+
+	unsigned int latency_buckets[KYBER_OTHER][2][KYBER_LATENCY_BUCKETS];
+
+	unsigned long latency_timeout[KYBER_OTHER];
+
+	int domain_p99[KYBER_OTHER];
+
 	/* Target latencies in nanoseconds. */
-	u64 read_lat_nsec, write_lat_nsec;
+	u64 latency_targets[KYBER_OTHER];
 };
 
 struct kyber_hctx_data {
@@ -122,182 +188,165 @@ static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
 
 static unsigned int kyber_sched_domain(unsigned int op)
 {
-	if ((op & REQ_OP_MASK) == REQ_OP_READ)
+	switch (op & REQ_OP_MASK) {
+	case REQ_OP_READ:
 		return KYBER_READ;
-	else if ((op & REQ_OP_MASK) == REQ_OP_WRITE && op_is_sync(op))
-		return KYBER_SYNC_WRITE;
-	else
+	case REQ_OP_WRITE:
+		return KYBER_WRITE;
+	case REQ_OP_DISCARD:
+		return KYBER_DISCARD;
+	default:
 		return KYBER_OTHER;
+	}
 }
 
-enum {
-	NONE = 0,
-	GOOD = 1,
-	GREAT = 2,
-	BAD = -1,
-	AWFUL = -2,
-};
-
-#define IS_GOOD(status) ((status) > 0)
-#define IS_BAD(status) ((status) < 0)
-
-static int kyber_lat_status(struct blk_stat_callback *cb,
-			    unsigned int sched_domain, u64 target)
+static void flush_latency_buckets(struct kyber_queue_data *kqd,
+				  struct kyber_cpu_latency *cpu_latency,
+				  unsigned int sched_domain, unsigned int type)
 {
-	u64 latency;
-
-	if (!cb->stat[sched_domain].nr_samples)
-		return NONE;
+	unsigned int *buckets = kqd->latency_buckets[sched_domain][type];
+	atomic_t *cpu_buckets = cpu_latency->buckets[sched_domain][type];
+	unsigned int bucket;
 
-	latency = cb->stat[sched_domain].mean;
-	if (latency >= 2 * target)
-		return AWFUL;
-	else if (latency > target)
-		return BAD;
-	else if (latency <= target / 2)
-		return GREAT;
-	else /* (latency <= target) */
-		return GOOD;
+	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS; bucket++)
+		buckets[bucket] += atomic_xchg(&cpu_buckets[bucket], 0);
 }
 
 /*
- * Adjust the read or synchronous write depth given the status of reads and
- * writes. The goal is that the latencies of the two domains are fair (i.e., if
- * one is good, then the other is good).
+ * Calculate the histogram bucket with the given percentile rank, or -1 if there
+ * aren't enough samples yet.
 */
-static void kyber_adjust_rw_depth(struct kyber_queue_data *kqd,
-				  unsigned int sched_domain, int this_status,
-				  int other_status)
+static int calculate_percentile(struct kyber_queue_data *kqd,
+				unsigned int sched_domain, unsigned int type,
+				unsigned int percentile)
 {
-	unsigned int orig_depth, depth;
+	unsigned int *buckets = kqd->latency_buckets[sched_domain][type];
+	unsigned int bucket, samples = 0, percentile_samples;
+
+	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS; bucket++)
+		samples += buckets[bucket];
+
+	if (!samples)
+		return -1;
 
 	/*
-	 * If this domain had no samples, or reads and writes are both good or
-	 * both bad, don't adjust the depth.
+	 * We do the calculation once we have 500 samples or one second passes
+	 * since the first sample was recorded, whichever comes first.
 	 */
-	if (this_status == NONE ||
-	    (IS_GOOD(this_status) && IS_GOOD(other_status)) ||
-	    (IS_BAD(this_status) && IS_BAD(other_status)))
-		return;
-
-	orig_depth = depth = kqd->domain_tokens[sched_domain].sb.depth;
+	if (!kqd->latency_timeout[sched_domain])
+		kqd->latency_timeout[sched_domain] = max(jiffies + HZ, 1UL);
+	if (samples < 500 &&
+	    time_is_after_jiffies(kqd->latency_timeout[sched_domain])) {
+		return -1;
+	}
+	kqd->latency_timeout[sched_domain] = 0;
 
-	if (other_status == NONE) {
-		depth++;
-	} else {
-		switch (this_status) {
-		case GOOD:
-			if (other_status == AWFUL)
-				depth -= max(depth / 4, 1U);
-			else
-				depth -= max(depth / 8, 1U);
-			break;
-		case GREAT:
-			if (other_status == AWFUL)
-				depth /= 2;
-			else
-				depth -= max(depth / 4, 1U);
+	percentile_samples = DIV_ROUND_UP(samples * percentile, 100);
+	for (bucket = 0; bucket < KYBER_LATENCY_BUCKETS - 1; bucket++) {
+		if (buckets[bucket] >= percentile_samples)
 			break;
-		case BAD:
-			depth++;
-			break;
-		case AWFUL:
-			if (other_status == GREAT)
-				depth += 2;
-			else
-				depth++;
-			break;
-		}
+		percentile_samples -= buckets[bucket];
 	}
+	memset(buckets, 0, sizeof(kqd->latency_buckets[sched_domain][type]));
 
+	return bucket;
+}
+
+static void kyber_resize_domain(struct kyber_queue_data *kqd,
+				unsigned int sched_domain, unsigned int depth)
+{
 	depth = clamp(depth, 1U, kyber_depth[sched_domain]);
-	if (depth != orig_depth)
+	if (depth != kqd->domain_tokens[sched_domain].sb.depth)
 		sbitmap_queue_resize(&kqd->domain_tokens[sched_domain], depth);
 }
 
-/*
- * Adjust the depth of other requests given the status of reads and synchronous
- * writes. As long as either domain is doing fine, we don't throttle, but if
- * both domains are doing badly, we throttle heavily.
- */
-static void kyber_adjust_other_depth(struct kyber_queue_data *kqd,
-				     int read_status, int write_status,
-				     bool have_samples)
-{
-	unsigned int orig_depth, depth;
-	int status;
-
-	orig_depth = depth = kqd->domain_tokens[KYBER_OTHER].sb.depth;
-
-	if (read_status == NONE && write_status == NONE) {
-		depth += 2;
-	} else if (have_samples) {
-		if (read_status == NONE)
-			status = write_status;
-		else if (write_status == NONE)
-			status = read_status;
-		else
-			status = max(read_status, write_status);
-		switch (status) {
-		case GREAT:
-			depth += 2;
-			break;
-		case GOOD:
-			depth++;
-			break;
-		case BAD:
-			depth -= max(depth / 4, 1U);
-			break;
-		case AWFUL:
-			depth /= 2;
-			break;
+static void kyber_timer_fn(struct timer_list *t)
+{
+	struct kyber_queue_data *kqd = from_timer(kqd, t, timer);
+	unsigned int sched_domain;
+	int cpu;
+	bool bad = false;
+
+	/* Sum all of the per-cpu latency histograms. */
+	for_each_online_cpu(cpu) {
+		struct kyber_cpu_latency *cpu_latency;
+
+		cpu_latency = per_cpu_ptr(kqd->cpu_latency, cpu);
+		for (sched_domain = 0; sched_domain < KYBER_OTHER; sched_domain++) {
+			flush_latency_buckets(kqd, cpu_latency, sched_domain,
+					      KYBER_TOTAL_LATENCY);
+			flush_latency_buckets(kqd, cpu_latency, sched_domain,
					      KYBER_IO_LATENCY);
 		}
 	}
 
-	depth = clamp(depth, 1U, kyber_depth[KYBER_OTHER]);
-	if (depth != orig_depth)
-		sbitmap_queue_resize(&kqd->domain_tokens[KYBER_OTHER], depth);
-}
-
-/*
- * Apply heuristics for limiting queue depths based on gathered latency
- * statistics.
- */
-static void kyber_stat_timer_fn(struct blk_stat_callback *cb)
-{
-	struct kyber_queue_data *kqd = cb->data;
-	int read_status, write_status;
-
-	read_status = kyber_lat_status(cb, KYBER_READ, kqd->read_lat_nsec);
-	write_status = kyber_lat_status(cb, KYBER_SYNC_WRITE, kqd->write_lat_nsec);
+	/*
+	 * Check if any domains have a high I/O latency, which might indicate
+	 * congestion in the device. Note that we use the p90; we don't want to
+	 * be too sensitive to outliers here.
+	 */
+	for (sched_domain = 0; sched_domain < KYBER_OTHER; sched_domain++) {
+		int p90;
 
-	kyber_adjust_rw_depth(kqd, KYBER_READ, read_status, write_status);
-	kyber_adjust_rw_depth(kqd, KYBER_SYNC_WRITE, write_status, read_status);
-	kyber_adjust_other_depth(kqd, read_status, write_status,
-				 cb->stat[KYBER_OTHER].nr_samples != 0);
+		p90 = calculate_percentile(kqd, sched_domain, KYBER_IO_LATENCY,
+					   90);
+		if (p90 >= KYBER_GOOD_BUCKETS)
+			bad = true;
+	}
 
 	/*
-	 * Continue monitoring latencies if we aren't hitting the targets or
-	 * we're still throttling other requests.
+	 * Adjust the scheduling domain depths. If we determined that there was
+	 * congestion, we throttle all domains with good latencies. Either way,
+	 * we ease up on throttling domains with bad latencies.
 	 */
-	if (!blk_stat_is_active(kqd->cb) &&
-	    ((IS_BAD(read_status) || IS_BAD(write_status) ||
-	      kqd->domain_tokens[KYBER_OTHER].sb.depth < kyber_depth[KYBER_OTHER])))
-		blk_stat_activate_msecs(kqd->cb, 100);
+	for (sched_domain = 0; sched_domain < KYBER_OTHER; sched_domain++) {
+		unsigned int orig_depth, depth;
+		int p99;
+
+		p99 = calculate_percentile(kqd, sched_domain,
+					   KYBER_TOTAL_LATENCY, 99);
+		/*
+		 * This is kind of subtle: different domains will not
+		 * necessarily have enough samples to calculate the latency
+		 * percentiles during the same window, so we have to remember
+		 * the p99 for the next time we observe congestion; once we do,
+		 * we don't want to throttle again until we get more data, so we
+		 * reset it to -1.
+		 */
+		if (bad) {
+			if (p99 < 0)
+				p99 = kqd->domain_p99[sched_domain];
+			kqd->domain_p99[sched_domain] = -1;
+		} else if (p99 >= 0) {
+			kqd->domain_p99[sched_domain] = p99;
+		}
+		if (p99 < 0)
+			continue;
+
+		/*
+		 * If this domain has bad latency, throttle less. Otherwise,
+		 * throttle more iff we determined that there is congestion.
+		 *
+		 * The new depth is scaled linearly with the p99 latency vs the
+		 * latency target. E.g., if the p99 is 3/4 of the target, then
+		 * we throttle down to 3/4 of the current depth, and if the p99
+		 * is 2x the target, then we double the depth.
+		 */
+		if (bad || p99 >= KYBER_GOOD_BUCKETS) {
+			orig_depth = kqd->domain_tokens[sched_domain].sb.depth;
+			depth = (orig_depth * (p99 + 1)) >> KYBER_LATENCY_SHIFT;
+			kyber_resize_domain(kqd, sched_domain, depth);
+		}
+	}
 }
 
-static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
+static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 {
 	/*
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
-}
-
-static int kyber_bucket_fn(const struct request *rq)
-{
-	return kyber_sched_domain(rq->cmd_flags);
+	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -307,16 +356,17 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 	int ret = -ENOMEM;
 	int i;
 
-	kqd = kmalloc_node(sizeof(*kqd), GFP_KERNEL, q->node);
+	kqd = kzalloc_node(sizeof(*kqd), GFP_KERNEL, q->node);
 	if (!kqd)
 		goto err;
 
-	kqd->q = q;
-
-	kqd->cb = blk_stat_alloc_callback(kyber_stat_timer_fn, kyber_bucket_fn,
-					  KYBER_NUM_DOMAINS, kqd);
-	if (!kqd->cb)
+	kqd->cpu_latency = alloc_percpu_gfp(struct kyber_cpu_latency,
+					    GFP_KERNEL | __GFP_ZERO);
+	if (!kqd->cpu_latency)
 		goto err_kqd;
 
+	timer_setup(&kqd->timer, kyber_timer_fn, 0);
+
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
 		WARN_ON(!kyber_depth[i]);
 		WARN_ON(!kyber_batch_size[i]);
@@ -326,20 +376,22 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 		if (ret) {
 			while (--i >= 0)
 				sbitmap_queue_free(&kqd->domain_tokens[i]);
-			goto err_cb;
+			goto err_buckets;
 		}
 	}
 
-	shift = kyber_sched_tags_shift(kqd);
-	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
+	for (i = 0; i < KYBER_OTHER; i++) {
+		kqd->domain_p99[i] = -1;
+		kqd->latency_targets[i] = kyber_latency_targets[i];
+	}
 
-	kqd->read_lat_nsec = 2000000ULL;
-	kqd->write_lat_nsec = 10000000ULL;
+	shift = kyber_sched_tags_shift(q);
+	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
 
 	return kqd;
 
-err_cb:
-	blk_stat_free_callback(kqd->cb);
+err_buckets:
+	free_percpu(kqd->cpu_latency);
 err_kqd:
 	kfree(kqd);
 err:
@@ -361,25 +413,24 @@ static int kyber_init_sched(struct request_queue *q, struct elevator_type *e)
 		return PTR_ERR(kqd);
 	}
 
+	blk_stat_enable_accounting(q);
+
 	eq->elevator_data = kqd;
 	q->elevator = eq;
 
-	blk_stat_add_callback(q, kqd->cb);
-
 	return 0;
 }
 
 static void kyber_exit_sched(struct elevator_queue *e)
 {
	struct kyber_queue_data *kqd = e->elevator_data;
-	struct request_queue *q = kqd->q;
 	int i;
 
-	blk_stat_remove_callback(q, kqd->cb);
+	del_timer_sync(&kqd->timer);
 
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++)
 		sbitmap_queue_free(&kqd->domain_tokens[i]);
 
-	blk_stat_free_callback(kqd->cb);
+	free_percpu(kqd->cpu_latency);
 	kfree(kqd);
 }
 
@@ -547,40 +598,44 @@ static void kyber_finish_request(struct request *rq)
 	rq_clear_domain_token(kqd, rq);
 }
 
-static void kyber_completed_request(struct request *rq, u64 now)
+static void add_latency_sample(struct kyber_cpu_latency *cpu_latency,
+			       unsigned int sched_domain, unsigned int type,
+			       u64 target, u64 latency)
 {
-	struct request_queue *q = rq->q;
-	struct kyber_queue_data *kqd = q->elevator->elevator_data;
-	unsigned int sched_domain;
-	u64 latency, target;
+	unsigned int bucket;
+	u64 divisor;
 
-	/*
-	 * Check if this request met our latency goal. If not, quickly gather
-	 * some statistics and start throttling.
-	 */
-	sched_domain = kyber_sched_domain(rq->cmd_flags);
-	switch (sched_domain) {
-	case KYBER_READ:
-		target = kqd->read_lat_nsec;
-		break;
-	case KYBER_SYNC_WRITE:
-		target = kqd->write_lat_nsec;
-		break;
-	default:
-		return;
+	if (latency > 0) {
+		divisor = max_t(u64, target >> KYBER_LATENCY_SHIFT, 1);
+		bucket = min_t(unsigned int, div64_u64(latency - 1, divisor),
+			       KYBER_LATENCY_BUCKETS - 1);
+	} else {
+		bucket = 0;
 	}
 
-	/* If we are already monitoring latencies, don't check again. */
-	if (blk_stat_is_active(kqd->cb))
-		return;
+	atomic_inc(&cpu_latency->buckets[sched_domain][type][bucket]);
+}
 
-	if (now < rq->io_start_time_ns)
+static void kyber_completed_request(struct request *rq, u64 now)
+{
+	struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
+	struct kyber_cpu_latency *cpu_latency;
+	unsigned int sched_domain;
+	u64 target;
+
+	sched_domain = kyber_sched_domain(rq->cmd_flags);
+	if (sched_domain == KYBER_OTHER)
 		return;
 
-	latency = now - rq->io_start_time_ns;
+	cpu_latency = get_cpu_ptr(kqd->cpu_latency);
+	target = kqd->latency_targets[sched_domain];
+	add_latency_sample(cpu_latency, sched_domain, KYBER_TOTAL_LATENCY,
+			   target, now - rq->start_time_ns);
+	add_latency_sample(cpu_latency, sched_domain, KYBER_IO_LATENCY, target,
+			   now - rq->io_start_time_ns);
+	put_cpu_ptr(kqd->cpu_latency);
 
-	if (latency > target)
-		blk_stat_activate_msecs(kqd->cb, 10);
+	timer_reduce(&kqd->timer, jiffies + HZ / 10);
 }
 
 struct flush_kcq_data {
@@ -778,17 +833,17 @@ static bool kyber_has_work(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
-#define KYBER_LAT_SHOW_STORE(op)					\
-static ssize_t kyber_##op##_lat_show(struct elevator_queue *e,		\
-				     char *page)			\
+#define KYBER_LAT_SHOW_STORE(domain, name)				\
+static ssize_t kyber_##name##_lat_show(struct elevator_queue *e,	\
+				       char *page)			\
 {									\
 	struct kyber_queue_data *kqd = e->elevator_data;		\
 									\
-	return sprintf(page, "%llu\n", kqd->op##_lat_nsec);		\
+	return sprintf(page, "%llu\n", kqd->latency_targets[domain]);	\
 }									\
 									\
-static ssize_t kyber_##op##_lat_store(struct elevator_queue *e,	\
-				      const char *page, size_t count)	\
+static ssize_t kyber_##name##_lat_store(struct elevator_queue *e,	\
+					const char *page, size_t count)	\
 {									\
 	struct kyber_queue_data *kqd = e->elevator_data;		\
 	unsigned long long nsec;					\
@@ -798,12 +853,12 @@ static ssize_t kyber_##op##_lat_store(struct elevator_queue *e,	\
 	if (ret)							\
 		return ret;						\
 									\
-	kqd->op##_lat_nsec = nsec;					\
+	kqd->latency_targets[domain] = nsec;				\
 									\
 	return count;							\
 }
-KYBER_LAT_SHOW_STORE(read);
-KYBER_LAT_SHOW_STORE(write);
+KYBER_LAT_SHOW_STORE(KYBER_READ, read);
+KYBER_LAT_SHOW_STORE(KYBER_WRITE, write);
 #undef KYBER_LAT_SHOW_STORE
 
 #define KYBER_LAT_ATTR(op) __ATTR(op##_lat_nsec, 0644, kyber_##op##_lat_show, kyber_##op##_lat_store)
@@ -870,7 +925,8 @@ static int kyber_##name##_waiting_show(void *data, struct seq_file *m)	\
 	return 0;							\
 }
 KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_READ, read)
-KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_SYNC_WRITE, sync_write)
+KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_WRITE, write)
+KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_DISCARD, discard)
 KYBER_DEBUGFS_DOMAIN_ATTRS(KYBER_OTHER, other)
 #undef KYBER_DEBUGFS_DOMAIN_ATTRS
 
@@ -892,8 +948,11 @@ static int kyber_cur_domain_show(void *data, struct seq_file *m)
 	case KYBER_READ:
 		seq_puts(m, "READ\n");
 		break;
-	case KYBER_SYNC_WRITE:
-		seq_puts(m, "SYNC_WRITE\n");
+	case KYBER_WRITE:
+		seq_puts(m, "WRITE\n");
+		break;
+	case KYBER_DISCARD:
+		seq_puts(m, "DISCARD\n");
 		break;
 	case KYBER_OTHER:
 		seq_puts(m, "OTHER\n");
@@ -918,7 +977,8 @@ static int kyber_batching_show(void *data, struct seq_file *m)
 	{#name "_tokens", 0400, kyber_##name##_tokens_show}
 static const struct blk_mq_debugfs_attr kyber_queue_debugfs_attrs[] = {
 	KYBER_QUEUE_DOMAIN_ATTRS(read),
-	KYBER_QUEUE_DOMAIN_ATTRS(sync_write),
+	KYBER_QUEUE_DOMAIN_ATTRS(write),
+	KYBER_QUEUE_DOMAIN_ATTRS(discard),
 	KYBER_QUEUE_DOMAIN_ATTRS(other),
 	{"async_depth", 0400, kyber_async_depth_show},
 	{},
 };
@@ -930,7 +990,8 @@ static const struct blk_mq_debugfs_attr kyber_queue_debugfs_attrs[] = {
 	{#name "_waiting", 0400, kyber_##name##_waiting_show}
 static const struct blk_mq_debugfs_attr kyber_hctx_debugfs_attrs[] = {
 	KYBER_HCTX_DOMAIN_ATTRS(read),
-	KYBER_HCTX_DOMAIN_ATTRS(sync_write),
+	KYBER_HCTX_DOMAIN_ATTRS(write),
+	KYBER_HCTX_DOMAIN_ATTRS(discard),
 	KYBER_HCTX_DOMAIN_ATTRS(other),
 	{"cur_domain", 0400, kyber_cur_domain_show},
 	{"batching", 0400, kyber_batching_show},
From patchwork Thu Sep 27 22:55:55 2018
From: Omar Sandoval
To: linux-block@vger.kernel.org
Cc: Jens Axboe, kernel-team@fb.com
Subject: [PATCH v2 5/5] kyber: add tracepoints
Date: Thu, 27 Sep 2018 15:55:55 -0700

From: Omar Sandoval

When debugging Kyber, it's really useful to know what latencies we've
been having, how the domain depths have been adjusted, and if we've
actually been throttling. Add three tracepoints, kyber_latency,
kyber_adjust, and kyber_throttled, to record that.

Signed-off-by: Omar Sandoval
---
 block/kyber-iosched.c        | 52 ++++++++++++-------
 include/trace/events/kyber.h | 96 ++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 18 deletions(-)
 create mode 100644 include/trace/events/kyber.h

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index adc8e6393829..2b62e362fb36 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -30,6 +30,9 @@
 #include "blk-mq-sched.h"
 #include "blk-mq-tag.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/kyber.h>
+
 /*
  * Scheduling domains: the device is divided into multiple domains based on the
  * request type.
@@ -42,6 +45,13 @@ enum {
 	KYBER_NUM_DOMAINS,
 };
 
+static const char *kyber_domain_names[] = {
+	[KYBER_READ] = "READ",
+	[KYBER_WRITE] = "WRITE",
+	[KYBER_DISCARD] = "DISCARD",
+	[KYBER_OTHER] = "OTHER",
+};
+
 enum {
 	/*
 	 * In order to prevent starvation of synchronous requests by a flood of
@@ -122,6 +132,11 @@ enum {
 	KYBER_IO_LATENCY,
 };
 
+static const char *kyber_latency_type_names[] = {
+	[KYBER_TOTAL_LATENCY] = "total",
+	[KYBER_IO_LATENCY] = "I/O",
+};
+
 /*
  * Per-cpu latency histograms: total latency and I/O latency for each scheduling
  * domain except for KYBER_OTHER.
@@ -144,6 +159,8 @@ struct kyber_ctx_queue {
 } ____cacheline_aligned_in_smp;
 
 struct kyber_queue_data {
+	struct request_queue *q;
+
 	/*
 	 * Each scheduling domain has a limited number of in-flight requests
 	 * device-wide, limited by these tokens.
@@ -249,6 +266,10 @@ static int calculate_percentile(struct kyber_queue_data *kqd,
 	}
 	memset(buckets, 0, sizeof(kqd->latency_buckets[sched_domain][type]));
 
+	trace_kyber_latency(kqd->q, kyber_domain_names[sched_domain],
+			    kyber_latency_type_names[type], percentile,
+			    bucket + 1, 1 << KYBER_LATENCY_SHIFT, samples);
+
 	return bucket;
 }
 
@@ -256,8 +277,11 @@ static void kyber_resize_domain(struct kyber_queue_data *kqd,
 				unsigned int sched_domain, unsigned int depth)
 {
 	depth = clamp(depth, 1U, kyber_depth[sched_domain]);
-	if (depth != kqd->domain_tokens[sched_domain].sb.depth)
+	if (depth != kqd->domain_tokens[sched_domain].sb.depth) {
 		sbitmap_queue_resize(&kqd->domain_tokens[sched_domain], depth);
+		trace_kyber_adjust(kqd->q, kyber_domain_names[sched_domain],
+				   depth);
+	}
 }
 
 static void kyber_timer_fn(struct timer_list *t)
@@ -360,6 +384,8 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 	if (!kqd)
 		goto err;
 
+	kqd->q = q;
+
 	kqd->cpu_latency = alloc_percpu_gfp(struct kyber_cpu_latency,
 					    GFP_KERNEL | __GFP_ZERO);
 	if (!kqd->cpu_latency)
@@ -756,6 +782,9 @@ kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
 			rq_set_domain_token(rq, nr);
 			list_del_init(&rq->queuelist);
 			return rq;
+		} else {
+			trace_kyber_throttled(kqd->q,
+					      kyber_domain_names[khd->cur_domain]);
 		}
 	} else if (sbitmap_any_bit_set(&khd->kcq_map[khd->cur_domain])) {
 		nr = kyber_get_domain_token(kqd, khd, hctx);
@@ -766,6 +795,9 @@ kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
 			rq_set_domain_token(rq, nr);
 			list_del_init(&rq->queuelist);
 			return rq;
+		} else {
+			trace_kyber_throttled(kqd->q,
+					      kyber_domain_names[khd->cur_domain]);
 		}
 	}
 
@@ -944,23 +976,7 @@ static int kyber_cur_domain_show(void *data, struct seq_file *m)
 	struct blk_mq_hw_ctx *hctx = data;
 	struct kyber_hctx_data *khd = hctx->sched_data;
 
-	switch (khd->cur_domain) {
-	case KYBER_READ:
-		seq_puts(m, "READ\n");
-		break;
-	case KYBER_WRITE:
-		seq_puts(m, "WRITE\n");
-		break;
-	case KYBER_DISCARD:
-		seq_puts(m, "DISCARD\n");
-		break;
-	case KYBER_OTHER:
-		seq_puts(m, "OTHER\n");
-		break;
-	default:
-		seq_printf(m, "%u\n", khd->cur_domain);
-		break;
-	}
+	seq_printf(m, "%s\n", kyber_domain_names[khd->cur_domain]);
 
 	return 0;
 }
diff --git a/include/trace/events/kyber.h b/include/trace/events/kyber.h
new file mode 100644
index 000000000000..a9834c37ac40
--- /dev/null
+++ b/include/trace/events/kyber.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kyber
+
+#if !defined(_TRACE_KYBER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KYBER_H
+
+#include <linux/blkdev.h>
+#include <linux/tracepoint.h>
+
+#define DOMAIN_LEN		16
+#define LATENCY_TYPE_LEN	8
+
+TRACE_EVENT(kyber_latency,
+
+	TP_PROTO(struct request_queue *q, const char *domain, const char *type,
+		 unsigned int percentile, unsigned int numerator,
+		 unsigned int denominator, unsigned int samples),
+
+	TP_ARGS(q, domain, type, percentile, numerator, denominator, samples),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev				)
+		__array(	char,	domain,	DOMAIN_LEN		)
+		__array(	char,	type,	LATENCY_TYPE_LEN	)
+		__field(	u8,	percentile			)
+		__field(	u8,	numerator			)
+		__field(	u8,	denominator			)
+		__field(	unsigned int,	samples			)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		strlcpy(__entry->domain, domain, DOMAIN_LEN);
+		strlcpy(__entry->type, type, DOMAIN_LEN);
+		__entry->percentile	= percentile;
+		__entry->numerator	= numerator;
+		__entry->denominator	= denominator;
+		__entry->samples	= samples;
+	),
+
+	TP_printk("%d,%d %s %s p%u %u/%u samples=%u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->domain,
+		  __entry->type, __entry->percentile, __entry->numerator,
+		  __entry->denominator, __entry->samples)
+);
+
+TRACE_EVENT(kyber_adjust,
+
+	TP_PROTO(struct request_queue *q, const char *domain,
+		 unsigned int depth),
+
+	TP_ARGS(q, domain, depth),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__array(	char,	domain,	DOMAIN_LEN	)
+		__field(	unsigned int,	depth		)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		strlcpy(__entry->domain, domain, DOMAIN_LEN);
+		__entry->depth		= depth;
+	),
+
+	TP_printk("%d,%d %s %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->domain,
+		  __entry->depth)
+);
+
+TRACE_EVENT(kyber_throttled,
+
+	TP_PROTO(struct request_queue *q, const char *domain),
+
+	TP_ARGS(q, domain),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__array(	char,	domain,	DOMAIN_LEN	)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		strlcpy(__entry->domain, domain, DOMAIN_LEN);
+	),
+
+	TP_printk("%d,%d %s", MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->domain)
+);
+
+#define _TRACE_KYBER_H
+#endif /* _TRACE_KYBER_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>