From patchwork Mon Mar 27 17:51:46 2017
X-Patchwork-Submitter: Shaohua Li
X-Patchwork-Id: 9647293
From: Shaohua Li
Cc: Vivek Goyal
Subject: [PATCH V7 18/18] blk-throttle: add latency target support
Date: Mon, 27 Mar 2017 10:51:46 -0700
Message-ID: <0d512d4589e1d188a27fea1afd60e5611f1533c9.1490634565.git.shli@fb.com>
X-Mailer: git-send-email 2.9.3
X-Mailing-List: linux-block@vger.kernel.org

One hard problem in adding the .low limit is detecting idle cgroups. If a
cgroup doesn't dispatch enough IO against its low limit, we need a
mechanism to determine whether other cgroups are allowed to dispatch more
IO. We added the think time detection mechanism before, but it doesn't
work for all workloads. Here we add a latency-based approach.

We already have a mechanism to calculate a latency threshold for each IO
size. For every IO dispatched from a cgroup, we compare its latency
against the threshold and record the result. If most IO latencies are
below the threshold (more than 80%, per the bad_bio_cnt * 5 < bio_cnt
check in the code), the cgroup can be treated as idle and other cgroups
can dispatch more IO.

Currently this latency target check is only done for SSDs, as we can't
calculate a latency target for hard disks. And it only applies to cgroup
leaf nodes so far.

Signed-off-by: Shaohua Li
---
 block/blk-throttle.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 4b9c6a1..e506d94 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -170,6 +170,10 @@ struct throtl_grp {
 	unsigned long checked_last_finish_time; /* ns / 1024 */
 	unsigned long avg_idletime; /* ns / 1024 */
 	unsigned long idletime_threshold; /* us */
+
+	unsigned int bio_cnt; /* total bios */
+	unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
+	unsigned long bio_cnt_reset_time;
 };
 
 /* We measure latency for request size from <= 4k to >= 1M */
@@ -1725,12 +1729,15 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
 	 * - single idle is too long, longer than a fixed value (in case user
 	 *   configure a too big threshold) or 4 times of slice
 	 * - average think time is more than threshold
+	 * - IO latency is largely below threshold
 	 */
 	unsigned long time = jiffies_to_usecs(4 * tg->td->throtl_slice);
 
 	time = min_t(unsigned long, MAX_IDLE_TIME, time);
 	return (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
-	       tg->avg_idletime > tg->idletime_threshold;
+	       tg->avg_idletime > tg->idletime_threshold ||
+	       (tg->latency_target && tg->bio_cnt &&
+		tg->bad_bio_cnt * 5 < tg->bio_cnt);
 }
 
 static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
@@ -2211,12 +2218,36 @@ void blk_throtl_bio_endio(struct bio *bio)
 
 	start_time = THROTL_STAT_TIME(bio->bi_throtl_stat) >> 10;
 	finish_time = THROTL_STAT_TIME(finish_time_ns) >> 10;
-	if (start_time && finish_time > start_time &&
-	    !(bio->bi_throtl_stat & THROTL_SKIP_LAT)) {
-		lat = finish_time - start_time;
+	if (!start_time || finish_time <= start_time)
+		return;
+
+	lat = finish_time - start_time;
+	if (!(bio->bi_throtl_stat & THROTL_SKIP_LAT))
 		throtl_track_latency(tg->td,
 			THROTL_STAT_SIZE(bio->bi_throtl_stat),
 			bio_op(bio), lat);
+
+	if (tg->latency_target) {
+		int bucket;
+		unsigned int threshold;
+
+		bucket = request_bucket_index(
+			THROTL_STAT_SIZE(bio->bi_throtl_stat));
+		threshold = tg->td->avg_buckets[bucket].latency +
+			tg->latency_target;
+		if (lat > threshold)
+			tg->bad_bio_cnt++;
+		/*
+		 * Not race free, could get wrong count, which means cgroups
+		 * will be throttled
+		 */
+		tg->bio_cnt++;
+	}
+
+	if (time_after(jiffies, tg->bio_cnt_reset_time) || tg->bio_cnt > 1024) {
+		tg->bio_cnt_reset_time = tg->td->throtl_slice + jiffies;
+		tg->bio_cnt /= 2;
+		tg->bad_bio_cnt /= 2;
 	}
 }
 #endif
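
To make the heuristic concrete, below is a minimal user-space C sketch of
the accounting introduced above. It is illustration only, not kernel code:
the struct, function names, threshold, and sample latencies are all
hypothetical. It mirrors the bad_bio_cnt * 5 < bio_cnt idle check and the
periodic halving of both counters from the patch.

/*
 * Illustration only -- a user-space sketch of the patch's idle
 * heuristic, not kernel code. All names and numbers are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

struct lat_stats {
	unsigned int bio_cnt;		/* completions seen in the window */
	unsigned int bad_bio_cnt;	/* completions over the threshold */
};

/* Record one completion: count it, and count it as "bad" if slow. */
static void account_io(struct lat_stats *s, unsigned long lat_us,
		       unsigned long threshold_us)
{
	if (lat_us > threshold_us)
		s->bad_bio_cnt++;
	s->bio_cnt++;
}

/*
 * The idle test from throtl_tg_is_idle(): treat the cgroup as idle when
 * fewer than 1 in 5 of its recent IOs were slow, i.e. more than 80%
 * completed within the threshold.
 */
static bool latency_mostly_ok(const struct lat_stats *s)
{
	return s->bio_cnt && s->bad_bio_cnt * 5 < s->bio_cnt;
}

/*
 * Periodic decay, as in blk_throtl_bio_endio(): halving both counters
 * preserves the good/bad ratio while weighting recent IO more heavily
 * than old IO.
 */
static void decay(struct lat_stats *s)
{
	s->bio_cnt /= 2;
	s->bad_bio_cnt /= 2;
}

int main(void)
{
	struct lat_stats s = { 0, 0 };
	/* Hypothetical latencies (us) against a 1000us threshold. */
	unsigned long samples[] = { 200, 450, 1500, 300, 800, 250, 90, 400 };
	unsigned long threshold_us = 1000;

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		account_io(&s, samples[i], threshold_us);

	/* 1 bad out of 8: 1 * 5 < 8, so the cgroup counts as idle. */
	printf("bios=%u bad=%u idle=%s\n", s.bio_cnt, s.bad_bio_cnt,
	       latency_mostly_ok(&s) ? "yes" : "no");

	decay(&s);
	printf("after decay: bios=%u bad=%u\n", s.bio_cnt, s.bad_bio_cnt);
	return 0;
}

In the patch itself the halving is triggered either by time (one
throtl_slice elapsed) or by volume (bio_cnt > 1024), which bounds both the
size of the counters and how much history the decision window carries.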