From patchwork Mon Mar 27 22:19:43 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shaohua Li
X-Patchwork-Id: 9647705
From: Shaohua Li
CC: Vivek Goyal
Subject: [PATCH 3/3] blk-throttle: add latency target support
Date: Mon, 27 Mar 2017 15:19:43 -0700
Message-ID: <81425b424868c7054893cf34eb6106ecaa09c6a6.1490651903.git.shli@fb.com>
References: <567d5361-7d6b-c53e-8ada-a2966e48dc54@fb.com>
X-Mailer: git-send-email 2.9.3
X-Mailing-List: linux-block@vger.kernel.org

One hard problem in adding the .low limit is detecting an idle cgroup. If
one cgroup doesn't dispatch enough IO against its low limit, we must have a
mechanism to determine whether other cgroups can dispatch more IO. We added
the think time detection mechanism before, but it doesn't work for all
workloads. Here we add a latency based approach.
We already have a mechanism to calculate the latency threshold for each IO
size. For every IO dispatched from a cgroup, we compare its latency against
its threshold and record the result. If most IO latency is below the
threshold (in the code I use 75%), the cgroup can be treated as idle and
other cgroups can dispatch more IO.

Currently this latency target check is only for SSDs, as we can't calculate
the latency target for hard disks. It also applies only to cgroup leaf
nodes so far.

Signed-off-by: Shaohua Li
---
 block/blk-throttle.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 140da29..c82bf9b 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -165,6 +165,10 @@ struct throtl_grp {
 	unsigned long checked_last_finish_time; /* ns / 1024 */
 	unsigned long avg_idletime; /* ns / 1024 */
 	unsigned long idletime_threshold; /* us */
+
+	unsigned int bio_cnt; /* total bios */
+	unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
+	unsigned long bio_cnt_reset_time;
 };
 
 /* We measure latency for request size from <= 4k to >= 1M */
@@ -1720,12 +1724,15 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
 	 * - single idle is too long, longer than a fixed value (in case user
 	 *   configure a too big threshold) or 4 times of slice
 	 * - average think time is more than threshold
+	 * - IO latency is largely below threshold
 	 */
 	unsigned long time = jiffies_to_usecs(4 * tg->td->throtl_slice);
 
 	time = min_t(unsigned long, MAX_IDLE_TIME, time);
 	return (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
-	       tg->avg_idletime > tg->idletime_threshold;
+	       tg->avg_idletime > tg->idletime_threshold ||
+	       (tg->latency_target && tg->bio_cnt &&
+		tg->bad_bio_cnt * 5 < tg->bio_cnt);
 }
 
 static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
@@ -2194,12 +2201,36 @@ void blk_throtl_bio_endio(struct bio *bio)
 
 	start_time = blk_stat_time(&bio->bi_issue_stat) >> 10;
 	finish_time = __blk_stat_time(finish_time_ns) >> 10;
+	if (!start_time || finish_time <= start_time)
+		return;
+
+	lat = finish_time - start_time;
 	/* this is only for bio based driver */
-	if (start_time && finish_time > start_time &&
-	    !(bio->bi_issue_stat.stat & SKIP_LATENCY)) {
-		lat = finish_time - start_time;
+	if (!(bio->bi_issue_stat.stat & SKIP_LATENCY))
 		throtl_track_latency(tg->td, blk_stat_size(&bio->bi_issue_stat),
 			bio_op(bio), lat);
+
+	if (tg->latency_target) {
+		int bucket;
+		unsigned int threshold;
+
+		bucket = request_bucket_index(
+			blk_stat_size(&bio->bi_issue_stat));
+		threshold = tg->td->avg_buckets[bucket].latency +
+			tg->latency_target;
+		if (lat > threshold)
+			tg->bad_bio_cnt++;
+		/*
+		 * Not race free, could get wrong count, which means cgroups
+		 * will be throttled
+		 */
+		tg->bio_cnt++;
+	}
+
+	if (time_after(jiffies, tg->bio_cnt_reset_time) || tg->bio_cnt > 1024) {
+		tg->bio_cnt_reset_time = tg->td->throtl_slice + jiffies;
+		tg->bio_cnt /= 2;
+		tg->bad_bio_cnt /= 2;
 	}
 }
 #endif
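
For readers who want to experiment with the counting scheme outside the kernel, the accounting in this patch can be approximated by the userspace sketch below. It is only an illustration under stated assumptions: struct lat_stats, lat_stats_record() and lat_stats_idle() are hypothetical names that do not exist in blk-throttle.c, time is a plain integer tick rather than jiffies, and the wrap-safe time_after() is replaced by an ordinary comparison. As written in the diff, bad_bio_cnt * 5 < bio_cnt holds when fewer than one in five counted bios exceeded the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three counters added to struct throtl_grp. */
struct lat_stats {
	unsigned int bio_cnt;     /* bios counted so far (halved periodically) */
	unsigned int bad_bio_cnt; /* bios whose latency exceeded the threshold */
	uint64_t reset_time;      /* tick after which the counters are halved */
};

/*
 * Account one completed bio. avg_lat plays the role of the per-size-bucket
 * average latency; all times use one arbitrary unit (an assumption).
 */
static void lat_stats_record(struct lat_stats *s, uint64_t lat,
			     uint64_t avg_lat, uint64_t latency_target,
			     uint64_t now, uint64_t slice)
{
	assert(s != NULL);

	if (latency_target) {
		if (lat > avg_lat + latency_target)
			s->bad_bio_cnt++;
		s->bio_cnt++;
	}

	/* Decay both counters so that old samples gradually lose weight. */
	if (now > s->reset_time || s->bio_cnt > 1024) {
		s->reset_time = now + slice;
		s->bio_cnt /= 2;
		s->bad_bio_cnt /= 2;
	}
}

/* Latency-based idle test: true when fewer than 1/5 of counted bios were bad. */
static bool lat_stats_idle(const struct lat_stats *s, uint64_t latency_target)
{
	return latency_target && s->bio_cnt &&
	       s->bad_bio_cnt * 5 < s->bio_cnt;
}
```

With one bad bio out of ten the sketch reports the group as idle; with three bad out of twelve it does not. Halving both counters keeps their ratio roughly stable while letting recent completions dominate.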