From patchwork Mon Mar 27 17:51:44 2017
X-Patchwork-Submitter: Shaohua Li
X-Patchwork-Id: 9647351
From: Shaohua Li
Cc: Vivek Goyal
Subject: [PATCH V7 16/18] blk-throttle: add interface for per-cgroup target latency
Date: Mon, 27 Mar 2017 10:51:44 -0700
Message-ID: <2b83be1e6eb0fc1640ed094752ef218813f50935.1490634565.git.shli@fb.com>
X-Mailer: git-send-email 2.9.3
X-Mailing-List: linux-block@vger.kernel.org

Here we introduce a per-cgroup latency target. The target determines how much
latency increase a cgroup can afford. We will use the target latency to
calculate a threshold and use the threshold to schedule IO for cgroups. If a
cgroup's bandwidth is below its low limit but its average latency is below the
threshold, other cgroups can safely dispatch more IO, even if their bandwidth
is higher than their low limits. On the other hand, if the first cgroup's
latency is higher than the threshold, other cgroups are throttled to their
low limits. So the target latency determines how efficiently we utilize free
disk resources without sacrificing the workload's IO latency.

For example, assume the average 4k IO latency is 50us when the disk isn't
congested. A cgroup sets the target latency to 30us. The cgroup can then
accept 50+30=80us IO latency. If the cgroup's average IO latency is 90us and
its bandwidth is below its low limit, other cgroups are throttled to their
low limits. If the cgroup's average IO latency is 60us, other cgroups are
allowed to dispatch more IO. When other cgroups dispatch more IO, the first
cgroup's IO latency will increase. If it increases to 81us, we then throttle
the other cgroups.

Users configure the interface in this way:
echo "8:16 rbps=2097152 wbps=max latency=100 idle=200" > io.low

latency is in microseconds.

By default the latency target is 0, which means IO latency is guaranteed.

Signed-off-by: Shaohua Li
---
 block/blk-throttle.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0ea8698..6e1c298 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -25,6 +25,8 @@ static int throtl_quantum = 32;
 #define DFL_IDLE_THRESHOLD_SSD (1000L) /* 1 ms */
 #define DFL_IDLE_THRESHOLD_HD (100L * 1000) /* 100 ms */
 #define MAX_IDLE_TIME (5L * 1000 * 1000) /* 5 s */
+/* default latency target is 0, eg, guarantee IO latency by default */
+#define DFL_LATENCY_TARGET (0)
 
 static struct blkcg_policy blkcg_policy_throtl;
 
@@ -152,6 +154,7 @@ struct throtl_grp {
 	unsigned long last_check_time;
 
+	unsigned long latency_target; /* us */
 	/* When did we start a new slice */
 	unsigned long slice_start[2];
 	unsigned long slice_end[2];
@@ -449,6 +452,8 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
 	tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
 	/* LIMIT_LOW will have default value 0 */
 
+	tg->latency_target = DFL_LATENCY_TARGET;
+
 	return &tg->pd;
 }
 
@@ -1445,6 +1450,7 @@ static u64 tg_prfill_limit(struct seq_file *sf, struct blkg_policy_data *pd,
 	u64 bps_dft;
 	unsigned int iops_dft;
 	char idle_time[26] = "";
+	char latency_time[26] = "";
 
 	if (!dname)
 		return 0;
@@ -1461,8 +1467,9 @@ static u64 tg_prfill_limit(struct seq_file *sf, struct blkg_policy_data *pd,
 	    tg->bps_conf[WRITE][off] == bps_dft &&
 	    tg->iops_conf[READ][off] == iops_dft &&
 	    tg->iops_conf[WRITE][off] == iops_dft &&
-	    (off != LIMIT_LOW || tg->idletime_threshold ==
-	     tg->td->dft_idletime_threshold))
+	    (off != LIMIT_LOW ||
+	     (tg->idletime_threshold == tg->td->dft_idletime_threshold &&
+	      tg->latency_target == DFL_LATENCY_TARGET)))
 		return 0;
 
 	if (tg->bps_conf[READ][off] != bps_dft)
@@ -1483,10 +1490,17 @@ static u64 tg_prfill_limit(struct seq_file *sf, struct blkg_policy_data *pd,
 		else
 			snprintf(idle_time, sizeof(idle_time), " idle=%lu",
 				tg->idletime_threshold);
+
+		if (tg->latency_target == ULONG_MAX)
+			strcpy(latency_time, " latency=max");
+		else
+			snprintf(latency_time, sizeof(latency_time),
+				" latency=%lu", tg->latency_target);
 	}
 
-	seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s%s\n",
-		   dname, bufs[0], bufs[1], bufs[2], bufs[3], idle_time);
+	seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s%s%s\n",
+		   dname, bufs[0], bufs[1], bufs[2], bufs[3], idle_time,
+		   latency_time);
 	return 0;
 }
 
@@ -1505,6 +1519,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	struct throtl_grp *tg;
 	u64 v[4];
 	unsigned long idle_time;
+	unsigned long latency_time;
 	int ret;
 	int index = of_cft(of)->private;
 
@@ -1520,6 +1535,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	v[3] = tg->iops_conf[WRITE][index];
 
 	idle_time = tg->idletime_threshold;
+	latency_time = tg->latency_target;
 	while (true) {
 		char tok[27];	/* wiops=18446744073709551616 */
 		char *p;
@@ -1553,6 +1569,8 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 			v[3] = min_t(u64, val, UINT_MAX);
 		else if (off == LIMIT_LOW && !strcmp(tok, "idle"))
 			idle_time = val;
+		else if (off == LIMIT_LOW && !strcmp(tok, "latency"))
+			latency_time = val;
 		else
 			goto out_finish;
 	}
@@ -1583,6 +1601,8 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 		tg->td->limit_index = LIMIT_LOW;
 		tg->idletime_threshold = (idle_time == ULONG_MAX) ?
 			ULONG_MAX : idle_time;
+		tg->latency_target = (latency_time == ULONG_MAX) ?
+			ULONG_MAX : latency_time;
 	}
 	tg_conf_updated(tg);
 	ret = 0;
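
For illustration only, not part of this patch: the upgrade/throttle decision
described in the changelog can be sketched in plain C roughly as below. The
names cgroup_io_stat, baseline_latency_us, avg_latency_us, bps and
low_limit_bps are hypothetical placeholders for the statistics the throttling
code tracks; the real logic lives in later patches of this series.

#include <stdbool.h>

/*
 * Illustrative sketch of the policy described in the changelog (not kernel
 * code). A cgroup running below its low limit is considered fine as long as
 * its average latency stays under baseline + latency target; only then may
 * other cgroups keep dispatching above their own low limits.
 */
struct cgroup_io_stat {
	unsigned long avg_latency_us;    /* measured average IO latency */
	unsigned long latency_target_us; /* configured via "latency=" in io.low */
	unsigned long bps;               /* measured bandwidth */
	unsigned long low_limit_bps;     /* configured via "rbps="/"wbps=" */
};

static bool others_may_exceed_low_limit(const struct cgroup_io_stat *cg,
					unsigned long baseline_latency_us)
{
	/* threshold = baseline latency of the disk + per-cgroup target */
	unsigned long threshold = baseline_latency_us + cg->latency_target_us;

	/* low limit already met, this cgroup is no reason to hold others back */
	if (cg->bps >= cg->low_limit_bps)
		return true;

	/*
	 * Below its low limit: let others dispatch more only while this
	 * cgroup's latency is still within the threshold (e.g. 50+30=80us).
	 */
	return cg->avg_latency_us <= threshold;
}

With the changelog's numbers (50us baseline, 30us target), a measured average
of 60us keeps the function returning true, while 90us makes it return false
and the other cgroups fall back to their low limits.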