From patchwork Mon Feb 22 22:01:28 2016
X-Patchwork-Submitter: Shaohua Li
X-Patchwork-Id: 8384041
From: Shaohua Li <shli@fb.com>
Cc: Vivek Goyal, "jmoyer @ redhat . com"
Subject: [PATCH V2 13/13] blk-throttle: detect wrong shrink
Date: Mon, 22 Feb 2016 14:01:28 -0800
Message-ID: <3bacf5bc1eb353cdb0e690098cb9c3441d21eb98.1456178093.git.shli@fb.com>
X-Mailing-List: linux-block@vger.kernel.org

The previous patch detects a wrong shrink in some cases, but not all.
Consider an example: cg1 gets a 20% share and cg2 gets an 80% share.
The disk's max bps is 100, so cg2 can only dispatch 80 bps:

  cg1 bps = 20, cg2 bps = 80

This should be a stable state, but we don't know that. Because we
assign extra bps to each cgroup's queue, we will find cg2 doesn't use
its full share, so we try to adjust it, for example, by shrinking cg2's
bps by 5. cg1 will then get 25 bps. The problem is that total disk bps
is still 100, so cg2 can't dispatch enough IO and its bps drops to 75.
Round after round, cg2's share keeps being adjusted, until cg1 and cg2
end up with the same bps. The shrink is wrong.
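For intuition, here is a minimal user-space sketch of that feedback
loop (illustration only, not part of this patch; the 5-bps shrink step
and all names are invented). cg1 always reaches its assigned share,
cg2 is capped by the saturated disk, and the naive controller keeps
handing cg2's share to cg1 until the two wrongly equalize:

/*
 * Standalone illustration, NOT kernel code: replays the wrong-shrink
 * feedback loop from the example above.  Assumptions: a fixed 100 bps
 * disk ceiling, cg1 always reaches its share, and a controller that
 * shrinks cg2 by 5 whenever cg2 fails to consume the extra bps it was
 * granted.
 */
#include <stdio.h>

#define DISK_MAX_BPS	100
#define SHRINK_STEP	5

int main(void)
{
	int cg1_share = 20, cg2_share = 80;

	for (int round = 0; cg2_share > cg1_share; round++) {
		int cg1_bps = cg1_share;		/* not disk-limited */
		int cg2_bps = DISK_MAX_BPS - cg1_bps;	/* disk-limited */

		printf("round %d: shares %d/%d, measured bps %d/%d\n",
		       round, cg1_share, cg2_share, cg1_bps, cg2_bps);

		/*
		 * cg2 was granted extra bps on top of its share but the
		 * saturated disk never let it use them, so the controller
		 * wrongly shrinks cg2 and hands the share to cg1.
		 */
		cg2_share -= SHRINK_STEP;
		cg1_share += SHRINK_STEP;
	}
	printf("steady state: %d/%d -- the shares have wrongly equalized\n",
	       cg1_share, cg2_share);
	return 0;
}

Running it prints shares 20/80 through 45/55 before settling at 50/50,
exactly the bad convergence described above.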
To detect this situation, we record history whenever a cgroup's share
is shrunk. If we then see the cgroup's bps drop, we conclude the shrink
was wrong and restore the cgroup's share. To avoid fluctuation, we also
disable shrinking for a while after a wrong shrink is detected. It's
possible the shrink was actually right, in which case we will redo it
after a while.
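The same toy model with the detection bolted on shows the effect
(again illustration only, not kernel code; the 1/32 drop threshold
matches the patch, the 16-round back-off stands in for
WRONG_SHRINK_STABLE_TIME, everything else is invented):

/*
 * Companion to the earlier sketch: the same feedback loop, now with
 * wrong-shrink detection.  After each shrink we remember the victim's
 * bps; if it drops by more than 1/32 within the stable window, the
 * shrink is declared wrong, the share is restored, and shrinking is
 * suspended for a longer back-off.
 */
#include <stdio.h>

#define DISK_MAX_BPS	100
#define SHRINK_STEP	5

int main(void)
{
	int cg1_share = 20, cg2_share = 80;
	int last_bps = -1;	/* cg2 bps recorded at the last shrink */
	int backoff = 0;	/* rounds left before shrinking again */

	for (int round = 0; round < 6; round++) {
		int cg1_bps = cg1_share;
		int cg2_bps = DISK_MAX_BPS - cg1_bps;

		printf("round %d: shares %d/%d, cg2 bps %d\n",
		       round, cg1_share, cg2_share, cg2_bps);

		if (last_bps >= 0 && cg2_bps < last_bps - (last_bps >> 5)) {
			/* bps dropped >1/32 after a shrink: wrong shrink */
			cg1_share -= SHRINK_STEP;
			cg2_share += SHRINK_STEP;
			last_bps = -1;
			backoff = 16;
			printf("  wrong shrink detected, share restored\n");
			continue;
		}
		if (backoff > 0) {
			backoff--;
			continue;
		}
		/* cg2 never uses the extra bps it is granted: shrink it */
		last_bps = cg2_bps;
		cg2_share -= SHRINK_STEP;
		cg1_share += SHRINK_STEP;
	}
	return 0;
}

One shrink happens, cg2's measured bps falls from 80 to 75 (more than
1/32 of 80), the shrink is declared wrong, the share is restored, and
the controller backs off instead of oscillating.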
Signed-off-by: Shaohua Li <shli@fb.com>
---
 block/blk-throttle.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7db7d8e..3af41bc 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -122,7 +122,15 @@ struct throtl_io_cost {
 	uint64_t target_bps;
 	unsigned int target_iops;
+
+	uint64_t last_bps;
+	unsigned int last_iops;
+	unsigned int last_weight;
+	unsigned long last_shrink_time;
+	unsigned long last_wrong_shrink_time;
 };
+#define SHRINK_STABLE_TIME (2 * CGCHECK_TIME)
+#define WRONG_SHRINK_STABLE_TIME (16 * CGCHECK_TIME)
 
 /* per cgroup per device data */
 struct throtl_grp {
@@ -1239,6 +1247,76 @@ static void precheck_tg_activity(struct throtl_grp *tg, unsigned long elapsed_ti
 	}
 }
 
+static bool detect_wrong_shrink(struct throtl_grp *tg, unsigned long elapsed_time)
+{
+	struct throtl_service_queue *sq = &tg->service_queue;
+	struct throtl_service_queue *psq = tg->service_queue.parent_sq;
+	struct throtl_grp *ptg = sq_to_tg(psq);
+	struct blkcg_gq *blkg;
+	struct cgroup_subsys_state *pos_css;
+	uint64_t actual_bps;
+	unsigned int actual_iops;
+	bool ret = false;
+	unsigned long now = jiffies;
+	unsigned int new_weight;
+	bool bandwidth_mode = tg->td->mode == MODE_WEIGHT_BANDWIDTH;
+
+	blkg_for_each_child(blkg, pos_css, tg_to_blkg(ptg)) {
+		struct throtl_grp *child;
+
+		child = blkg_to_tg(blkg);
+		sq = &child->service_queue;
+
+		actual_bps = child->io_cost.act_bps;
+		actual_iops = child->io_cost.act_iops;
+		if (child->io_cost.last_shrink_time) {
+			if (time_before_eq(now, child->io_cost.last_shrink_time +
+			      SHRINK_STABLE_TIME) &&
+			    ((bandwidth_mode && actual_bps < child->io_cost.last_bps &&
+			      child->io_cost.last_bps - actual_bps >
+			       (child->io_cost.last_bps >> 5)) ||
+			     (!bandwidth_mode && actual_iops < child->io_cost.last_iops &&
+			      child->io_cost.last_iops - actual_iops >
+			       (child->io_cost.last_iops >> 5)))) {
+
+				ret = true;
+				/* the cgroup will get 1/4 more share */
+				if (4 * psq->children_weight > 5 * sq->acting_weight) {
+					new_weight = sq->acting_weight *
+					  psq->children_weight / (4 * psq->children_weight -
+					  5 * sq->acting_weight) + sq->acting_weight;
+					if (new_weight > sq->weight)
+						new_weight = sq->weight;
+				} else
+					new_weight = sq->weight;
+
+				psq->children_weight += new_weight -
+					sq->acting_weight;
+				sq->acting_weight = new_weight;
+
+				child->io_cost.last_shrink_time = 0;
+				child->io_cost.last_wrong_shrink_time = now;
+			} else if (time_after(now,
+			    child->io_cost.last_shrink_time + SHRINK_STABLE_TIME))
+				child->io_cost.last_shrink_time = 0;
+		}
+		if (child->io_cost.last_wrong_shrink_time &&
+		    time_after(now, child->io_cost.last_wrong_shrink_time +
+		      WRONG_SHRINK_STABLE_TIME))
+			child->io_cost.last_wrong_shrink_time = 0;
+	}
+	if (!ret)
+		return ret;
+
+	blkg_for_each_child(blkg, pos_css, tg_to_blkg(ptg)) {
+		struct throtl_grp *child;
+
+		child = blkg_to_tg(blkg);
+		child->io_cost.check_weight = false;
+	}
+	return true;
+}
+
 static bool detect_one_inactive_cg(struct throtl_grp *tg, unsigned long elapsed_time)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
@@ -1268,6 +1346,9 @@ static bool detect_one_inactive_cg(struct throtl_grp *tg, unsigned long elapsed_
 	    sq->acting_weight == psq->children_weight)
 		return false;
 
+	if (detect_wrong_shrink(tg, elapsed_time))
+		return true;
+
 	blkg_for_each_child(blkg, pos_css, tg_to_blkg(ptg)) {
 		child = blkg_to_tg(blkg);
 		sq = &child->service_queue;
@@ -1277,9 +1358,18 @@ static bool detect_one_inactive_cg(struct throtl_grp *tg, unsigned long elapsed_
 		if (sq->acting_weight == sq->weight)
 			none_scaled_weight += sq->acting_weight;
 
+		/* wait for it to stabilize */
+		if ((child->io_cost.last_shrink_time && time_before_eq(jiffies,
+		      child->io_cost.last_shrink_time + SHRINK_STABLE_TIME)) ||
+		    (child->io_cost.last_wrong_shrink_time && time_before_eq(jiffies,
+		      child->io_cost.last_wrong_shrink_time +
+		      WRONG_SHRINK_STABLE_TIME)))
+			continue;
+
 		if (bandwidth_mode) {
 			if (child->io_cost.bps[0] == -1)
 				continue;
+
 			if (child->io_cost.act_bps + (child->io_cost.act_bps >> 3) >=
 			    child->io_cost.bps[0])
 				continue;
@@ -1290,6 +1380,8 @@ static bool detect_one_inactive_cg(struct throtl_grp *tg, unsigned long elapsed_
 			adjusted_bps = tmp_bps >> 2;
 			child->io_cost.target_bps = child->io_cost.bps[0] -
 				adjusted_bps;
+
+			child->io_cost.last_bps = child->io_cost.act_bps;
 		} else {
 			if (child->io_cost.iops[0] == -1)
 				continue;
@@ -1305,6 +1397,7 @@ static bool detect_one_inactive_cg(struct throtl_grp *tg, unsigned long elapsed_
 				adjusted_iops;
 		}
 
+		child->io_cost.last_weight = sq->acting_weight;
 		adjusted_weight += sq->acting_weight;
 		if (sq->acting_weight == sq->weight)
 			none_scaled_weight -= sq->acting_weight;
@@ -1357,6 +1450,8 @@ static bool detect_one_inactive_cg(struct throtl_grp *tg, unsigned long elapsed_
 		}
 		psq->children_weight -= sq->acting_weight - new_weight;
 		sq->acting_weight = new_weight;
+
+		child->io_cost.last_shrink_time = jiffies;
 	}
 
 	return true;
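A note for reviewers on the "1/4 more share" computation in
detect_wrong_shrink(): with share = w / W, where w is
sq->acting_weight and W is psq->children_weight (which includes w),
restoring the weight to w' changes the total to W + w' - w. Requiring
w' / (W + w' - w) = (5/4) * (w / W) and solving for w' gives
w' = w + w * W / (4 * W - 5 * w), exactly the expression in the patch;
the 4 * children_weight > 5 * acting_weight guard keeps the denominator
positive. A standalone check with made-up numbers:

/*
 * Standalone check, NOT kernel code: verifies the "1/4 more share"
 * arithmetic used by detect_wrong_shrink().  The sample weights below
 * are invented.
 */
#include <stdio.h>

int main(void)
{
	unsigned int w = 100, W = 1000;	/* acting_weight, children_weight */

	if (4 * W > 5 * w) {
		unsigned int new_w = w * W / (4 * W - 5 * w) + w;
		unsigned int new_W = W + new_w - w;

		printf("old share %.4f, new share %.4f (ratio %.3f)\n",
		       (double)w / W,
		       (double)new_w / new_W,
		       ((double)new_w / new_W) / ((double)w / W));
	}
	return 0;
}

Integer truncation makes the boost land slightly under 1/4 (the check
prints a ratio of about 1.245 for these numbers), and the
new_weight > sq->weight clamp in the patch keeps the restore from
exceeding the configured weight.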