From patchwork Mon Feb 22 22:01:20 2016
From: Shaohua Li
Subject: [PATCH V2 05/13] blk-throttling: detect inactive cgroup
Date: Mon, 22 Feb 2016 14:01:20 -0800
Message-ID: <16795d8d024d5cb1aa37f482eb553f7d53f3ba05.1456178093.git.shli@fb.com>
Cc: Vivek Goyal, jmoyer@redhat.com
X-Patchwork-Id: 8384111
X-Mailing-List: linux-block@vger.kernel.org

We don't have a tree to maintain active cgroups. If a cgroup is inactive for some time, it should be excluded from the bandwidth calculation. Otherwise we assign only partial bandwidth to the active cgroups; they dispatch less IO, the estimated queue bandwidth drops, and the lower estimate causes the active cgroups to dispatch even less IO.
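For illustration only (this sketch is not part of the patch; the two-sibling layout and the weights are made-up assumptions, while the formula, SHARE_SHIFT and MAX_SHARE mirror tg_update_share() in the diff below), a minimal userspace model of the share arithmetic. With an idle sibling still counted in children_weight the active cgroup is granted only half the disk share; once the idle sibling's acting_weight is dropped, the active cgroup gets the whole share:

/*
 * Userspace model of the share calculation in tg_update_share().
 * Weights and the two-sibling layout are assumptions for illustration;
 * SHARE_SHIFT/MAX_SHARE and the formula mirror blk-throttle.c.
 */
#include <stdio.h>

#define SHARE_SHIFT	14
#define MAX_SHARE	(1 << SHARE_SHIFT)

static unsigned int share(unsigned int parent_share, unsigned int weight,
			  unsigned int children_weight)
{
	unsigned int s = parent_share * weight / children_weight;
	return s ? s : 1;	/* matches the max_t(..., 1) clamp */
}

int main(void)
{
	unsigned int active = 100, idle = 100;	/* both at DFT_WEIGHT */

	/* idle sibling still counted: active cgroup gets half the disk */
	printf("idle counted: %u/%u\n",
	       share(MAX_SHARE, active, active + idle), MAX_SHARE);
	/* idle sibling's acting_weight dropped: active gets everything */
	printf("idle dropped: %u/%u\n",
	       share(MAX_SHARE, active, active), MAX_SHARE);
	return 0;
}

This is the starvation loop described above in miniature: the 8192/16384 share caps the active cgroup's dispatch, which in turn lowers the measured bandwidth that the next share is computed from.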
Signed-off-by: Shaohua Li
---
 block/blk-throttle.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 74 insertions(+), 5 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index fafe9c2..43de1dc 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -18,6 +18,8 @@
 #define DFT_WEIGHT (100)
 #define SHARE_SHIFT (14)
 #define MAX_SHARE (1 << SHARE_SHIFT)
+/* must be less than the interval we update bandwidth */
+#define CGCHECK_TIME (msecs_to_jiffies(100))
 
 /* Max dispatch from a group in 1 round */
 static int throtl_grp_quantum = 8;
@@ -83,8 +85,11 @@ struct throtl_service_queue {
 	struct timer_list pending_timer;	/* fires on first_pending_disptime */
 
 	unsigned int weight; /* this queue's weight against siblings */
+	unsigned int acting_weight; /* actual weight of the queue */
 	unsigned int children_weight; /* children weight */
 	unsigned int share; /* disk bandwidth share of the queue */
+
+	unsigned long active_timestamp;
 };
 
 enum tg_state_flags {
@@ -184,6 +189,7 @@ struct throtl_data
 	/* Work for dispatching throttled bios */
 	struct work_struct dispatch_work;
 	enum run_mode mode;
+	unsigned long last_check_timestamp;
 };
 
 static bool td_weight_based(struct throtl_data *td)
@@ -444,7 +450,7 @@ static void throtl_pd_init(struct blkg_policy_data *pd)
 	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && blkg->parent)
 		sq->parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
 	sq->weight = DFT_WEIGHT;
-	sq->parent_sq->children_weight += sq->weight;
+	sq->acting_weight = 0;
 	tg->td = td;
 }
 
@@ -483,8 +489,10 @@ static void throtl_pd_free(struct blkg_policy_data *pd)
 	struct throtl_grp *tg = pd_to_tg(pd);
 	struct throtl_service_queue *sq = &tg->service_queue;
 
-	if (sq->parent_sq)
-		sq->parent_sq->children_weight -= sq->weight;
+	if (sq->acting_weight && sq->parent_sq) {
+		sq->parent_sq->children_weight -= sq->acting_weight;
+		sq->acting_weight = 0;
+	}
 	del_timer_sync(&tg->service_queue.pending_timer);
 	kfree(tg);
 
@@ -1096,16 +1104,74 @@ static void tg_update_share(struct throtl_data *td, struct throtl_grp *tg)
 		child = blkg_to_tg(blkg);
 		sq = &child->service_queue;
 
-		if (!sq->parent_sq)
+		if (!sq->parent_sq || !sq->acting_weight)
 			continue;
 
 		sq->share = max_t(unsigned int,
-			sq->parent_sq->share * sq->weight /
+			sq->parent_sq->share * sq->acting_weight /
 			sq->parent_sq->children_weight, 1);
 	}
 }
 
+static void tg_update_active_time(struct throtl_grp *tg)
+{
+	struct throtl_service_queue *sq = &tg->service_queue;
+	unsigned long now = jiffies;
+
+	tg = NULL;
+	while (sq->parent_sq) {
+		sq->active_timestamp = now;
+		if (!sq->acting_weight) {
+			sq->acting_weight = sq->weight;
+			sq->parent_sq->children_weight += sq->acting_weight;
+			tg = sq_to_tg(sq);
+		}
+		sq = sq->parent_sq;
+	}
+
+	if (tg)
+		tg_update_share(tg->td, tg);
+}
+
+static void detect_inactive_cg(struct throtl_grp *tg)
+{
+	struct throtl_data *td = tg->td;
+	struct throtl_service_queue *sq = &tg->service_queue;
+	unsigned long now = jiffies;
+	struct cgroup_subsys_state *pos_css;
+	struct blkcg_gq *blkg;
+	bool update_share = false;
+
+	tg_update_active_time(tg);
+
+	if (time_before(now, td->last_check_timestamp + CGCHECK_TIME))
+		return;
+	td->last_check_timestamp = now;
+
+	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) {
+		tg = blkg_to_tg(blkg);
+		sq = &tg->service_queue;
+		/* '/' cgroup of cgroup2 */
+		if (cgroup_subsys_on_dfl(io_cgrp_subsys) &&
+		    !tg_to_blkg(tg)->parent)
+			continue;
+
+		if (sq->parent_sq &&
+		    time_before(sq->active_timestamp + CGCHECK_TIME, now) &&
+		    !(sq->nr_queued[READ] || sq->nr_queued[WRITE])) {
+			if (sq->acting_weight && sq->parent_sq) {
+				sq->parent_sq->children_weight -= sq->acting_weight;
+				sq->acting_weight = 0;
+				update_share = true;
+			}
+		}
+	}
+
+	if (update_share)
+		tg_update_share(td, NULL);
+}
+
 static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
@@ -1186,6 +1252,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg)
 			break;
 	}
 
 	if (nr_reads + nr_writes) {
+		detect_inactive_cg(tg);
 		tg_update_perf(tg);
 	}
@@ -1639,6 +1706,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 
 	sq = &tg->service_queue;
 
+	detect_inactive_cg(tg);
 	while (true) {
 		/* throtl is FIFO - if bios are already queued, should queue */
 		if (sq->nr_queued[index])
@@ -1791,6 +1859,7 @@ int blk_throtl_init(struct request_queue *q)
 	throtl_service_queue_init(&td->service_queue);
 	td->service_queue.share = MAX_SHARE;
 	td->mode = MODE_NONE;
+	td->service_queue.acting_weight = DFT_WEIGHT;
 	q->td = td;
 	td->queue = q;
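As a reading aid, here is a userspace model of the activation protocol implemented by tg_update_active_time() and detect_inactive_cg(). It is not part of the patch and is deliberately simplified: a single parent, a plain counter standing in for jiffies, and the nr_queued check omitted.

/*
 * Userspace model of the activation/deactivation protocol above.
 * Not kernel code: names mirror blk-throttle.c, but "now" is a plain
 * counter instead of jiffies and there is a single root parent.
 */
#include <stdio.h>

#define CGCHECK_TIME 100	/* stand-in for msecs_to_jiffies(100) */

struct sq {
	unsigned int weight;
	unsigned int acting_weight;	/* 0 while the group is inactive */
	unsigned long active_timestamp;
};

static unsigned int children_weight;	/* the root's children_weight */

/* mirrors tg_update_active_time(): called on every dispatched bio */
static void update_active_time(struct sq *sq, unsigned long now)
{
	sq->active_timestamp = now;
	if (!sq->acting_weight) {
		sq->acting_weight = sq->weight;
		children_weight += sq->acting_weight;
	}
}

/* mirrors the per-group test in detect_inactive_cg() */
static void detect_inactive(struct sq *sq, unsigned long now)
{
	if (sq->active_timestamp + CGCHECK_TIME < now && sq->acting_weight) {
		children_weight -= sq->acting_weight;
		sq->acting_weight = 0;
	}
}

int main(void)
{
	struct sq a = { .weight = 100 }, b = { .weight = 100 };

	update_active_time(&a, 0);
	update_active_time(&b, 0);
	printf("t=0:   children_weight=%u\n", children_weight);	/* 200 */

	/* only 'a' keeps doing IO; 'b' goes quiet */
	update_active_time(&a, 150);
	detect_inactive(&a, 150);
	detect_inactive(&b, 150);
	printf("t=150: children_weight=%u\n", children_weight);	/* 100 */

	/* 'b' issues IO again and is re-armed immediately */
	update_active_time(&b, 160);
	printf("t=160: children_weight=%u\n", children_weight);	/* 200 */
	return 0;
}

The point of keeping acting_weight separate from weight is that deactivation is cheap and reversible: the configured weight survives while the group is idle, and the group's first new bio re-arms it immediately rather than waiting for the next CGCHECK_TIME scan.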