From patchwork Sun Jan 15 03:42:22 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shaohua Li X-Patchwork-Id: 9517281 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7424D60762 for ; Sun, 15 Jan 2017 03:47:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6944F2842C for ; Sun, 15 Jan 2017 03:47:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5E1CA28441; Sun, 15 Jan 2017 03:47:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E70432842C for ; Sun, 15 Jan 2017 03:47:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750812AbdAODrC (ORCPT ); Sat, 14 Jan 2017 22:47:02 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:59140 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750804AbdAODmh (ORCPT ); Sat, 14 Jan 2017 22:42:37 -0500 Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v0F3fPG7003219 for ; Sat, 14 Jan 2017 19:42:36 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=eVT+SJNJyzzoQNdlX7kmv1tvGtqOJktvbe9RYPQvNcc=; b=Tp7jTFTTFHVXv9kGQnAIRWkWQkiEq25if666jNWRGhZSb24Ab+2orf8U6y/7QSxTvwQz qbQepz+JLBf/ukaB76xjwdnIPdoK96H/QW8v0B1vF5joY5uAfKI9FpC7Wwl0l0esPh4k VDVNDRiIwLAPBIwXMityXBv744Z5C69p8Nw= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 27ygw69px7-4 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Sat, 14 Jan 2017 19:42:36 -0800 Received: from mx-out.facebook.com (192.168.52.123) by PRN-CHUB05.TheFacebook.com (192.168.16.15) with Microsoft SMTP Server (TLS) id 14.3.294.0; Sat, 14 Jan 2017 19:42:35 -0800 Received: from facebook.com (2401:db00:21:603d:face:0:19:0) by mx-out.facebook.com (10.103.99.99) with ESMTP id a75a50c0dad411e699290002c9dfb610-2d9f5a50 for ; Sat, 14 Jan 2017 19:42:35 -0800 Received: by devbig638.prn2.facebook.com (Postfix, from userid 11222) id 975A24363473; Sat, 14 Jan 2017 19:42:35 -0800 (PST) Smtp-Origin-Hostprefix: devbig From: Shaohua Li Smtp-Origin-Hostname: devbig638.prn2.facebook.com To: , CC: , , , Smtp-Origin-Cluster: prn2c22 Subject: [PATCH V6 05/18] blk-throttle: add upgrade logic for LIMIT_LOW state Date: Sat, 14 Jan 2017 19:42:22 -0800 Message-ID: X-Mailer: git-send-email 2.9.3 In-Reply-To: References: X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-01-15_03:, , signatures=0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When queue is in LIMIT_LOW state and all cgroups with low limit cross the bps/iops limitation, we will upgrade queue's state to LIMIT_MAX. To determine if a cgroup exceeds its limitation, we check if the cgroup has pending request. Since cgroup is throttled according to the limit, pending request means the cgroup reaches the limit. If a cgroup has limit set for both read and write, we consider the combination of them for upgrade. The reason is read IO and write IO can interfere with each other. If we do the upgrade based in one direction IO, the other direction IO could be severly harmed. For a cgroup hierarchy, there are two cases. Children has lower low limit than parent. Parent's low limit is meaningless. If children's bps/iops cross low limit, we can upgrade queue state. The other case is children has higher low limit than parent. Children's low limit is meaningless. As long as parent's bps/iops (which is a sum of childrens bps/iops) cross low limit, we can upgrade queue state. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 96 insertions(+), 4 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 3bc6deb..4cc066f 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -457,6 +457,7 @@ static void blk_throtl_update_limit_valid(struct throtl_data *td) td->limit_valid[LIMIT_LOW] = low_valid; } +static void throtl_upgrade_state(struct throtl_data *td); static void throtl_pd_offline(struct blkg_policy_data *pd) { struct throtl_grp *tg = pd_to_tg(pd); @@ -468,9 +469,8 @@ static void throtl_pd_offline(struct blkg_policy_data *pd) blk_throtl_update_limit_valid(tg->td); - if (tg->td->limit_index == LIMIT_LOW && - !tg->td->limit_valid[LIMIT_LOW]) - tg->td->limit_index = LIMIT_MAX; + if (!tg->td->limit_valid[tg->td->limit_index]) + throtl_upgrade_state(tg->td); } static void throtl_pd_free(struct blkg_policy_data *pd) @@ -1079,6 +1079,8 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq) return nr_disp; } +static bool throtl_can_upgrade(struct throtl_data *td, + struct throtl_grp *this_tg); /** * throtl_pending_timer_fn - timer function for service_queue->pending_timer * @arg: the throtl_service_queue being serviced @@ -1105,6 +1107,9 @@ static void throtl_pending_timer_fn(unsigned long arg) int ret; spin_lock_irq(q->queue_lock); + if (throtl_can_upgrade(td, NULL)) + throtl_upgrade_state(td); + again: parent_sq = sq->parent_sq; dispatched = false; @@ -1518,6 +1523,87 @@ static struct blkcg_policy blkcg_policy_throtl = { .pd_free_fn = throtl_pd_free, }; +static bool throtl_tg_can_upgrade(struct throtl_grp *tg) +{ + struct throtl_service_queue *sq = &tg->service_queue; + bool read_limit, write_limit; + + /* + * if cgroup reaches low limit (if low limit is 0, the cgroup always + * reaches), it's ok to upgrade to next limit + */ + read_limit = tg->bps[READ][LIMIT_LOW] || tg->iops[READ][LIMIT_LOW]; + write_limit = tg->bps[WRITE][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW]; + if (!read_limit && !write_limit) + return true; + if (read_limit && sq->nr_queued[READ] && + (!write_limit || sq->nr_queued[WRITE])) + return true; + if (write_limit && sq->nr_queued[WRITE] && + (!read_limit || sq->nr_queued[READ])) + return true; + return false; +} + +static bool throtl_hierarchy_can_upgrade(struct throtl_grp *tg) +{ + while (true) { + if (throtl_tg_can_upgrade(tg)) + return true; + tg = sq_to_tg(tg->service_queue.parent_sq); + if (!tg || !tg_to_blkg(tg)->parent) + return false; + } + return false; +} + +static bool throtl_can_upgrade(struct throtl_data *td, + struct throtl_grp *this_tg) +{ + struct cgroup_subsys_state *pos_css; + struct blkcg_gq *blkg; + + if (td->limit_index != LIMIT_LOW) + return false; + + rcu_read_lock(); + blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) { + struct throtl_grp *tg = blkg_to_tg(blkg); + + if (tg == this_tg) + continue; + if (!list_empty(&tg_to_blkg(tg)->blkcg->css.children)) + continue; + if (!throtl_hierarchy_can_upgrade(tg)) { + rcu_read_unlock(); + return false; + } + } + rcu_read_unlock(); + return true; +} + +static void throtl_upgrade_state(struct throtl_data *td) +{ + struct cgroup_subsys_state *pos_css; + struct blkcg_gq *blkg; + + td->limit_index = LIMIT_MAX; + rcu_read_lock(); + blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) { + struct throtl_grp *tg = blkg_to_tg(blkg); + struct throtl_service_queue *sq = &tg->service_queue; + + tg->disptime = jiffies - 1; + throtl_select_dispatch(sq); + throtl_schedule_next_dispatch(sq, false); + } + rcu_read_unlock(); + throtl_select_dispatch(&td->service_queue); + throtl_schedule_next_dispatch(&td->service_queue, false); + queue_work(kthrotld_workqueue, &td->dispatch_work); +} + bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg, struct bio *bio) { @@ -1540,14 +1626,20 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg, sq = &tg->service_queue; +again: while (true) { /* throtl is FIFO - if bios are already queued, should queue */ if (sq->nr_queued[rw]) break; /* if above limits, break to queue */ - if (!tg_may_dispatch(tg, bio, NULL)) + if (!tg_may_dispatch(tg, bio, NULL)) { + if (throtl_can_upgrade(tg->td, tg)) { + throtl_upgrade_state(tg->td); + goto again; + } break; + } /* within limits, let's charge and dispatch directly */ throtl_charge_bio(tg, bio);