X-Patchwork-Id: 9325769
From: Hou Tao
CC: Jens Axboe, Vivek Goyal
Subject: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
Date: Mon, 12 Sep 2016 14:45:30 +0800
Message-ID:
 <1473662730-184701-1-git-send-email-houtao1@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

Due to commit 500462a9de65 ("timers: Switch to a non-cascading wheel"),
the slack of a timer increases with its timeout:

  So for HZ=250 we end up with the following granularity levels:

  Level Offset  Granularity            Range
   0      0          4 ms               0 ms -    252 ms
   1     64         32 ms             256 ms -   2044 ms (256ms - ~2s)
   2    128        256 ms            2048 ms -  16380 ms (~2s - ~16s)

When the slack is bigger than throtl_slice (100ms), there is a problem:
throtl_slice_used() always returns true when the timer finally fires,
so a new slice is always generated and the bio is throttled forever.

The following is an example:

echo 253:0 512 > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
fio --readonly --direct=1 --filename=/dev/vda --size=4K --rate=4K \
	--rw=read --ioengine=libaio --iodepth=16 --name 1

The slack of the 8s timer (delay=8000 jiffies) is about 302ms: it is
armed at jiffies=4295784850 but fires at jiffies=4295793152.

throtl / [R] bio. bdisp=0 sz=4096 bps=512 iodisp=0 iops=4294967295 queued=0/0
throtl schedule timer. delay=8000 jiffies=4295784850
throtl / dispatch nr_queued=1 read=1 write=0, bdisp=0/0, iodisp=0/0
throtl / [R] new slice start=4295793152 end=4295793252 jiffies=4295793152
throtl / [R] extend slice start=4295793152 end=4295801200 jiffies=4295793152
throtl schedule timer. delay=8000 jiffies=4295793152
throtl / dispatch nr_queued=1 read=1 write=0, bdisp=0/0, iodisp=0/0
throtl / [R] new slice start=4295801344 end=4295801444 jiffies=4295801344
throtl / [R] extend slice start=4295801344 end=4295809400 jiffies=4295801344
throtl schedule timer. delay=8000 jiffies=4295801344

Fix it by checking for a delayed dispatch in tg_may_dispatch():
1. If any bio has been dispatched, the time slice must have been used,
   so it's OK to renew the time slice.
2. If there is no queued bio, the time slice must have expired,
   so it's OK to renew the time slice.
Otherwise (the slice has expired, nothing has been dispatched and a bio
is still queued) the dispatch was only delayed by timer slack, so the
existing slice is extended instead of renewed.

Signed-off-by: Hou Tao
---
 block/blk-throttle.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index f1aba26..91f8140 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -591,13 +591,20 @@ static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
 		   tg->slice_end[rw], jiffies);
 }
 
+static bool throtl_is_delayed_disp(struct throtl_grp *tg, bool rw)
+{
+	return (time_after(jiffies, tg->slice_end[rw]) &&
+		!tg->bytes_disp[rw] && !tg->io_disp[rw] &&
+		tg->service_queue.nr_queued[rw]) ? true : false;
+}
+
 /* Determine if previously allocated or extended slice is complete or not */
 static bool throtl_slice_used(struct throtl_grp *tg, bool rw)
 {
 	if (time_in_range(jiffies, tg->slice_start[rw], tg->slice_end[rw]))
 		return false;
 
-	return 1;
+	return true;
 }
 
 /* Trim the used slices and adjust slice start accordingly */
@@ -782,7 +789,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * existing slice to make sure it is at least throtl_slice interval
 	 * long since now.
 	 */
-	if (throtl_slice_used(tg, rw))
+	if (throtl_slice_used(tg, rw) && !throtl_is_delayed_disp(tg, rw))
 		throtl_start_new_slice(tg, rw);
 	else {
 		if (time_before(tg->slice_end[rw], jiffies + throtl_slice))