From patchwork Tue Sep 1 18:52:31 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749413
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>, stable@vger.kernel.org
Subject: [PATCH 01/27] blk-iocost: ioc_pd_free() shouldn't assume irq disabled
Date: Tue, 1 Sep 2020 14:52:31 -0400
Message-Id: <20200901185257.645114-2-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

ioc_pd_free() grabs the irq-safe ioc->lock without ensuring that irqs
are disabled, even though it can be called with irqs either disabled or
enabled. This has a small chance of causing A-A deadlocks and triggers
lockdep splats. Use the irqsave operations instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
---
 block/blk-iocost.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 413e0b5c8e6b..d37b55db2409 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2092,14 +2092,15 @@ static void ioc_pd_free(struct blkg_policy_data *pd)
 {
 	struct ioc_gq *iocg = pd_to_iocg(pd);
 	struct ioc *ioc = iocg->ioc;
+	unsigned long flags;
 
 	if (ioc) {
-		spin_lock(&ioc->lock);
+		spin_lock_irqsave(&ioc->lock, flags);
 		if (!list_empty(&iocg->active_list)) {
 			propagate_active_weight(iocg, 0, 0);
 			list_del_init(&iocg->active_list);
 		}
-		spin_unlock(&ioc->lock);
+		spin_unlock_irqrestore(&ioc->lock, flags);
 
 		hrtimer_cancel(&iocg->waitq_timer);
 		hrtimer_cancel(&iocg->delay_timer);
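
[Editorial sketch, not part of the patch; names assumed. It shows why the
irqsave variant is the right tool when the caller's irq state is unknown:]

	/*
	 * Sketch only: a lock shared between process context and irq
	 * context (e.g. an hrtimer callback), as with ioc->lock above.
	 */
	static void process_context_path(struct ioc *ioc)
	{
		unsigned long flags;

		/*
		 * Plain spin_lock() would be a bug: if the irq-side user
		 * fires on this CPU while the lock is held with irqs
		 * enabled, it spins on a lock this CPU already owns, an
		 * A-A deadlock.  spin_lock_irq() would also be wrong for
		 * a caller that already has irqs disabled, because
		 * spin_unlock_irq() re-enables them unconditionally.
		 * irqsave/irqrestore preserves the caller's irq state,
		 * so it is correct either way.
		 */
		spin_lock_irqsave(&ioc->lock, flags);
		/* ... touch state shared with the irq-side path ... */
		spin_unlock_irqrestore(&ioc->lock, flags);
	}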

From patchwork Tue Sep 1 18:52:32 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749415
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>, stable@vger.kernel.org
Subject: [PATCH 02/27] blk-stat: make q->stats->lock irqsafe
Date: Tue, 1 Sep 2020 14:52:32 -0400
Message-Id: <20200901185257.645114-3-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

blk-iocost calls blk_stat_enable_accounting() while holding an irqsafe
lock, which triggers a lockdep splat because q->stats->lock isn't
irqsafe. Let's make it irqsafe.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: cd006509b0a9 ("blk-iocost: account for IO size when testing latencies")
Cc: stable@vger.kernel.org # v5.8+
---
 block/blk-stat.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-stat.c b/block/blk-stat.c
index 7da302ff88d0..ae3dd1fb8e61 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -137,6 +137,7 @@ void blk_stat_add_callback(struct request_queue *q,
 			   struct blk_stat_callback *cb)
 {
 	unsigned int bucket;
+	unsigned long flags;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
@@ -147,20 +148,22 @@ void blk_stat_add_callback(struct request_queue *q,
 		blk_rq_stat_init(&cpu_stat[bucket]);
 	}
 
-	spin_lock(&q->stats->lock);
+	spin_lock_irqsave(&q->stats->lock, flags);
 	list_add_tail_rcu(&cb->list, &q->stats->callbacks);
 	blk_queue_flag_set(QUEUE_FLAG_STATS, q);
-	spin_unlock(&q->stats->lock);
+	spin_unlock_irqrestore(&q->stats->lock, flags);
 }
 
 void blk_stat_remove_callback(struct request_queue *q,
 			      struct blk_stat_callback *cb)
 {
-	spin_lock(&q->stats->lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->stats->lock, flags);
 	list_del_rcu(&cb->list);
 	if (list_empty(&q->stats->callbacks) && !q->stats->enable_accounting)
 		blk_queue_flag_clear(QUEUE_FLAG_STATS, q);
-	spin_unlock(&q->stats->lock);
+	spin_unlock_irqrestore(&q->stats->lock, flags);
 
 	del_timer_sync(&cb->timer);
 }
@@ -183,10 +186,12 @@ void blk_stat_free_callback(struct blk_stat_callback *cb)
 
 void blk_stat_enable_accounting(struct request_queue *q)
 {
-	spin_lock(&q->stats->lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->stats->lock, flags);
 	q->stats->enable_accounting = true;
 	blk_queue_flag_set(QUEUE_FLAG_STATS, q);
-	spin_unlock(&q->stats->lock);
+	spin_unlock_irqrestore(&q->stats->lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_stat_enable_accounting);

From patchwork Tue Sep 1 18:52:33 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749411
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 03/27] blk-iocost: use local[64]_t for percpu stat
Date: Tue, 1 Sep 2020 14:52:33 -0400
Message-Id: <20200901185257.645114-4-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

blk-iocost has been reading percpu stat counters from remote cpus,
which on some archs can lead to torn reads on really rare occasions.
Use local[64]_t for those counters.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 37 +++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index d37b55db2409..e2266e7692b4 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -179,6 +179,8 @@
 #include <linux/parser.h>
 #include <linux/sched/signal.h>
 #include <linux/blk-cgroup.h>
+#include <asm/local.h>
+#include <asm/local64.h>
 #include "blk-rq-qos.h"
 #include "blk-stat.h"
 #include "blk-wbt.h"
@@ -373,8 +375,8 @@ struct ioc_params {
 };
 
 struct ioc_missed {
-	u32 nr_met;
-	u32 nr_missed;
+	local_t nr_met;
+	local_t nr_missed;
 	u32 last_met;
 	u32 last_missed;
 };
@@ -382,7 +384,7 @@ struct ioc_missed {
 struct ioc_pcpu_stat {
 	struct ioc_missed missed[2];
 
-	u64 rq_wait_ns;
+	local64_t rq_wait_ns;
 	u64 last_rq_wait_ns;
 };
 
@@ -1278,8 +1280,8 @@ static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p
 		u64 this_rq_wait_ns;
 
 		for (rw = READ; rw <= WRITE; rw++) {
-			u32 this_met = READ_ONCE(stat->missed[rw].nr_met);
-			u32 this_missed = READ_ONCE(stat->missed[rw].nr_missed);
+			u32 this_met = local_read(&stat->missed[rw].nr_met);
+			u32 this_missed = local_read(&stat->missed[rw].nr_missed);
 
 			nr_met[rw] += this_met - stat->missed[rw].last_met;
 			nr_missed[rw] += this_missed - stat->missed[rw].last_missed;
@@ -1287,7 +1289,7 @@ static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p
 			stat->missed[rw].last_missed = this_missed;
 		}
 
-		this_rq_wait_ns = READ_ONCE(stat->rq_wait_ns);
+		this_rq_wait_ns = local64_read(&stat->rq_wait_ns);
 		rq_wait_ns += this_rq_wait_ns - stat->last_rq_wait_ns;
 		stat->last_rq_wait_ns = this_rq_wait_ns;
 	}
@@ -1908,6 +1910,7 @@ static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
 static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
 {
 	struct ioc *ioc = rqos_to_ioc(rqos);
+	struct ioc_pcpu_stat *ccs;
 	u64 on_q_ns, rq_wait_ns, size_nsec;
 	int pidx, rw;
@@ -1931,13 +1934,17 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
 	rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
 	size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);
 
+	ccs = get_cpu_ptr(ioc->pcpu_stat);
+
 	if (on_q_ns <= size_nsec ||
 	    on_q_ns - size_nsec <= ioc->params.qos[pidx] * NSEC_PER_USEC)
-		this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_met);
+		local_inc(&ccs->missed[rw].nr_met);
 	else
-		this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_missed);
+		local_inc(&ccs->missed[rw].nr_missed);
+
+	local64_add(rq_wait_ns, &ccs->rq_wait_ns);
 
-	this_cpu_add(ioc->pcpu_stat->rq_wait_ns, rq_wait_ns);
+	put_cpu_ptr(ccs);
 }
 
 static void ioc_rqos_queue_depth_changed(struct rq_qos *rqos)
@@ -1977,7 +1984,7 @@ static int blk_iocost_init(struct request_queue *q)
 {
 	struct ioc *ioc;
 	struct rq_qos *rqos;
-	int ret;
+	int i, cpu, ret;
 
 	ioc = kzalloc(sizeof(*ioc), GFP_KERNEL);
 	if (!ioc)
@@ -1989,6 +1996,16 @@ static int blk_iocost_init(struct request_queue *q)
 		return -ENOMEM;
 	}
 
+	for_each_possible_cpu(cpu) {
+		struct ioc_pcpu_stat *ccs = per_cpu_ptr(ioc->pcpu_stat, cpu);
+
+		for (i = 0; i < ARRAY_SIZE(ccs->missed); i++) {
+			local_set(&ccs->missed[i].nr_met, 0);
+			local_set(&ccs->missed[i].nr_missed, 0);
+		}
+		local64_set(&ccs->rq_wait_ns, 0);
+	}
+
 	rqos = &ioc->rqos;
 	rqos->id = RQ_QOS_COST;
 	rqos->ops = &ioc_rqos_ops;
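
[Editorial sketch of the counter pattern this patch switches to;
illustrative types, not the kernel code. On 32-bit architectures a plain
u64 update is two stores, so a reader on another CPU can observe a
half-written value, while local64_t keeps updates cheap on the owning
CPU and remote reads un-torn:]

	#include <asm/local64.h>

	struct pcpu_stat_sketch {
		local64_t rq_wait_ns;	/* written only by the owning CPU */
	};

	/* hot path: runs on the CPU that owns @stat */
	static void account_wait(struct pcpu_stat_sketch *stat, u64 delta_ns)
	{
		local64_add(delta_ns, &stat->rq_wait_ns);
	}

	/* slow path: may run on any CPU, e.g. a periodic timer */
	static u64 read_wait(struct pcpu_stat_sketch *stat)
	{
		return local64_read(&stat->rq_wait_ns);	/* never torn */
	}

[In the patch itself, get_cpu_ptr()/put_cpu_ptr() additionally disable
preemption around the update, so the "owning CPU" assumption holds.]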

From patchwork Tue Sep 1 18:52:34 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749361
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 04/27] blk-iocost: rename propagate_active_weights() to propagate_weights()
Date: Tue, 1 Sep 2020 14:52:34 -0400
Message-Id: <20200901185257.645114-5-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

It already propagates two weights - active and inuse - and there will
be another soon. Let's drop the confusing misnomers. Rename
[__]propagate_active_weights() to [__]propagate_weights() and
commit_active_weights() to commit_weights(). This is a pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index e2266e7692b4..78e6919153d8 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -479,7 +479,7 @@
 	atomic64_t active_period;
 	struct list_head active_list;
 
-	/* see __propagate_active_weight() and current_hweight() for details */
+	/* see __propagate_weights() and current_hweight() for details */
 	u64 child_active_sum;
 	u64 child_inuse_sum;
 	int hweight_gen;
@@ -890,7 +890,7 @@ static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
  * Update @iocg's `active` and `inuse` to @active and @inuse, update level
  * weight sums and propagate upwards accordingly.
  */
-static void __propagate_active_weight(struct ioc_gq *iocg, u32 active, u32 inuse)
+static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse)
 {
 	struct ioc *ioc = iocg->ioc;
 	int lvl;
@@ -935,7 +935,7 @@ static void __propagate_active_weight(struct ioc_gq *iocg, u32 active, u32 inuse
 	ioc->weights_updated = true;
 }
 
-static void commit_active_weights(struct ioc *ioc)
+static void commit_weights(struct ioc *ioc)
 {
 	lockdep_assert_held(&ioc->lock);
 
@@ -947,10 +947,10 @@ static void commit_active_weights(struct ioc *ioc)
 	}
 }
 
-static void propagate_active_weight(struct ioc_gq *iocg, u32 active, u32 inuse)
+static void propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse)
 {
-	__propagate_active_weight(iocg, active, inuse);
-	commit_active_weights(iocg->ioc);
+	__propagate_weights(iocg, active, inuse);
+	commit_weights(iocg->ioc);
 }
 
 static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep)
@@ -966,9 +966,9 @@ static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep
 		goto out;
 
 	/*
-	 * Paired with wmb in commit_active_weights(). If we saw the
-	 * updated hweight_gen, all the weight updates from
-	 * __propagate_active_weight() are visible too.
+	 * Paired with wmb in commit_weights(). If we saw the updated
+	 * hweight_gen, all the weight updates from __propagate_weights() are
+	 * visible too.
 	 *
 	 * We can race with weight updates during calculation and get it
	 * wrong. However, hweight_gen would have changed and a future
@@ -1018,7 +1018,7 @@ static void weight_updated(struct ioc_gq *iocg)
 
 	weight = iocg->cfg_weight ?: iocc->dfl_weight;
 	if (weight != iocg->weight && iocg->active)
-		propagate_active_weight(iocg, weight,
+		propagate_weights(iocg, weight,
 			DIV64_U64_ROUND_UP(iocg->inuse * weight, iocg->weight));
 	iocg->weight = weight;
 }
@@ -1090,8 +1090,8 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
 	 */
 	iocg->hweight_gen = atomic_read(&ioc->hweight_gen) - 1;
 	list_add(&iocg->active_list, &ioc->active_iocgs);
-	propagate_active_weight(iocg, iocg->weight,
-				iocg->last_inuse ?: iocg->weight);
+	propagate_weights(iocg, iocg->weight,
+			  iocg->last_inuse ?: iocg->weight);
 
 	TRACE_IOCG_PATH(iocg_activate, iocg, now,
 			last_period, cur_period, vtime);
@@ -1384,13 +1384,13 @@ static void ioc_timer_fn(struct timer_list *timer)
 		} else if (iocg_is_idle(iocg)) {
 			/* no waiter and idle, deactivate */
 			iocg->last_inuse = iocg->inuse;
-			__propagate_active_weight(iocg, 0, 0);
+			__propagate_weights(iocg, 0, 0);
 			list_del_init(&iocg->active_list);
 		}
 
 		spin_unlock(&iocg->waitq.lock);
 	}
-	commit_active_weights(ioc);
+	commit_weights(ioc);
 
 	/* calc usages and see whether some weights need to be moved around */
 	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
@@ -1483,8 +1483,8 @@ static void ioc_timer_fn(struct timer_list *timer)
 				TRACE_IOCG_PATH(inuse_takeback, iocg, &now,
 						iocg->inuse, new_inuse,
 						hw_inuse, new_hwi);
-				__propagate_active_weight(iocg, iocg->weight,
-							  new_inuse);
+				__propagate_weights(iocg, iocg->weight,
+						    new_inuse);
 			}
 		} else {
 			/* genuninely out of vtime */
@@ -1524,11 +1524,11 @@ static void ioc_timer_fn(struct timer_list *timer)
 			TRACE_IOCG_PATH(inuse_giveaway, iocg, &now,
 					iocg->inuse, new_inuse,
 					hw_inuse, new_hwi);
-			__propagate_active_weight(iocg, iocg->weight, new_inuse);
+			__propagate_weights(iocg, iocg->weight, new_inuse);
 		}
 	}
 skip_surplus_transfers:
-	commit_active_weights(ioc);
+	commit_weights(ioc);
 
 	/*
 	 * If q is getting clogged or we're missing too much, we're issuing
@@ -1753,7 +1753,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 		TRACE_IOCG_PATH(inuse_reset, iocg, &now,
 				iocg->inuse, iocg->weight, hw_inuse, hw_active);
 		spin_lock_irq(&ioc->lock);
-		propagate_active_weight(iocg, iocg->weight, iocg->weight);
+		propagate_weights(iocg, iocg->weight, iocg->weight);
 		spin_unlock_irq(&ioc->lock);
 		current_hweight(iocg, &hw_active, &hw_inuse);
 	}
@@ -2114,7 +2114,7 @@ static void ioc_pd_free(struct blkg_policy_data *pd)
 	if (ioc) {
 		spin_lock_irqsave(&ioc->lock, flags);
 		if (!list_empty(&iocg->active_list)) {
-			propagate_active_weight(iocg, 0, 0);
+			propagate_weights(iocg, 0, 0);
 			list_del_init(&iocg->active_list);
 		}
 		spin_unlock_irqrestore(&ioc->lock, flags);

From patchwork Tue Sep 1 18:52:35 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749409
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 05/27] blk-iocost: clamp inuse and skip noops in __propagate_weights()
Date: Tue, 1 Sep 2020 14:52:35 -0400
Message-Id: <20200901185257.645114-6-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

__propagate_weights() currently expects the callers to clamp inuse
within [1, active], which is needlessly fragile. The inuse adjustment
logic is going to be revamped; in preparation, let's make
__propagate_weights() clamp inuse on entry. Also, make it avoid weight
updates altogether if neither active nor inuse has changed.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 78e6919153d8..8dfe73dde2a8 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -897,7 +897,10 @@ static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse)
 
 	lockdep_assert_held(&ioc->lock);
 
-	inuse = min(active, inuse);
+	inuse = clamp_t(u32, inuse, 1, active);
+
+	if (active == iocg->active && inuse == iocg->inuse)
+		return;
 
 	for (lvl = iocg->level - 1; lvl >= 0; lvl--) {
 		struct ioc_gq *parent = iocg->ancestors[lvl];
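
[A small editorial sketch with assumed values, not from the patch.
clamp_t() is the type-explicit clamp helper from <linux/kernel.h>, which
is what the new entry check relies on:]

	/* Sketch only: what the clamp on entry guarantees for any caller. */
	static u32 clamped_inuse(u32 inuse, u32 active)
	{
		/* e.g. active == 4096: 0 -> 1, 99999 -> 4096, 512 -> 512 */
		return clamp_t(u32, inuse, 1, active);
	}

[The added early return then turns calls that change neither active nor
inuse into cheap noops, so callers no longer pay for redundant updates.]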

From patchwork Tue Sep 1 18:52:36 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749405
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 06/27] blk-iocost: move iocg_kick_delay() above iocg_kick_waitq()
Date: Tue, 1 Sep 2020 14:52:36 -0400
Message-Id: <20200901185257.645114-7-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

We'll make iocg_kick_waitq() call iocg_kick_delay(). Reorder them in
preparation. This is pure code reorganization.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 120 ++++++++++++++++++++++-----------------
 1 file changed, 60 insertions(+), 60 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 8dfe73dde2a8..ac22d761a350 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1115,6 +1115,66 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
 	return false;
 }
 
+static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
+{
+	struct ioc *ioc = iocg->ioc;
+	struct blkcg_gq *blkg = iocg_to_blkg(iocg);
+	u64 vtime = atomic64_read(&iocg->vtime);
+	u64 vmargin = ioc->margin_us * now->vrate;
+	u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
+	u64 delta_ns, expires, oexpires;
+	u32 hw_inuse;
+
+	lockdep_assert_held(&iocg->waitq.lock);
+
+	/* debt-adjust vtime */
+	current_hweight(iocg, NULL, &hw_inuse);
+	vtime += abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
+
+	/*
+	 * Clear or maintain depending on the overage. Non-zero vdebt is what
+	 * guarantees that @iocg is online and future iocg_kick_delay() will
+	 * clear use_delay. Don't leave it on when there's no vdebt.
+	 */
+	if (!iocg->abs_vdebt || time_before_eq64(vtime, now->vnow)) {
+		blkcg_clear_delay(blkg);
+		return false;
+	}
+	if (!atomic_read(&blkg->use_delay) &&
+	    time_before_eq64(vtime, now->vnow + vmargin))
+		return false;
+
+	/* use delay */
+	delta_ns = DIV64_U64_ROUND_UP(vtime - now->vnow,
+				      now->vrate) * NSEC_PER_USEC;
+	blkcg_set_delay(blkg, delta_ns);
+	expires = now->now_ns + delta_ns;
+
+	/* if already active and close enough, don't bother */
+	oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer));
+	if (hrtimer_is_queued(&iocg->delay_timer) &&
+	    abs(oexpires - expires) <= margin_ns / 4)
+		return true;
+
+	hrtimer_start_range_ns(&iocg->delay_timer, ns_to_ktime(expires),
+			       margin_ns / 4, HRTIMER_MODE_ABS);
+	return true;
+}
+
+static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer)
+{
+	struct ioc_gq *iocg = container_of(timer, struct ioc_gq, delay_timer);
+	struct ioc_now now;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iocg->waitq.lock, flags);
+	ioc_now(iocg->ioc, &now);
+	iocg_kick_delay(iocg, &now);
+	spin_unlock_irqrestore(&iocg->waitq.lock, flags);
+
+	return HRTIMER_NORESTART;
+}
+
 static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode,
 			int flags, void *key)
 {
@@ -1211,66 +1271,6 @@ static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
-{
-	struct ioc *ioc = iocg->ioc;
-	struct blkcg_gq *blkg = iocg_to_blkg(iocg);
-	u64 vtime = atomic64_read(&iocg->vtime);
-	u64 vmargin = ioc->margin_us * now->vrate;
-	u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
-	u64 delta_ns, expires, oexpires;
-	u32 hw_inuse;
-
-	lockdep_assert_held(&iocg->waitq.lock);
-
-	/* debt-adjust vtime */
-	current_hweight(iocg, NULL, &hw_inuse);
-	vtime += abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
-
-	/*
-	 * Clear or maintain depending on the overage. Non-zero vdebt is what
-	 * guarantees that @iocg is online and future iocg_kick_delay() will
-	 * clear use_delay. Don't leave it on when there's no vdebt.
-	 */
-	if (!iocg->abs_vdebt || time_before_eq64(vtime, now->vnow)) {
-		blkcg_clear_delay(blkg);
-		return false;
-	}
-	if (!atomic_read(&blkg->use_delay) &&
-	    time_before_eq64(vtime, now->vnow + vmargin))
-		return false;
-
-	/* use delay */
-	delta_ns = DIV64_U64_ROUND_UP(vtime - now->vnow,
-				      now->vrate) * NSEC_PER_USEC;
-	blkcg_set_delay(blkg, delta_ns);
-	expires = now->now_ns + delta_ns;
-
-	/* if already active and close enough, don't bother */
-	oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer));
-	if (hrtimer_is_queued(&iocg->delay_timer) &&
-	    abs(oexpires - expires) <= margin_ns / 4)
-		return true;
-
-	hrtimer_start_range_ns(&iocg->delay_timer, ns_to_ktime(expires),
-			       margin_ns / 4, HRTIMER_MODE_ABS);
-	return true;
-}
-
-static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer)
-{
-	struct ioc_gq *iocg = container_of(timer, struct ioc_gq, delay_timer);
-	struct ioc_now now;
-	unsigned long flags;
-
-	spin_lock_irqsave(&iocg->waitq.lock, flags);
-	ioc_now(iocg->ioc, &now);
-	iocg_kick_delay(iocg, &now);
-	spin_unlock_irqrestore(&iocg->waitq.lock, flags);
-
-	return HRTIMER_NORESTART;
-}
-
 static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p)
 {
 	u32 nr_met[2] = { };

From patchwork Tue Sep 1 18:52:37 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749407
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 07/27] blk-iocost: make iocg_kick_waitq() call iocg_kick_delay() after paying debt
Date: Tue, 1 Sep 2020 14:52:37 -0400
Message-Id: <20200901185257.645114-8-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

iocg_kick_waitq() is the function which pays debt and iocg_kick_delay()
updates the actual delay status accordingly. If iocg_kick_delay() is not
called after iocg_kick_waitq() has updated the debt, unnecessarily large
delays can be applied temporarily. Let's make sure such conditions don't
occur by making iocg_kick_waitq() always call iocg_kick_delay() after
paying debt.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index ac22d761a350..b2b8dfbeee5a 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1226,6 +1226,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
 		atomic64_add(delta, &iocg->vtime);
 		atomic64_add(delta, &iocg->done_vtime);
 		iocg->abs_vdebt -= abs_delta;
+
+		iocg_kick_delay(iocg, now);
 	}
 
 	/*
@@ -1383,7 +1385,6 @@ static void ioc_timer_fn(struct timer_list *timer)
 		if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt) {
 			/* might be oversleeping vtime / hweight changes, kick */
 			iocg_kick_waitq(iocg, &now);
-			iocg_kick_delay(iocg, &now);
 		} else if (iocg_is_idle(iocg)) {
 			/* no waiter and idle, deactivate */
 			iocg->last_inuse = iocg->inuse;

From patchwork Tue Sep 1 18:52:38 2020
X-Patchwork-Submitter: Tejun Heo <tj@kernel.org>
X-Patchwork-Id: 11749397
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 08/27] blk-iocost: s/HWEIGHT_WHOLE/WEIGHT_ONE/g
Date: Tue, 1 Sep 2020 14:52:38 -0400
Message-Id: <20200901185257.645114-9-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

We're gonna use HWEIGHT_WHOLE for regular weights too. Let's rename it
to WEIGHT_ONE. Pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index b2b8dfbeee5a..5e6d56eec1c9 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -68,7 +68,7 @@
 * gets 300/(100+300) or 75% share, and A0 and A1 equally splits the rest,
 * 12.5% each. The distribution mechanism only cares about these flattened
 * shares. They're called hweights (hierarchical weights) and always add
- * upto 1 (HWEIGHT_WHOLE).
+ * upto 1 (WEIGHT_ONE).
 *
 * A given cgroup's vtime runs slower in inverse proportion to its hweight.
 * For example, with 12.5% weight, A0's time runs 8 times slower (100/12.5)
@@ -246,7 +246,7 @@ enum {
 	MIN_VALID_USAGES = 2,
 
 	/* 1/64k is granular enough and can easily be handled w/ u32 */
-	HWEIGHT_WHOLE = 1 << 16,
+	WEIGHT_ONE = 1 << 16,
 
 	/*
	 * As vtime is used to calculate the cost of each IO, it needs to
@@ -285,8 +285,8 @@ enum {
	 * donate the surplus.
	 */
 	SURPLUS_SCALE_PCT = 125,			/* * 125% */
-	SURPLUS_SCALE_ABS = HWEIGHT_WHOLE / 50,	/* + 2% */
-	SURPLUS_MIN_ADJ_DELTA = HWEIGHT_WHOLE / 33,	/* 3% */
+	SURPLUS_SCALE_ABS = WEIGHT_ONE / 50,	/* + 2% */
+	SURPLUS_MIN_ADJ_DELTA = WEIGHT_ONE / 33,	/* 3% */
 
 	/* switch iff the conditions are met for longer than this */
 	AUTOP_CYCLE_NSEC = 10LLU * NSEC_PER_SEC,
@@ -491,7 +491,7 @@ struct ioc_gq {
 	struct hrtimer waitq_timer;
 	struct hrtimer delay_timer;
 
-	/* usage is recorded as fractions of HWEIGHT_WHOLE */
+	/* usage is recorded as fractions of WEIGHT_ONE */
 	int usage_idx;
 	u32 usages[NR_USAGE_SLOTS];
 
@@ -658,7 +658,7 @@ static struct ioc_cgrp *blkcg_to_iocc(struct blkcg *blkcg)
 */
 static u64 abs_cost_to_cost(u64 abs_cost, u32 hw_inuse)
 {
-	return DIV64_U64_ROUND_UP(abs_cost * HWEIGHT_WHOLE, hw_inuse);
+	return DIV64_U64_ROUND_UP(abs_cost * WEIGHT_ONE, hw_inuse);
 }
 
 /*
@@ -666,7 +666,7 @@ static u64 abs_cost_to_cost(u64 abs_cost, u32 hw_inuse)
 */
 static u64 cost_to_abs_cost(u64 cost, u32 hw_inuse)
 {
-	return DIV64_U64_ROUND_UP(cost * hw_inuse, HWEIGHT_WHOLE);
+	return DIV64_U64_ROUND_UP(cost * hw_inuse, WEIGHT_ONE);
 }
 
 static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio, u64 cost)
@@ -980,7 +980,7 @@ static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep
	 */
 	smp_rmb();
 
-	hwa = hwi = HWEIGHT_WHOLE;
+	hwa = hwi = WEIGHT_ONE;
 	for (lvl = 0; lvl <= iocg->level - 1; lvl++) {
 		struct ioc_gq *parent = iocg->ancestors[lvl];
 		struct ioc_gq *child = iocg->ancestors[lvl + 1];
@@ -2088,8 +2088,8 @@ static void ioc_pd_init(struct blkg_policy_data *pd)
 	atomic64_set(&iocg->done_vtime, now.vnow);
 	atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period));
 	INIT_LIST_HEAD(&iocg->active_list);
-	iocg->hweight_active = HWEIGHT_WHOLE;
-	iocg->hweight_inuse = HWEIGHT_WHOLE;
+	iocg->hweight_active = WEIGHT_ONE;
+	iocg->hweight_inuse = WEIGHT_ONE;
 	init_waitqueue_head(&iocg->waitq);
 	hrtimer_init(&iocg->waitq_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
d27so1739689qtg.4; Tue, 01 Sep 2020 11:53:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=b9CT/Pfaa29kbHcQyFjwoJj23rZzAWWYwFA5TYGSz5M=; b=i/BXrQkgkv7QT7qjcqY0KSGlDoxBJwW4KpDXH82QhKpbGtbdebLOUTjaCUe5b5/nZw rCODKcxJIXdNwj42TeSpSGUushKooNwe0n3mlchxpiyPl8W3/5+zspLkQF/OyPcgR5RS FJV+btITXYPBoXMt3anyJhpBxeVwd5xHqtJEcBZFZK4HWZkN4xQDvh9WMz59KryVmlt5 +sAX8KZrcybC1/u2MZn457a/XmuZ2TgsX5GmhJtNKYvATshK2VL+GZPE3iQg/N23x0Fx MLfIT9ZwD1/JbrvEZZaHDcKyfCDIuOLNTUPy0ERi4n1tVf3flcadR1IiAx/rpA1ZZ9rj XjUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=b9CT/Pfaa29kbHcQyFjwoJj23rZzAWWYwFA5TYGSz5M=; b=cqsCcjHb5mhsuQuz+oYHGXC+ny2Si/C37i1dlC+DTot7YAHHVKf0od563E/j6mi4Vo c80UB3hpabjYXPfZ7XAGJ40jbm0zzHmLnf9H17rcNqGvU7RXryRpbKVjNhQtm7zdDBaF 5zkmuUV7ItNxTer4uMTg0eVWlz9p5zNL12IFhwsxIBo9R7wCN8INWr87A3xnDuoRQiTQ cZjDtNhWrTlC3xy9XwGZJBPeKm1uLPOv3izfyDNOFCHLw9XUi/gLerUPl0VBCveEse2M egsa/rC/1Ad+c6cYJBMTn/v4RdhrTRBVFZryotbcqgin67m6tSrjl+SnTwE7zmIxucDJ KYCg== X-Gm-Message-State: AOAM530cIrS9kQ4v+0MueOt/kjfz7DvKCpgw6Tg4Q6XisDJpevLdYBZ6 TnyNGYSno/FBmBtzVc3e9u86+WTv9tZhCQ== X-Google-Smtp-Source: ABdhPJwE7t2uWDiypDRMlzVcJMFvJe6QxLN4rRdw6ZVCIo8dU3IItrGwF39QORSjPGAbfwD9lV5b9g== X-Received: by 2002:ac8:c8c:: with SMTP id n12mr3300144qti.226.1598986411316; Tue, 01 Sep 2020 11:53:31 -0700 (PDT) Received: from localhost ([2620:10d:c091:480::1:a198]) by smtp.gmail.com with ESMTPSA id i1sm2585368qkd.58.2020.09.01.11.53.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Sep 2020 11:53:30 -0700 (PDT) From: Tejun Heo To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo Subject: [PATCH 09/27] blk-iocost: use WEIGHT_ONE based fixed point number for weights Date: Tue, 1 Sep 2020 14:52:39 -0400 Message-Id: <20200901185257.645114-10-tj@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200901185257.645114-1-tj@kernel.org> References: <20200901185257.645114-1-tj@kernel.org> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org To improve weight donations, we want to able to scale inuse with a greater accuracy and down below 1. Let's make non-hierarchical weights to use WEIGHT_ONE based fixed point numbers too like hierarchical ones. This doesn't cause any behavior changes yet. 
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 5e6d56eec1c9..00c5a3ad2b5b 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -984,8 +984,8 @@ static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep
 for (lvl = 0; lvl <= iocg->level - 1; lvl++) {
 struct ioc_gq *parent = iocg->ancestors[lvl];
 struct ioc_gq *child = iocg->ancestors[lvl + 1];
- u32 active_sum = READ_ONCE(parent->child_active_sum);
- u32 inuse_sum = READ_ONCE(parent->child_inuse_sum);
+ u64 active_sum = READ_ONCE(parent->child_active_sum);
+ u64 inuse_sum = READ_ONCE(parent->child_inuse_sum);
 u32 active = READ_ONCE(child->active);
 u32 inuse = READ_ONCE(child->inuse);

@@ -993,11 +993,11 @@ static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep
 if (!active_sum || !inuse_sum)
 continue;

- active_sum = max(active, active_sum);
- hwa = hwa * active / active_sum; /* max 16bits * 10000 */
+ active_sum = max_t(u64, active, active_sum);
+ hwa = div64_u64((u64)hwa * active, active_sum);

- inuse_sum = max(inuse, inuse_sum);
- hwi = hwi * inuse / inuse_sum; /* max 16bits * 10000 */
+ inuse_sum = max_t(u64, inuse, inuse_sum);
+ hwi = div64_u64((u64)hwi * inuse, inuse_sum);
 }

 iocg->hweight_active = max_t(u32, hwa, 1);
@@ -1022,7 +1022,8 @@ static void weight_updated(struct ioc_gq *iocg)
 weight = iocg->cfg_weight ?: iocc->dfl_weight;
 if (weight != iocg->weight && iocg->active)
 propagate_weights(iocg, weight,
- DIV64_U64_ROUND_UP(iocg->inuse * weight, iocg->weight));
+ DIV64_U64_ROUND_UP((u64)iocg->inuse * weight,
+ iocg->weight));
 iocg->weight = weight;
 }
@@ -2050,7 +2051,7 @@ static struct blkcg_policy_data *ioc_cpd_alloc(gfp_t gfp)
 if (!iocc)
 return NULL;

- iocc->dfl_weight = CGROUP_WEIGHT_DFL;
+ iocc->dfl_weight = CGROUP_WEIGHT_DFL * WEIGHT_ONE;

 return &iocc->cpd;
 }
@@ -2136,7 +2137,7 @@ static u64 ioc_weight_prfill(struct seq_file *sf, struct blkg_policy_data *pd,
 struct ioc_gq *iocg = pd_to_iocg(pd);

 if (dname && iocg->cfg_weight)
- seq_printf(sf, "%s %u\n", dname, iocg->cfg_weight);
+ seq_printf(sf, "%s %u\n", dname, iocg->cfg_weight / WEIGHT_ONE);
 return 0;
 }
@@ -2146,7 +2147,7 @@ static int ioc_weight_show(struct seq_file *sf, void *v)
 struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
 struct ioc_cgrp *iocc = blkcg_to_iocc(blkcg);

- seq_printf(sf, "default %u\n", iocc->dfl_weight);
+ seq_printf(sf, "default %u\n", iocc->dfl_weight / WEIGHT_ONE);
 blkcg_print_blkgs(sf, blkcg, ioc_weight_prfill,
 &blkcg_policy_iocost, seq_cft(sf)->private, false);
 return 0;
@@ -2172,7 +2173,7 @@ static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf,
 return -EINVAL;

 spin_lock(&blkcg->lock);
- iocc->dfl_weight = v;
+ iocc->dfl_weight = v * WEIGHT_ONE;
 hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
 struct ioc_gq *iocg = blkg_to_iocg(blkg);
@@ -2203,7 +2204,7 @@ static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf,
 }

 spin_lock(&iocg->ioc->lock);
- iocg->cfg_weight = v;
+ iocg->cfg_weight = v * WEIGHT_ONE;
 weight_updated(iocg);
 spin_unlock(&iocg->ioc->lock);

From patchwork Tue Sep 1 18:52:40 2020
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749403
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 10/27] blk-iocost: make ioc_now->now and ioc->period_at 64bit
Date: Tue, 1 Sep 2020 14:52:40 -0400
Message-Id: <20200901185257.645114-11-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
List-ID: X-Mailing-List: linux-block@vger.kernel.org
They are in microseconds and wrap in around 1.2 hours with u32. While
unlikely, confusion from wraparound is still possible. We aren't saving
anything meaningful by keeping these u32. Let's make them u64.

Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 00c5a3ad2b5b..dc72cd965837 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -409,7 +409,7 @@ struct ioc {
 atomic64_t vtime_rate;

 seqcount_spinlock_t period_seqcount;
- u32 period_at; /* wallclock starttime */
+ u64 period_at; /* wallclock starttime */
 u64 period_at_vtime; /* vtime starttime */

 atomic64_t cur_period; /* inc'd each period */
@@ -508,7 +508,7 @@ struct ioc_cgrp {

 struct ioc_now {
 u64 now_ns;
- u32 now;
+ u64 now;
 u64 vnow;
 u64 vrate;
 };
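As a quick sanity check on the 1.2 hour figure in the patch above, a
throwaway user-space calculation (not part of the patch):

#include <stdio.h>
#include <stdint.h>

/* a u32 counting microseconds overflows after 2^32 us */
int main(void)
{
	uint64_t wrap_us = (uint64_t)UINT32_MAX + 1;	/* 4,294,967,296 us */
	double wrap_sec = wrap_us / 1e6;		/* ~4295 seconds */
	double wrap_hr = wrap_sec / 3600.0;		/* ~1.19 hours */

	printf("u32 microseconds wrap after %.0f s (~%.2f h)\n",
	       wrap_sec, wrap_hr);
	return 0;
}

2^32 us is roughly 4295 seconds, i.e. about 1.19 hours, which matches
the "around 1.2 hours" in the commit message.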
From patchwork Tue Sep 1 18:52:41 2020
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749401
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 11/27] blk-iocost: streamline vtime margin and timer slack handling
Date: Tue, 1 Sep 2020 14:52:41 -0400
Message-Id: <20200901185257.645114-12-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
List-ID: X-Mailing-List: linux-block@vger.kernel.org

The margin handling was pretty inconsistent.

* ioc->margin_us and ioc->inuse_margin_vtime were used as vtime margin
  thresholds. However, the two are in different units, with the former
  requiring conversion to vtime on use.

* iocg_kick_waitq() was using a quarter of WAITQ_TIMER_MARGIN_PCT of
  period_us as the timer slack - ~1.2% - while iocg_kick_delay() was
  using a quarter of ioc->margin_us - ~12.5%. There aren't strong
  reasons to use different values for the two.

This patch cleans up margin and timer slack handling:

* vtime margins are now recorded in ioc->margins.{min, max} on period
  duration changes and used consistently.

* Timer slack is now 1% of period_us, recorded in ioc->timer_slack_ns
  and used consistently for iocg_kick_waitq() and iocg_kick_delay().

The only functional change is the shortening of the timer slack. No
meaningful visible change is expected.
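A rough user-space sketch of the derived parameters described above.
The percentage constants mirror the patch, while period_us and vrate
are made-up inputs:

#include <stdio.h>
#include <stdint.h>

#define MARGIN_MIN_PCT 10
#define MARGIN_MAX_PCT 50
#define TIMER_SLACK_PCT 1
#define NSEC_PER_USEC 1000ULL

int main(void)
{
	uint32_t period_us = 100000;	/* made-up 100ms period */
	uint64_t vrate = 1 << 10;	/* made-up vtime units per usec */

	/* same shape as the patch's ioc_refresh_margins() */
	int64_t margin_min = (uint64_t)(period_us * MARGIN_MIN_PCT / 100) * vrate;
	int64_t margin_max = (uint64_t)(period_us * MARGIN_MAX_PCT / 100) * vrate;

	/* same shape as the slack computed in ioc_refresh_period_us() */
	uint64_t slack_ns = (uint64_t)period_us * NSEC_PER_USEC *
			    TIMER_SLACK_PCT / 100;

	printf("margins.min=%lld margins.max=%lld timer_slack_ns=%llu\n",
	       (long long)margin_min, (long long)margin_max,
	       (unsigned long long)slack_ns);
	return 0;
}

With a 100ms period, the slack comes out to 1ms for both timers, where
the old code would have used ~1.2ms for the waitq timer and ~12.5ms for
the delay timer.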
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 67 ++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index dc72cd965837..f36988657594 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -221,11 +221,11 @@ enum {
 * serves as its IO credit buffer. Surplus weight adjustment is
 * immediately canceled if the vtime margin runs below 10%.
 */
- MARGIN_PCT = 50,
- INUSE_MARGIN_PCT = 10,
+ MARGIN_MIN_PCT = 10,
+ MARGIN_MAX_PCT = 50,

- /* Have some play in waitq timer operations */
- WAITQ_TIMER_MARGIN_PCT = 5,
+ /* Have some play in timer operations */
+ TIMER_SLACK_PCT = 1,

 /*
 * vtime can wrap well within a reasonable uptime when vrate is
@@ -374,6 +374,11 @@ struct ioc_params {
 u32 too_slow_vrate_pct;
 };

+struct ioc_margins {
+ s64 min;
+ s64 max;
+};
+
 struct ioc_missed {
 local_t nr_met;
 local_t nr_missed;
@@ -395,8 +400,9 @@ struct ioc {
 bool enabled;

 struct ioc_params params;
+ struct ioc_margins margins;
 u32 period_us;
- u32 margin_us;
+ u32 timer_slack_ns;
 u64 vrate_min;
 u64 vrate_max;
@@ -415,7 +421,6 @@ struct ioc {
 atomic64_t cur_period; /* inc'd each period */
 int busy_level; /* saturation history */

- u64 inuse_margin_vtime;
 bool weights_updated;
 atomic_t hweight_gen; /* for lazy hweights */
@@ -678,6 +683,16 @@ static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio, u64 cost)
 #define CREATE_TRACE_POINTS
 #include <trace/events/iocost.h>

+static void ioc_refresh_margins(struct ioc *ioc)
+{
+ struct ioc_margins *margins = &ioc->margins;
+ u32 period_us = ioc->period_us;
+ u64 vrate = atomic64_read(&ioc->vtime_rate);
+
+ margins->min = (period_us * MARGIN_MIN_PCT / 100) * vrate;
+ margins->max = (period_us * MARGIN_MAX_PCT / 100) * vrate;
+}
+
 /* latency Qos params changed, update period_us and all the dependent params */
 static void ioc_refresh_period_us(struct ioc *ioc)
 {
@@ -711,9 +726,10 @@ static void ioc_refresh_period_us(struct ioc *ioc)

 /* calculate dependent params */
 ioc->period_us = period_us;
- ioc->margin_us = period_us * MARGIN_PCT / 100;
- ioc->inuse_margin_vtime = DIV64_U64_ROUND_UP(
- period_us * VTIME_PER_USEC * INUSE_MARGIN_PCT, 100);
+ ioc->timer_slack_ns = div64_u64(
+ (u64)period_us * NSEC_PER_USEC * TIMER_SLACK_PCT,
+ 100);
+ ioc_refresh_margins(ioc);
 }

 static int ioc_autop_idx(struct ioc *ioc)
@@ -1031,7 +1047,7 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
 {
 struct ioc *ioc = iocg->ioc;
 u64 last_period, cur_period, max_period_delta;
- u64 vtime, vmargin, vmin;
+ u64 vtime, vmin;
 int i;

 /*
@@ -1077,8 +1093,7 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
 */
 max_period_delta = DIV64_U64_ROUND_UP(VTIME_VALID_DUR, ioc->period_us);
 vtime = atomic64_read(&iocg->vtime);
- vmargin = ioc->margin_us * now->vrate;
- vmin = now->vnow - vmargin;
+ vmin = now->vnow - ioc->margins.max;

 if (last_period + max_period_delta < cur_period ||
 time_before64(vtime, vmin)) {
@@ -1121,8 +1136,6 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
 struct ioc *ioc = iocg->ioc;
 struct blkcg_gq *blkg = iocg_to_blkg(iocg);
 u64 vtime = atomic64_read(&iocg->vtime);
- u64 vmargin = ioc->margin_us * now->vrate;
- u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
 u64 delta_ns, expires, oexpires;
 u32 hw_inuse;
@@ -1142,7 +1155,7 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
 return false;
 }
 if (!atomic_read(&blkg->use_delay) &&
- time_before_eq64(vtime, now->vnow + vmargin))
+ time_before_eq64(vtime, now->vnow + ioc->margins.max))
 return false;

 /* use delay */
@@ -1154,11 +1167,11 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
 /* if already active and close enough, don't bother */
 oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer));
 if (hrtimer_is_queued(&iocg->delay_timer) &&
- abs(oexpires - expires) <= margin_ns / 4)
+ abs(oexpires - expires) <= ioc->timer_slack_ns)
 return true;
 hrtimer_start_range_ns(&iocg->delay_timer, ns_to_ktime(expires),
- margin_ns / 4, HRTIMER_MODE_ABS);
+ ioc->timer_slack_ns, HRTIMER_MODE_ABS);
 return true;
 }
@@ -1206,8 +1219,6 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
 {
 struct ioc *ioc = iocg->ioc;
 struct iocg_wake_ctx ctx = { .iocg = iocg };
- u64 margin_ns = (u64)(ioc->period_us *
- WAITQ_TIMER_MARGIN_PCT / 100) * NSEC_PER_USEC;
 u64 vdebt, vshortage, expires, oexpires;
 s64 vbudget;
 u32 hw_inuse;
@@ -1243,20 +1254,20 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
 if (WARN_ON_ONCE(ctx.vbudget >= 0))
 return;

- /* determine next wakeup, add a quarter margin to guarantee chunking */
+ /* determine next wakeup, add a timer margin to guarantee chunking */
 vshortage = -ctx.vbudget;
 expires = now->now_ns +
 DIV64_U64_ROUND_UP(vshortage, now->vrate) * NSEC_PER_USEC;
- expires += margin_ns / 4;
+ expires += ioc->timer_slack_ns;

 /* if already active and close enough, don't bother */
 oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->waitq_timer));
 if (hrtimer_is_queued(&iocg->waitq_timer) &&
- abs(oexpires - expires) <= margin_ns / 4)
+ abs(oexpires - expires) <= ioc->timer_slack_ns)
 return;

 hrtimer_start_range_ns(&iocg->waitq_timer, ns_to_ktime(expires),
- margin_ns / 4, HRTIMER_MODE_ABS);
+ ioc->timer_slack_ns, HRTIMER_MODE_ABS);
 }

 static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
@@ -1399,7 +1410,7 @@ static void ioc_timer_fn(struct timer_list *timer)

 /* calc usages and see whether some weights need to be moved around */
 list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
- u64 vdone, vtime, vusage, vmargin, vmin;
+ u64 vdone, vtime, vusage, vmin;
 u32 hw_active, hw_inuse, usage;

 /*
@@ -1450,8 +1461,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 }

 /* see whether there's surplus vtime */
- vmargin = ioc->margin_us * now.vrate;
- vmin = now.vnow - vmargin;
+ vmin = now.vnow - ioc->margins.max;

 iocg->has_surplus = false;
@@ -1623,8 +1633,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 nr_surpluses);

 atomic64_set(&ioc->vtime_rate, vrate);
- ioc->inuse_margin_vtime = DIV64_U64_ROUND_UP(
- ioc->period_us * vrate * INUSE_MARGIN_PCT, 100);
+ ioc_refresh_margins(ioc);
 } else if (ioc->busy_level != prev_busy_level || nr_lagging) {
 trace_iocost_ioc_vrate_adj(ioc, atomic64_read(&ioc->vtime_rate),
 missed_ppm, rq_wait_pct, nr_lagging,
@@ -1754,7 +1763,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 current_hweight(iocg, &hw_active, &hw_inuse);
 if (hw_inuse < hw_active &&
- time_after_eq64(vtime + ioc->inuse_margin_vtime, now.vnow)) {
+ time_after_eq64(vtime + ioc->margins.min, now.vnow)) {
 TRACE_IOCG_PATH(inuse_reset, iocg, &now,
 iocg->inuse, iocg->weight, hw_inuse, hw_active);
 spin_lock_irq(&ioc->lock);

From patchwork Tue Sep 1 18:52:42 2020
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749399
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 12/27] blk-iocost: grab ioc->lock for debt handling
Date: Tue, 1 Sep 2020 14:52:42 -0400
Message-Id: <20200901185257.645114-13-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
List-ID: X-Mailing-List: linux-block@vger.kernel.org

Currently, debt handling requires only iocg->waitq.lock. In the future,
we want to adjust and propagate inuse changes depending on debt status.
Let's grab ioc->lock in debt handling paths in preparation.
* Because ioc->lock nests outside iocg->waitq.lock, the decision to
  grab ioc->lock needs to be made before entering the critical
  sections.

* Add and use iocg_[un]lock() which handles the conditional double
  locking.

* Add @pay_debt to iocg_kick_waitq() so that debt payment happens only
  when the caller grabbed both locks.

This patch is preparatory and the comments contain references to future
changes.
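The constraint behind the first bullet can be sketched in user space
with pthread mutexes standing in for the two spinlocks. This is an
illustration of the locking pattern only, not kernel code:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t outer = PTHREAD_MUTEX_INITIALIZER; /* ~ioc->lock */
static pthread_mutex_t inner = PTHREAD_MUTEX_INITIALIZER; /* ~iocg->waitq.lock */

static void locked_path(bool need_outer)
{
	/*
	 * The decision is made before entering the critical section.
	 * With a nesting rule of outer -> inner, taking outer after
	 * inner would deadlock against a thread locking in the
	 * documented order, so there is no "upgrade" once inside.
	 */
	if (need_outer)
		pthread_mutex_lock(&outer);
	pthread_mutex_lock(&inner);

	/* ... may touch debt state only if need_outer was true ... */

	pthread_mutex_unlock(&inner);
	if (need_outer)
		pthread_mutex_unlock(&outer);
}

This mirrors how iocg_lock()/iocg_unlock() below take both locks or
just the waitq lock based on a flag computed up front.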
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 92 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 19 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index f36988657594..23b173e34591 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -680,6 +680,26 @@ static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio, u64 cost)
 atomic64_add(cost, &iocg->vtime);
 }

+static void iocg_lock(struct ioc_gq *iocg, bool lock_ioc, unsigned long *flags)
+{
+ if (lock_ioc) {
+ spin_lock_irqsave(&iocg->ioc->lock, *flags);
+ spin_lock(&iocg->waitq.lock);
+ } else {
+ spin_lock_irqsave(&iocg->waitq.lock, *flags);
+ }
+}
+
+static void iocg_unlock(struct ioc_gq *iocg, bool unlock_ioc, unsigned long *flags)
+{
+ if (unlock_ioc) {
+ spin_unlock(&iocg->waitq.lock);
+ spin_unlock_irqrestore(&iocg->ioc->lock, *flags);
+ } else {
+ spin_unlock_irqrestore(&iocg->waitq.lock, *flags);
+ }
+}
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/iocost.h>
@@ -1215,11 +1235,17 @@ static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode,
 return 0;
 }

-static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
+/*
+ * Calculate the accumulated budget, pay debt if @pay_debt and wake up waiters
+ * accordingly. When @pay_debt is %true, the caller must be holding ioc->lock in
+ * addition to iocg->waitq.lock.
+ */
+static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
+ struct ioc_now *now)
 {
 struct ioc *ioc = iocg->ioc;
 struct iocg_wake_ctx ctx = { .iocg = iocg };
- u64 vdebt, vshortage, expires, oexpires;
+ u64 vshortage, expires, oexpires;
 s64 vbudget;
 u32 hw_inuse;
@@ -1229,25 +1255,39 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
 vbudget = now->vnow - atomic64_read(&iocg->vtime);

 /* pay off debt */
- vdebt = abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
- if (vdebt && vbudget > 0) {
+ if (pay_debt && iocg->abs_vdebt && vbudget > 0) {
+ u64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
 u64 delta = min_t(u64, vbudget, vdebt);
 u64 abs_delta = min(cost_to_abs_cost(delta, hw_inuse),
 iocg->abs_vdebt);

+ lockdep_assert_held(&ioc->lock);
+
 atomic64_add(delta, &iocg->vtime);
 atomic64_add(delta, &iocg->done_vtime);
 iocg->abs_vdebt -= abs_delta;
+ vbudget -= vdebt;

 iocg_kick_delay(iocg, now);
 }

+ /*
+ * Debt can still be outstanding if we haven't paid all yet or the
+ * caller raced and called without @pay_debt. Shouldn't wake up waiters
+ * under debt. Make sure @vbudget reflects the outstanding amount and is
+ * not positive.
+ */
+ if (iocg->abs_vdebt) {
+ s64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hw_inuse);
+ vbudget = min_t(s64, 0, vbudget - vdebt);
+ }
+
 /*
 * Wake up the ones which are due and see how much vtime we'll need
 * for the next one.
 */
 ctx.hw_inuse = hw_inuse;
- ctx.vbudget = vbudget - vdebt;
+ ctx.vbudget = vbudget;
 __wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);
 if (!waitqueue_active(&iocg->waitq))
 return;
@@ -1273,14 +1313,15 @@ static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
 {
 struct ioc_gq *iocg = container_of(timer, struct ioc_gq, waitq_timer);
+ bool pay_debt = READ_ONCE(iocg->abs_vdebt);
 struct ioc_now now;
 unsigned long flags;

 ioc_now(iocg->ioc, &now);

- spin_lock_irqsave(&iocg->waitq.lock, flags);
- iocg_kick_waitq(iocg, &now);
- spin_unlock_irqrestore(&iocg->waitq.lock, flags);
+ iocg_lock(iocg, pay_debt, &flags);
+ iocg_kick_waitq(iocg, pay_debt, &now);
+ iocg_unlock(iocg, pay_debt, &flags);

 return HRTIMER_NORESTART;
 }
@@ -1396,7 +1437,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt) {
 /* might be oversleeping vtime / hweight changes, kick */
- iocg_kick_waitq(iocg, &now);
+ iocg_kick_waitq(iocg, true, &now);
 } else if (iocg_is_idle(iocg)) {
 /* no waiter and idle, deactivate */
 iocg->last_inuse = iocg->inuse;
@@ -1743,6 +1784,8 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 struct iocg_wait wait;
 u32 hw_active, hw_inuse;
 u64 abs_cost, cost, vtime;
+ bool use_debt, ioc_locked;
+ unsigned long flags;

 /* bypass IOs if disabled or for root cgroup */
 if (!ioc->enabled || !iocg->level)
@@ -1786,15 +1829,26 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 }

 /*
- * We activated above but w/o any synchronization. Deactivation is
- * synchronized with waitq.lock and we won't get deactivated as long
- * as we're waiting or has debt, so we're good if we're activated
- * here. In the unlikely case that we aren't, just issue the IO.
+ * We're over budget. This can be handled in two ways. IOs which may
+ * cause priority inversions are punted to @ioc->aux_iocg and charged as
+ * debt. Otherwise, the issuer is blocked on @iocg->waitq. Debt handling
+ * requires @ioc->lock, waitq handling @iocg->waitq.lock. Determine
+ * whether debt handling is needed and acquire locks accordingly.
 */
- spin_lock_irq(&iocg->waitq.lock);
+ use_debt = bio_issue_as_root_blkg(bio) || fatal_signal_pending(current);
+ ioc_locked = use_debt || READ_ONCE(iocg->abs_vdebt);
+ iocg_lock(iocg, ioc_locked, &flags);
+
+ /*
+ * @iocg must stay activated for debt and waitq handling. Deactivation
+ * is synchronized against both ioc->lock and waitq.lock and we won't
+ * get deactivated as long as we're waiting or has debt, so we're good
+ * if we're activated here. In the unlikely cases that we aren't, just
+ * issue the IO.
+ */
 if (unlikely(list_empty(&iocg->active_list))) {
- spin_unlock_irq(&iocg->waitq.lock);
+ iocg_unlock(iocg, ioc_locked, &flags);
 iocg_commit_bio(iocg, bio, cost);
 return;
 }
@@ -1816,12 +1870,12 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 * clear them and leave @iocg inactive w/ dangling use_delay heavily
 * penalizing the cgroup and its descendants.
 */
- if (bio_issue_as_root_blkg(bio) || fatal_signal_pending(current)) {
+ if (use_debt) {
 iocg->abs_vdebt += abs_cost;
 if (iocg_kick_delay(iocg, &now))
 blkcg_schedule_throttle(rqos->q,
 (bio->bi_opf & REQ_SWAP) == REQ_SWAP);
- spin_unlock_irq(&iocg->waitq.lock);
+ iocg_unlock(iocg, ioc_locked, &flags);
 return;
 }
@@ -1845,9 +1899,9 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 wait.committed = false; /* will be set true by waker */

 __add_wait_queue_entry_tail(&iocg->waitq, &wait.wait);
- iocg_kick_waitq(iocg, &now);
+ iocg_kick_waitq(iocg, ioc_locked, &now);

- spin_unlock_irq(&iocg->waitq.lock);
+ iocg_unlock(iocg, ioc_locked, &flags);

 while (true) {
 set_current_state(TASK_UNINTERRUPTIBLE);

From patchwork Tue Sep 1 18:52:43 2020
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749367
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 13/27] blk-iocost: add absolute usage stat
Date: Tue, 1 Sep 2020 14:52:43 -0400
Message-Id: <20200901185257.645114-14-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
List-ID: X-Mailing-List: linux-block@vger.kernel.org

Currently, iocost doesn't collect or expose any statistics, punting all
monitoring duties to the drgn based iocost_monitor.py. While this works
for some scenarios, there are usability and data availability
challenges. For example, accurate per-cgroup usage information can't be
tracked by vtime progression at all, and the numbers available in
iocg->usages[] are really short-term snapshots used for control
heuristics, with possibly significant errors.

This patch implements a per-cgroup absolute usage stat counter and
exposes it through io.stat along with the current vrate. Usage stat
collection and flushing employ the same method as cgroup rstat on the
active iocgs, and the only hot path overhead is preemption toggling and
adding to a percpu counter.
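A user-space analogue of the collection scheme, with a plain array
standing in for the percpu storage and a simple sum for the flush. The
names loosely mirror the kernel's, but everything here is illustrative:

#include <stdio.h>
#include <stdint.h>

#define NR_CPUS 4

static uint64_t abs_vusage[NR_CPUS];	/* ~iocg->pcpu_stat->abs_vusage */
static uint64_t last_abs_vusage;	/* ~iocg->last_stat_abs_vusage */

/* hot path: a single add to the issuing CPU's own slot */
static void commit_cost(int cpu, uint64_t abs_cost)
{
	abs_vusage[cpu] += abs_cost;
}

/* cold path: sum the slots and hand back the delta since last flush */
static uint64_t flush_delta(void)
{
	uint64_t sum = 0, delta;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += abs_vusage[cpu];
	delta = sum - last_abs_vusage;
	last_abs_vusage = sum;
	return delta;
}

int main(void)
{
	commit_cost(0, 100);
	commit_cost(2, 50);
	printf("flush 1: delta=%llu\n", (unsigned long long)flush_delta());
	commit_cost(1, 25);
	printf("flush 2: delta=%llu\n", (unsigned long long)flush_delta());
	return 0;
}

The expensive cross-CPU summing happens only on the periodic flush,
which is why the per-IO overhead stays at one local counter update.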
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 155 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 149 insertions(+), 6 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 23b173e34591..f30f9b37fcf0 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -431,6 +431,14 @@ struct ioc {
 bool user_cost_model:1;
 };

+struct iocg_pcpu_stat {
+ local64_t abs_vusage;
+};
+
+struct iocg_stat {
+ u64 usage_us;
+};
+
 /* per device-cgroup pair */
 struct ioc_gq {
 struct blkg_policy_data pd;
@@ -492,10 +500,19 @@ struct ioc_gq {
 u32 hweight_inuse;
 bool has_surplus;

+ struct list_head walk_list;
+
 struct wait_queue_head waitq;
 struct hrtimer waitq_timer;
 struct hrtimer delay_timer;

+ /* statistics */
+ struct iocg_pcpu_stat __percpu *pcpu_stat;
+ struct iocg_stat local_stat;
+ struct iocg_stat desc_stat;
+ struct iocg_stat last_stat;
+ u64 last_stat_abs_vusage;
+
 /* usage is recorded as fractions of WEIGHT_ONE */
 int usage_idx;
 u32 usages[NR_USAGE_SLOTS];
@@ -674,10 +691,17 @@ static u64 cost_to_abs_cost(u64 cost, u32 hw_inuse)
 return DIV64_U64_ROUND_UP(cost * hw_inuse, WEIGHT_ONE);
 }

-static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio, u64 cost)
+static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio,
+ u64 abs_cost, u64 cost)
 {
+ struct iocg_pcpu_stat *gcs;
+
 bio->bi_iocost_cost = cost;
 atomic64_add(cost, &iocg->vtime);
+
+ gcs = get_cpu_ptr(iocg->pcpu_stat);
+ local64_add(abs_cost, &gcs->abs_vusage);
+ put_cpu_ptr(gcs);
 }

 static void iocg_lock(struct ioc_gq *iocg, bool lock_ioc, unsigned long *flags)
@@ -1221,7 +1245,7 @@ static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode,
 if (ctx->vbudget < 0)
 return -1;

- iocg_commit_bio(ctx->iocg, wait->bio, cost);
+ iocg_commit_bio(ctx->iocg, wait->bio, wait->abs_cost, cost);

 /*
 * autoremove_wake_function() removes the wait entry only when it
@@ -1382,6 +1406,87 @@ static bool iocg_is_idle(struct ioc_gq *iocg)
 return true;
 }

+/*
+ * Call this function on the target leaf @iocg's to build pre-order traversal
+ * list of all the ancestors in @inner_walk. The inner nodes are linked through
+ * ->walk_list and the caller is responsible for dissolving the list after use.
+ */
+static void iocg_build_inner_walk(struct ioc_gq *iocg,
+ struct list_head *inner_walk)
+{
+ int lvl;
+
+ WARN_ON_ONCE(!list_empty(&iocg->walk_list));
+
+ /* find the first ancestor which hasn't been visited yet */
+ for (lvl = iocg->level - 1; lvl >= 0; lvl--) {
+ if (!list_empty(&iocg->ancestors[lvl]->walk_list))
+ break;
+ }
+
+ /* walk down and visit the inner nodes to get pre-order traversal */
+ while (++lvl <= iocg->level - 1) {
+ struct ioc_gq *inner = iocg->ancestors[lvl];
+
+ /* record traversal order */
+ list_add_tail(&inner->walk_list, inner_walk);
+ }
+}
+
+/* collect per-cpu counters and propagate the deltas to the parent */
+static void iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now)
+{
+ struct iocg_stat new_stat;
+ u64 abs_vusage = 0;
+ u64 vusage_delta;
+ int cpu;
+
+ lockdep_assert_held(&iocg->ioc->lock);
+
+ /* collect per-cpu counters */
+ for_each_possible_cpu(cpu) {
+ abs_vusage += local64_read(
+ per_cpu_ptr(&iocg->pcpu_stat->abs_vusage, cpu));
+ }
+ vusage_delta = abs_vusage - iocg->last_stat_abs_vusage;
+ iocg->last_stat_abs_vusage = abs_vusage;
+
+ iocg->local_stat.usage_us += div64_u64(vusage_delta, now->vrate);
+
+ new_stat.usage_us =
+ iocg->local_stat.usage_us + iocg->desc_stat.usage_us;
+
+ /* propagate the deltas to the parent */
+ if (iocg->level > 0) {
+ struct iocg_stat *parent_stat =
+ &iocg->ancestors[iocg->level - 1]->desc_stat;
+
+ parent_stat->usage_us +=
+ new_stat.usage_us - iocg->last_stat.usage_us;
+ }
+
+ iocg->last_stat = new_stat;
+}
+
+/* get stat counters ready for reading on all active iocgs */
+static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now)
+{
+ LIST_HEAD(inner_walk);
+ struct ioc_gq *iocg, *tiocg;
+
+ /* flush leaves and build inner node walk list */
+ list_for_each_entry(iocg, target_iocgs, active_list) {
+ iocg_flush_stat_one(iocg, now);
+ iocg_build_inner_walk(iocg, &inner_walk);
+ }
+
+ /* keep flushing upwards by walking the inner list backwards */
+ list_for_each_entry_safe_reverse(iocg, tiocg, &inner_walk, walk_list) {
+ iocg_flush_stat_one(iocg, now);
+ list_del_init(&iocg->walk_list);
+ }
+}
+
 /* returns usage with margin added if surplus is large enough */
 static u32 surplus_adjusted_hweight_inuse(u32 usage, u32 hw_inuse)
 {
@@ -1422,6 +1527,8 @@ static void ioc_timer_fn(struct timer_list *timer)
 return;
 }

+ iocg_flush_stat(&ioc->active_iocgs, &now);
+
 /*
 * Waiters determine the sleep durations based on the vrate they
 * saw at the time of sleep. If vrate has increased, some waiters
@@ -1824,7 +1931,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 */
 if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt &&
 time_before_eq64(vtime + cost, now.vnow)) {
- iocg_commit_bio(iocg, bio, cost);
+ iocg_commit_bio(iocg, bio, abs_cost, cost);
 return;
 }
@@ -1849,7 +1956,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 */
 if (unlikely(list_empty(&iocg->active_list))) {
 iocg_unlock(iocg, ioc_locked, &flags);
- iocg_commit_bio(iocg, bio, cost);
+ iocg_commit_bio(iocg, bio, abs_cost, cost);
 return;
 }
@@ -1948,7 +2055,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
 */
 if (rq->bio && rq->bio->bi_iocost_cost &&
 time_before_eq64(atomic64_read(&iocg->vtime) + cost, now.vnow)) {
- iocg_commit_bio(iocg, bio, cost);
+ iocg_commit_bio(iocg, bio, abs_cost, cost);
 return;
 }
@@ -1962,7 +2069,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
 iocg->abs_vdebt += abs_cost;
 iocg_kick_delay(iocg, &now);
 } else {
- iocg_commit_bio(iocg, bio, cost);
+ iocg_commit_bio(iocg, bio, abs_cost, cost);
 }
 spin_unlock_irqrestore(&iocg->waitq.lock, flags);
 }
@@ -2133,6 +2240,12 @@ static struct blkg_policy_data *ioc_pd_alloc(gfp_t gfp, struct request_queue *q,
 if (!iocg)
 return NULL;

+ iocg->pcpu_stat = alloc_percpu_gfp(struct iocg_pcpu_stat, gfp);
+ if (!iocg->pcpu_stat) {
+ kfree(iocg);
+ return NULL;
+ }
+
 return &iocg->pd;
 }
@@ -2152,6 +2265,7 @@ static void ioc_pd_init(struct blkg_policy_data *pd)
 atomic64_set(&iocg->done_vtime, now.vnow);
 atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period));
 INIT_LIST_HEAD(&iocg->active_list);
+ INIT_LIST_HEAD(&iocg->walk_list);
 iocg->hweight_active = WEIGHT_ONE;
 iocg->hweight_inuse = WEIGHT_ONE;
@@ -2181,18 +2295,46 @@ static void ioc_pd_free(struct blkg_policy_data *pd)

 if (ioc) {
 spin_lock_irqsave(&ioc->lock, flags);
+
 if (!list_empty(&iocg->active_list)) {
 propagate_weights(iocg, 0, 0);
 list_del_init(&iocg->active_list);
 }
+
+ WARN_ON_ONCE(!list_empty(&iocg->walk_list));
+
 spin_unlock_irqrestore(&ioc->lock, flags);

 hrtimer_cancel(&iocg->waitq_timer);
 hrtimer_cancel(&iocg->delay_timer);
 }
+ free_percpu(iocg->pcpu_stat);
 kfree(iocg);
 }

+static size_t ioc_pd_stat(struct blkg_policy_data *pd, char *buf, size_t size)
+{
+ struct ioc_gq *iocg = pd_to_iocg(pd);
+ struct ioc *ioc = iocg->ioc;
+ size_t pos = 0;
+
+ if (!ioc->enabled)
+ return 0;
+
+ if (iocg->level == 0) {
+ unsigned vp10k = DIV64_U64_ROUND_CLOSEST(
+ atomic64_read(&ioc->vtime_rate) * 10000,
+ VTIME_PER_USEC);
+ pos += scnprintf(buf + pos, size - pos, " cost.vrate=%u.%02u",
+ vp10k / 100, vp10k % 100);
+ }
+
+ pos += scnprintf(buf + pos, size - pos, " cost.usage=%llu",
+ iocg->last_stat.usage_us);
+
+ return pos;
+}
+
 static u64 ioc_weight_prfill(struct seq_file *sf, struct blkg_policy_data *pd,
 int off)
 {
@@ -2606,6 +2748,7 @@ static struct blkcg_policy blkcg_policy_iocost = {
 .pd_alloc_fn = ioc_pd_alloc,
 .pd_init_fn = ioc_pd_init,
 .pd_free_fn = ioc_pd_free,
+ .pd_stat_fn = ioc_pd_stat,
 };

 static int __init ioc_init(void)
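The pre-order inner-node walk this patch introduces can be illustrated
with a toy user-space version. The node type, the fixed-size arrays,
and a "visited" flag are stand-ins for the kernel's iocg, list_head and
walk_list machinery:

#include <stdio.h>
#include <stdbool.h>

struct node {
	const char *name;
	int level;
	struct node *ancestors[8];	/* [0] = root ... [level-1] = parent */
	bool visited;			/* ~non-empty walk_list */
};

static void build_inner_walk(struct node *leaf, struct node **walk, int *n)
{
	int lvl;

	/* find the first ancestor which hasn't been visited yet */
	for (lvl = leaf->level - 1; lvl >= 0; lvl--)
		if (leaf->ancestors[lvl]->visited)
			break;

	/* walk down from there, recording pre-order traversal */
	while (++lvl <= leaf->level - 1) {
		struct node *inner = leaf->ancestors[lvl];

		inner->visited = true;
		walk[(*n)++] = inner;
	}
}

int main(void)
{
	struct node root = { "root", 0, { 0 }, false };
	struct node a = { "a", 1, { &root }, false };
	struct node leaf1 = { "leaf1", 2, { &root, &a }, false };
	struct node leaf2 = { "leaf2", 2, { &root, &a }, false };
	struct node *walk[8];
	int n = 0, i;

	build_inner_walk(&leaf1, walk, &n);
	build_inner_walk(&leaf2, walk, &n);	/* adds nothing; all visited */

	for (i = 0; i < n; i++)
		printf("inner: %s\n", walk[i]->name);
	return 0;
}

Each inner node lands on the list exactly once regardless of how many
leaves sit below it, and walking the list backwards (as the flush code
does) visits children before their parents.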
From patchwork Tue Sep 1 18:52:44 2020
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749381
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 14/27] blk-iocost: calculate iocg->usages[] from iocg->local_stat.usage_us
Date: Tue, 1 Sep 2020 14:52:44 -0400
Message-Id: <20200901185257.645114-15-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
List-ID: X-Mailing-List: linux-block@vger.kernel.org
Currently, iocg->usages[], which are used to guide inuse adjustments,
are calculated from vtime deltas. This, however, assumes that the
hierarchical inuse weight at the time of calculation held for the
entire period, which often isn't true and can lead to significant
errors.

Now that we have absolute usage information collected, we can derive
iocg->usages[] from iocg->local_stat.usage_us so that inuse adjustment
decisions are made based on actual absolute usage. The calculated usage
is clamped between 1 and WEIGHT_ONE, and WEIGHT_ONE is also used to
signal saturation regardless of the current hierarchical inuse weight.
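Sketched below in user space: the usage-to-fixed-point conversion with
the clamp described above. WEIGHT_ONE's value and the round-up division
are assumptions mirroring the series, and the inputs are made up:

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE (1u << 16)	/* assumed scale, matching the series */

static uint32_t usage_to_fixed(uint64_t usage_us, uint64_t dur_us)
{
	uint64_t u;

	if (dur_us == 0)
		dur_us = 1;
	u = (usage_us * WEIGHT_ONE + dur_us - 1) / dur_us; /* round up */
	if (u < 1)			/* clamp to [1, WEIGHT_ONE] */
		u = 1;
	if (u > WEIGHT_ONE)
		u = WEIGHT_ONE;
	return (uint32_t)u;
}

int main(void)
{
	/* 30ms of usage over a 100ms period -> ~30% of WEIGHT_ONE */
	printf("usage=%u of %u\n", usage_to_fixed(30000, 100000), WEIGHT_ONE);
	return 0;
}

Because the numerator is an actual measured duration rather than a
vtime delta scaled by a possibly stale hierarchical weight, the result
stays meaningful even when inuse changed mid-period.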
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 72 ++++++++++++++++++++++-------------
 include/trace/events/iocost.h | 7 +---
 2 files changed, 47 insertions(+), 32 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index f30f9b37fcf0..2496674bbbf4 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -476,14 +476,10 @@ struct ioc_gq {
 * `vtime_done` is the same but progressed on completion rather
 * than issue. The delta behind `vtime` represents the cost of
 * currently in-flight IOs.
- *
- * `last_vtime` is used to remember `vtime` at the end of the last
- * period to calculate utilization.
 */
 atomic64_t vtime;
 atomic64_t done_vtime;
 u64 abs_vdebt;
- u64 last_vtime;

 /*
 * The period this iocg was last active in. Used for deactivation
@@ -506,6 +502,9 @@ struct ioc_gq {
 struct hrtimer waitq_timer;
 struct hrtimer delay_timer;

+ /* timestamp at the latest activation */
+ u64 activated_at;
+
 /* statistics */
 struct iocg_pcpu_stat __percpu *pcpu_stat;
 struct iocg_stat local_stat;
@@ -514,6 +513,7 @@ struct ioc_gq {
 u64 last_stat_abs_vusage;

 /* usage is recorded as fractions of WEIGHT_ONE */
+ u32 usage_delta_us;
 int usage_idx;
 u32 usages[NR_USAGE_SLOTS];
@@ -1159,7 +1159,7 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
 TRACE_IOCG_PATH(iocg_activate, iocg, now,
 last_period, cur_period, vtime);

- iocg->last_vtime = vtime;
+ iocg->activated_at = now->now;

 if (ioc->running == IOC_IDLE) {
 ioc->running = IOC_RUNNING;
@@ -1451,7 +1451,8 @@ static void iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now)
 vusage_delta = abs_vusage - iocg->last_stat_abs_vusage;
 iocg->last_stat_abs_vusage = abs_vusage;

- iocg->local_stat.usage_us += div64_u64(vusage_delta, now->vrate);
+ iocg->usage_delta_us = div64_u64(vusage_delta, now->vrate);
+ iocg->local_stat.usage_us += iocg->usage_delta_us;

 new_stat.usage_us =
 iocg->local_stat.usage_us + iocg->desc_stat.usage_us;
@@ -1558,8 +1559,9 @@ static void ioc_timer_fn(struct timer_list *timer)

 /* calc usages and see whether some weights need to be moved around */
 list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
- u64 vdone, vtime, vusage, vmin;
+ u64 vdone, vtime, usage_us, vmin;
 u32 hw_active, hw_inuse, usage;
+ int uidx;

 /*
 * Collect unused and wind vtime closer to vnow to prevent
@@ -1583,27 +1585,44 @@ static void ioc_timer_fn(struct timer_list *timer)
 time_before64(vdone, now.vnow - period_vtime))
 nr_lagging++;

- if (waitqueue_active(&iocg->waitq))
- vusage = now.vnow - iocg->last_vtime;
- else if (time_before64(iocg->last_vtime, vtime))
- vusage = vtime - iocg->last_vtime;
- else
- vusage = 0;
-
- iocg->last_vtime += vusage;
 /*
- * Factor in in-flight vtime into vusage to avoid
- * high-latency completions appearing as idle. This should
- * be done after the above ->last_time adjustment.
+ * Determine absolute usage factoring in pending and in-flight
+ * IOs to avoid stalls and high-latency completions appearing as
+ * idle.
 */
- vusage = max(vusage, vtime - vdone);
-
- /* calculate hweight based usage ratio and record */
- if (vusage) {
- usage = DIV64_U64_ROUND_UP(vusage * hw_inuse,
- period_vtime);
- iocg->usage_idx = (iocg->usage_idx + 1) % NR_USAGE_SLOTS;
- iocg->usages[iocg->usage_idx] = usage;
+ usage_us = iocg->usage_delta_us;
+ if (waitqueue_active(&iocg->waitq) && time_before64(vtime, now.vnow))
+ usage_us += DIV64_U64_ROUND_UP(
+ cost_to_abs_cost(now.vnow - vtime, hw_inuse),
+ now.vrate);
+ if (vdone != vtime) {
+ u64 inflight_us = DIV64_U64_ROUND_UP(
+ cost_to_abs_cost(vtime - vdone, hw_inuse),
+ now.vrate);
+ usage_us = max(usage_us, inflight_us);
+ }
+
+ /* convert to hweight based usage ratio and record */
+ uidx = (iocg->usage_idx + 1) % NR_USAGE_SLOTS;
+
+ if (time_after64(vtime, now.vnow - ioc->margins.min)) {
+ iocg->usage_idx = uidx;
+ iocg->usages[uidx] = WEIGHT_ONE;
+ } else if (usage_us) {
+ u64 started_at, dur;
+
+ if (time_after64(iocg->activated_at, ioc->period_at))
+ started_at = iocg->activated_at;
+ else
+ started_at = ioc->period_at;
+
+ dur = max_t(u64, now.now - started_at, 1);
+ usage = clamp_t(u32,
+ DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE, dur),
+ 1, WEIGHT_ONE);
+
+ iocg->usage_idx = uidx;
+ iocg->usages[uidx] = usage;
 } else {
 usage = 0;
 }
@@ -1620,7 +1639,6 @@ static void ioc_timer_fn(struct timer_list *timer)
 /* throw away surplus vtime */
 atomic64_add(delta, &iocg->vtime);
 atomic64_add(delta, &iocg->done_vtime);
- iocg->last_vtime += delta;
 /* if usage is sufficiently low, maybe it can donate */
 if (surplus_adjusted_hweight_inuse(usage, hw_inuse)) {
 iocg->has_surplus = true;
diff --git a/include/trace/events/iocost.h b/include/trace/events/iocost.h
index c2f580fd371b..a905ecc0342f 100644
--- a/include/trace/events/iocost.h
+++ b/include/trace/events/iocost.h
@@ -26,7 +26,6 @@ TRACE_EVENT(iocost_iocg_activate,
 __field(u64, vrate)
 __field(u64, last_period)
 __field(u64, cur_period)
- __field(u64, last_vtime)
 __field(u64, vtime)
 __field(u32, weight)
 __field(u32, inuse)
@@ -42,7 +41,6 @@ TRACE_EVENT(iocost_iocg_activate,
 __entry->vrate = now->vrate;
 __entry->last_period = last_period;
 __entry->cur_period = cur_period;
- __entry->last_vtime = iocg->last_vtime;
 __entry->vtime = vtime;
 __entry->weight = iocg->weight;
 __entry->inuse = iocg->inuse;
@@ -51,13 +49,12 @@ TRACE_EVENT(iocost_iocg_activate,
 ),

 TP_printk("[%s:%s] now=%llu:%llu vrate=%llu "
- "period=%llu->%llu vtime=%llu->%llu "
+ "period=%llu->%llu vtime=%llu "
 "weight=%u/%u hweight=%llu/%llu",
 __get_str(devname), __get_str(cgroup),
 __entry->now, __entry->vnow, __entry->vrate,
 __entry->last_period, __entry->cur_period,
- __entry->last_vtime, __entry->vtime,
- __entry->inuse, __entry->weight,
+ __entry->vtime, __entry->inuse, __entry->weight,
 __entry->hweight_inuse, __entry->hweight_active
 )
 );

From patchwork Tue Sep 1 18:52:45 2020
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749393
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 15/27] blk-iocost: replace iocg->has_surplus with ->surplus_list
Date: Tue, 1 Sep 2020 14:52:45 -0400
Message-Id: <20200901185257.645114-16-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
List-ID: X-Mailing-List: linux-block@vger.kernel.org

Instead of marking surplus iocgs with a flag and filtering for them
while walking all active iocgs, build a surpluses list.
This doesn't make much difference now but will help implement improved donation logic, which will iterate iocgs with surplus multiple times.

Signed-off-by: Tejun Heo --- block/blk-iocost.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 2496674bbbf4..c1cd66cfa2a8 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -494,9 +494,9 @@ struct ioc_gq { int hweight_gen; u32 hweight_active; u32 hweight_inuse; - bool has_surplus; struct list_head walk_list; + struct list_head surplus_list; struct wait_queue_head waitq; struct hrtimer waitq_timer; @@ -1507,6 +1507,7 @@ static void ioc_timer_fn(struct timer_list *timer) struct ioc *ioc = container_of(timer, struct ioc, timer); struct ioc_gq *iocg, *tiocg; struct ioc_now now; + LIST_HEAD(surpluses); int nr_surpluses = 0, nr_shortages = 0, nr_lagging = 0; u32 ppm_rthr = MILLION - ioc->params.qos[QOS_RPPM]; u32 ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM]; @@ -1630,8 +1631,7 @@ static void ioc_timer_fn(struct timer_list *timer) /* see whether there's surplus vtime */ vmin = now.vnow - ioc->margins.max; - iocg->has_surplus = false; - + WARN_ON_ONCE(!list_empty(&iocg->surplus_list)); if (!waitqueue_active(&iocg->waitq) && time_before64(vtime, vmin)) { u64 delta = vmin - vtime; @@ -1641,7 +1641,7 @@ static void ioc_timer_fn(struct timer_list *timer) atomic64_add(delta, &iocg->done_vtime); /* if usage is sufficiently low, maybe it can donate */ if (surplus_adjusted_hweight_inuse(usage, hw_inuse)) { - iocg->has_surplus = true; + list_add(&iocg->surplus_list, &surpluses); nr_surpluses++; } } else if (hw_inuse < hw_active) { @@ -1677,13 +1677,10 @@ static void ioc_timer_fn(struct timer_list *timer) goto skip_surplus_transfers; /* there are both shortages and surpluses, transfer surpluses */ - list_for_each_entry(iocg, &ioc->active_iocgs, active_list) { + list_for_each_entry(iocg, &surpluses, surplus_list) { u32 usage, hw_active, hw_inuse, new_hwi, new_inuse; int nr_valid = 0; - if (!iocg->has_surplus) - continue; - /* base the decision on max historical usage */ for (i = 0, usage = 0; i < NR_USAGE_SLOTS; i++) { if (iocg->usages[i]) { @@ -1711,6 +1708,10 @@ static void ioc_timer_fn(struct timer_list *timer) skip_surplus_transfers: commit_weights(ioc); + /* surplus list should be dissolved after use */ + list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list) + list_del_init(&iocg->surplus_list); + /* * If q is getting clogged or we're missing too much, we're issuing * too much IO and should lower vtime rate.
If we're not missing @@ -2284,6 +2285,7 @@ static void ioc_pd_init(struct blkg_policy_data *pd) atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period)); INIT_LIST_HEAD(&iocg->active_list); INIT_LIST_HEAD(&iocg->walk_list); + INIT_LIST_HEAD(&iocg->surplus_list); iocg->hweight_active = WEIGHT_ONE; iocg->hweight_inuse = WEIGHT_ONE; @@ -2320,6 +2322,7 @@ static void ioc_pd_free(struct blkg_policy_data *pd) } WARN_ON_ONCE(!list_empty(&iocg->walk_list)); + WARN_ON_ONCE(!list_empty(&iocg->surplus_list)); spin_unlock_irqrestore(&ioc->lock, flags);
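[Editorial sketch] The flag-to-list conversion in this patch follows a common kernel pattern: thread the interesting members onto a local on-stack list, walk just that list (possibly several times), then dissolve it. A minimal userspace analogy, with a plain next pointer standing in for the kernel's struct list_head and invented types throughout:

/*
 * Build a local "surpluses" chain instead of tagging nodes with a
 * has_surplus flag and re-scanning all active nodes.
 */
#include <stdio.h>
#include <stddef.h>

struct iocg_like {
	const char *name;
	int can_donate;			/* pretend usage says "surplus" */
	struct iocg_like *surplus_next;	/* stand-in for surplus_list */
};

int main(void)
{
	struct iocg_like g[3] = {
		{ "a", 1, NULL }, { "b", 0, NULL }, { "c", 1, NULL },
	};
	struct iocg_like *surpluses = NULL;

	/* build phase: replaces setting iocg->has_surplus = true */
	for (int i = 0; i < 3; i++) {
		if (g[i].can_donate) {
			g[i].surplus_next = surpluses;
			surpluses = &g[i];
		}
	}

	/* consume phase: only surplus nodes are visited, maybe repeatedly */
	for (struct iocg_like *p = surpluses; p; p = p->surplus_next)
		printf("donor: %s\n", p->name);

	/* dissolve after use, mirroring list_del_init() in the patch */
	surpluses = NULL;
	return 0;
}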
From patchwork Tue Sep 1 18:52:46 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 16/27] blk-iocost: decouple vrate adjustment from surplus transfers
Date: Tue, 1 Sep 2020 14:52:46 -0400
Message-Id: <20200901185257.645114-17-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

Budget donations are inaccurate and could take multiple periods to converge. To prevent triggering vrate adjustments while surplus transfers were catching up, vrate adjustment was suppressed if donations were increasing, which was indicated by non-zero nr_surpluses. This entangling won't be necessary with the scheduled rewrite of the donation mechanism, which will make it precise and immediate. Let's decouple the two in preparation.

Signed-off-by: Tejun Heo --- block/blk-iocost.c | 19 +++++++------------ include/trace/events/iocost.h | 13 ++++--------- 2 files changed, 11 insertions(+), 21 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index c1cd66cfa2a8..a3889a8b0a33 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1508,7 +1508,7 @@ static void ioc_timer_fn(struct timer_list *timer) struct ioc_gq *iocg, *tiocg; struct ioc_now now; LIST_HEAD(surpluses); - int nr_surpluses = 0, nr_shortages = 0, nr_lagging = 0; + int nr_shortages = 0, nr_lagging = 0; u32 ppm_rthr = MILLION - ioc->params.qos[QOS_RPPM]; u32 ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM]; u32 missed_ppm[2], rq_wait_pct; @@ -1640,10 +1640,8 @@ static void ioc_timer_fn(struct timer_list *timer) atomic64_add(delta, &iocg->vtime); atomic64_add(delta, &iocg->done_vtime); /* if usage is sufficiently low, maybe it can donate */ - if (surplus_adjusted_hweight_inuse(usage, hw_inuse)) { + if (surplus_adjusted_hweight_inuse(usage, hw_inuse)) list_add(&iocg->surplus_list, &surpluses); - nr_surpluses++; - } } else if (hw_inuse < hw_active) { u32 new_hwi, new_inuse; @@ -1673,7 +1671,7 @@ static void ioc_timer_fn(struct timer_list *timer) } } - if (!nr_shortages || !nr_surpluses) + if (!nr_shortages || list_empty(&surpluses)) goto skip_surplus_transfers; /* there are both shortages and surpluses, transfer surpluses */ @@ -1738,11 +1736,9 @@ static void ioc_timer_fn(struct timer_list *timer) /* * If there are IOs spanning multiple periods, wait - * them out before pushing the device harder. If - * there are surpluses, let redistribution work it - * out first. + * them out before pushing the device harder.
*/ - if (!nr_lagging && !nr_surpluses) + if (!nr_lagging) ioc->busy_level--; } else { /* @@ -1796,15 +1792,14 @@ static void ioc_timer_fn(struct timer_list *timer) } trace_iocost_ioc_vrate_adj(ioc, vrate, missed_ppm, rq_wait_pct, - nr_lagging, nr_shortages, - nr_surpluses); + nr_lagging, nr_shortages); atomic64_set(&ioc->vtime_rate, vrate); ioc_refresh_margins(ioc); } else if (ioc->busy_level != prev_busy_level || nr_lagging) { trace_iocost_ioc_vrate_adj(ioc, atomic64_read(&ioc->vtime_rate), missed_ppm, rq_wait_pct, nr_lagging, - nr_shortages, nr_surpluses); + nr_shortages); } ioc_refresh_params(ioc, false); diff --git a/include/trace/events/iocost.h b/include/trace/events/iocost.h index a905ecc0342f..ee024fe8fef6 100644 --- a/include/trace/events/iocost.h +++ b/include/trace/events/iocost.h @@ -128,11 +128,9 @@ DEFINE_EVENT(iocg_inuse_update, iocost_inuse_reset, TRACE_EVENT(iocost_ioc_vrate_adj, TP_PROTO(struct ioc *ioc, u64 new_vrate, u32 *missed_ppm, - u32 rq_wait_pct, int nr_lagging, int nr_shortages, - int nr_surpluses), + u32 rq_wait_pct, int nr_lagging, int nr_shortages), - TP_ARGS(ioc, new_vrate, missed_ppm, rq_wait_pct, nr_lagging, nr_shortages, - nr_surpluses), + TP_ARGS(ioc, new_vrate, missed_ppm, rq_wait_pct, nr_lagging, nr_shortages), TP_STRUCT__entry ( __string(devname, ioc_name(ioc)) @@ -144,7 +142,6 @@ TRACE_EVENT(iocost_ioc_vrate_adj, __field(u32, rq_wait_pct) __field(int, nr_lagging) __field(int, nr_shortages) - __field(int, nr_surpluses) ), TP_fast_assign( @@ -157,15 +154,13 @@ TRACE_EVENT(iocost_ioc_vrate_adj, __entry->rq_wait_pct = rq_wait_pct; __entry->nr_lagging = nr_lagging; __entry->nr_shortages = nr_shortages; - __entry->nr_surpluses = nr_surpluses; ), - TP_printk("[%s] vrate=%llu->%llu busy=%d missed_ppm=%u:%u rq_wait_pct=%u lagging=%d shortages=%d surpluses=%d", + TP_printk("[%s] vrate=%llu->%llu busy=%d missed_ppm=%u:%u rq_wait_pct=%u lagging=%d shortages=%d", __get_str(devname), __entry->old_vrate, __entry->new_vrate, __entry->busy_level, __entry->read_missed_ppm, __entry->write_missed_ppm, - __entry->rq_wait_pct, __entry->nr_lagging, __entry->nr_shortages, - __entry->nr_surpluses + __entry->rq_wait_pct, __entry->nr_lagging, __entry->nr_shortages ) );
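[Editorial sketch] Restating the decoupling in isolation: before this patch, pending donations also suppressed busy_level reduction; afterwards, only multi-period (lagging) IOs do. The predicates below are invented names for illustration; the real logic lives inline in ioc_timer_fn():

/* Before: in-flight donations also held the vrate adjustment back. */
static inline int may_lower_busy_level_old(int nr_lagging, int nr_surpluses)
{
	return !nr_lagging && !nr_surpluses;
}

/* After: only IOs spanning multiple periods hold the rate back. */
static inline int may_lower_busy_level_new(int nr_lagging)
{
	return !nr_lagging;
}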
From patchwork Tue Sep 1 18:52:47 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 17/27] blk-iocost: restructure surplus donation logic
Date: Tue, 1 Sep 2020 14:52:47 -0400
Message-Id: <20200901185257.645114-18-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

The way the surplus donation logic is structured isn't great. There are two separate paths for starting/increasing donations and decreasing them, making the logic harder to follow and prone to unnecessary behavior differences. In preparation for improved donation handling, this patch restructures the code so that

* All donors - new, increasing and decreasing - are funneled through the same code path.
* The target donation calculation is factored into hweight_after_donation() which is called once from the same spot for all possible donors.
* Actual inuse adjustment is factored into transfer_surpluses().

This change introduces a few behavior differences - e.g. donation amount reduction now uses the max usage of the recent three periods just like new and increasing donations, and inuse now gets adjusted upwards the same way it gets downwards.
These differences are unlikely to have severely negative implications and the whole logic will be revamped soon. This patch also removes two tracepoints. The existing TPs don't quite fit the new implementation. A later patch will update and reinstate them. Signed-off-by: Tejun Heo --- block/blk-iocost.c | 179 ++++++++++++++++++++++++++------------------- 1 file changed, 103 insertions(+), 76 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index a3889a8b0a33..61b008d0801f 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -494,6 +494,7 @@ struct ioc_gq { int hweight_gen; u32 hweight_active; u32 hweight_inuse; + u32 hweight_after_donation; struct list_head walk_list; struct list_head surplus_list; @@ -1070,6 +1071,32 @@ static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep *hw_inusep = iocg->hweight_inuse; } +/* + * Calculate the hweight_inuse @iocg would get with max @inuse assuming all the + * other weights stay unchanged. + */ +static u32 current_hweight_max(struct ioc_gq *iocg) +{ + u32 hwm = WEIGHT_ONE; + u32 inuse = iocg->active; + u64 child_inuse_sum; + int lvl; + + lockdep_assert_held(&iocg->ioc->lock); + + for (lvl = iocg->level - 1; lvl >= 0; lvl--) { + struct ioc_gq *parent = iocg->ancestors[lvl]; + struct ioc_gq *child = iocg->ancestors[lvl + 1]; + + child_inuse_sum = parent->child_inuse_sum + inuse - child->inuse; + hwm = div64_u64((u64)hwm * inuse, child_inuse_sum); + inuse = DIV64_U64_ROUND_UP(parent->active * child_inuse_sum, + parent->child_active_sum); + } + + return max_t(u32, hwm, 1); +} + static void weight_updated(struct ioc_gq *iocg) { struct ioc *ioc = iocg->ioc; @@ -1488,20 +1515,58 @@ static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now) } } -/* returns usage with margin added if surplus is large enough */ -static u32 surplus_adjusted_hweight_inuse(u32 usage, u32 hw_inuse) +/* + * Determine what @iocg's hweight_inuse should be after donating unused + * capacity. @hwm is the upper bound and used to signal no donation. This + * function also throws away @iocg's excess budget. 
+ */ +static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, + struct ioc_now *now) { + struct ioc *ioc = iocg->ioc; + u64 vtime = atomic64_read(&iocg->vtime); + s64 excess; + + /* see whether minimum margin requirement is met */ + if (waitqueue_active(&iocg->waitq) || + time_after64(vtime, now->vnow - ioc->margins.min)) + return hwm; + + /* throw away excess above max */ + excess = now->vnow - vtime - ioc->margins.max; + if (excess > 0) { + atomic64_add(excess, &iocg->vtime); + atomic64_add(excess, &iocg->done_vtime); + vtime += excess; + } + /* add margin */ usage = DIV_ROUND_UP(usage * SURPLUS_SCALE_PCT, 100); usage += SURPLUS_SCALE_ABS; /* don't bother if the surplus is too small */ - if (usage + SURPLUS_MIN_ADJ_DELTA > hw_inuse) - return 0; + if (usage + SURPLUS_MIN_ADJ_DELTA > hwm) + return hwm; return usage; } +static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now) +{ + struct ioc_gq *iocg; + + list_for_each_entry(iocg, surpluses, surplus_list) { + u32 old_hwi, new_hwi, new_inuse; + + current_hweight(iocg, NULL, &old_hwi); + new_hwi = iocg->hweight_after_donation; + + new_inuse = DIV64_U64_ROUND_UP((u64)iocg->inuse * new_hwi, + old_hwi); + __propagate_weights(iocg, iocg->weight, new_inuse); + } +} + static void ioc_timer_fn(struct timer_list *timer) { struct ioc *ioc = container_of(timer, struct ioc, timer); @@ -1560,9 +1625,9 @@ static void ioc_timer_fn(struct timer_list *timer) /* calc usages and see whether some weights need to be moved around */ list_for_each_entry(iocg, &ioc->active_iocgs, active_list) { - u64 vdone, vtime, usage_us, vmin; + u64 vdone, vtime, usage_us; u32 hw_active, hw_inuse, usage; - int uidx; + int uidx, nr_valid; /* * Collect unused and wind vtime closer to vnow to prevent @@ -1618,92 +1683,54 @@ static void ioc_timer_fn(struct timer_list *timer) started_at = ioc->period_at; dur = max_t(u64, now.now - started_at, 1); - usage = clamp_t(u32, + + iocg->usage_idx = uidx; + iocg->usages[uidx] = clamp_t(u32, DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE, dur), 1, WEIGHT_ONE); + } - iocg->usage_idx = uidx; - iocg->usages[uidx] = usage; - } else { - usage = 0; + /* base the decision on max historical usage */ + for (i = 0, usage = 0, nr_valid = 0; i < NR_USAGE_SLOTS; i++) { + if (iocg->usages[i]) { + usage = max(usage, iocg->usages[i]); + nr_valid++; + } } + if (nr_valid < MIN_VALID_USAGES) + usage = WEIGHT_ONE; /* see whether there's surplus vtime */ - vmin = now.vnow - ioc->margins.max; - WARN_ON_ONCE(!list_empty(&iocg->surplus_list)); - if (!waitqueue_active(&iocg->waitq) && - time_before64(vtime, vmin)) { - u64 delta = vmin - vtime; - - /* throw away surplus vtime */ - atomic64_add(delta, &iocg->vtime); - atomic64_add(delta, &iocg->done_vtime); - /* if usage is sufficiently low, maybe it can donate */ - if (surplus_adjusted_hweight_inuse(usage, hw_inuse)) - list_add(&iocg->surplus_list, &surpluses); - } else if (hw_inuse < hw_active) { - u32 new_hwi, new_inuse; + if (hw_inuse < hw_active || + (!waitqueue_active(&iocg->waitq) && + time_before64(vtime, now.vnow - ioc->margins.max))) { + u32 hwm, new_hwi; - /* was donating but might need to take back some */ - if (waitqueue_active(&iocg->waitq)) { - new_hwi = hw_active; + /* + * Already donating or accumulated enough to start. + * Determine the donation amount. 
+ */ + hwm = current_hweight_max(iocg); + new_hwi = hweight_after_donation(iocg, hwm, usage, + &now); + if (new_hwi < hwm) { + iocg->hweight_after_donation = new_hwi; + list_add(&iocg->surplus_list, &surpluses); } else { - new_hwi = max(hw_inuse, - usage * SURPLUS_SCALE_PCT / 100 + - SURPLUS_SCALE_ABS); - } - - new_inuse = div64_u64((u64)iocg->inuse * new_hwi, - hw_inuse); - new_inuse = clamp_t(u32, new_inuse, 1, iocg->active); - - if (new_inuse > iocg->inuse) { - TRACE_IOCG_PATH(inuse_takeback, iocg, &now, - iocg->inuse, new_inuse, - hw_inuse, new_hwi); - __propagate_weights(iocg, iocg->weight, - new_inuse); + __propagate_weights(iocg, iocg->active, + iocg->active); + nr_shortages++; } } else { - /* genuninely out of vtime */ + /* genuinely short on vtime */ nr_shortages++; } } - if (!nr_shortages || list_empty(&surpluses)) - goto skip_surplus_transfers; + if (!list_empty(&surpluses) && nr_shortages) + transfer_surpluses(&surpluses, &now); - /* there are both shortages and surpluses, transfer surpluses */ - list_for_each_entry(iocg, &surpluses, surplus_list) { - u32 usage, hw_active, hw_inuse, new_hwi, new_inuse; - int nr_valid = 0; - - /* base the decision on max historical usage */ - for (i = 0, usage = 0; i < NR_USAGE_SLOTS; i++) { - if (iocg->usages[i]) { - usage = max(usage, iocg->usages[i]); - nr_valid++; - } - } - if (nr_valid < MIN_VALID_USAGES) - continue; - - current_hweight(iocg, &hw_active, &hw_inuse); - new_hwi = surplus_adjusted_hweight_inuse(usage, hw_inuse); - if (!new_hwi) - continue; - - new_inuse = DIV64_U64_ROUND_UP((u64)iocg->inuse * new_hwi, - hw_inuse); - if (new_inuse < iocg->inuse) { - TRACE_IOCG_PATH(inuse_giveaway, iocg, &now, - iocg->inuse, new_inuse, - hw_inuse, new_hwi); - __propagate_weights(iocg, iocg->weight, new_inuse); - } - } -skip_surplus_transfers: commit_weights(ioc); /* surplus list should be dissolved after use */
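[Editorial sketch] For intuition about the current_hweight_max() helper this patch introduces: a node's hierarchical weight is the product of weight over sibling-sum at each level, and raising inuse all the way to active bounds how high hweight_inuse can go. A standalone numeric companion with an invented two-level tree (not the kernel implementation):

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE (1u << 16)

int main(void)
{
	/* level 1: the ancestor has active 100 of a 400 sibling sum */
	uint64_t hw = (uint64_t)WEIGHT_ONE * 100 / 400;	/* 25% */

	/* level 2: the leaf has active 50 of a 100 sibling sum */
	hw = hw * 50 / 100;				/* 12.5% */

	/* raising inuse to active cannot lift hweight_inuse above this */
	printf("hweight max = %llu (%.2f%%)\n",
	       (unsigned long long)hw, hw * 100.0 / WEIGHT_ONE);
	return 0;
}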
From patchwork Tue Sep 1 18:52:48 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 18/27] blk-iocost: implement Andy's method for donation weight updates
Date: Tue, 1 Sep 2020 14:52:48 -0400
Message-Id: <20200901185257.645114-19-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

iocost implements work conservation by reducing iocg->inuse and propagating the adjustment upwards proportionally. However, while I knew the target absolute hierarchical proportion (the adjusted hweight_inuse), I couldn't figure out how to determine the iocg->inuse adjustment to achieve that, and approximated the adjustment by scaling iocg->inuse using the proportion of the needed hweight_inuse changes. When nested, these scalings aren't accurate even when adjusting a single node, as the donating node also receives the benefit of the donated portion. When multiple nodes are donating, as they often do, they can be wildly wrong.

iocost employed various safety nets to combat the inaccuracies. There are ample buffers in determining how much to donate, and the adjustments are conservative and gradual. While it can achieve a reasonable level of work conservation in simple scenarios, the inaccuracies can easily add up, leading to significant loss of total work. This in turn makes it difficult to closely cap vrate, as vrate adjustment is needed to compensate for the loss of work. The combination of inaccurate donation calculations and vrate adjustments can lead to wide fluctuations and clunky overall behaviors.
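[Editorial sketch] The inaccuracy described above is easy to see with numbers. In this hedged standalone example, two siblings each have inuse 100 and one wants to halve its 50% hweight to 25%; naive proportional scaling misses the target because the sibling sum it is divided by shrinks too:

#include <stdio.h>

int main(void)
{
	double a = 100.0, b = 100.0;	/* sibling inuse weights */
	double old_hwi = a / (a + b);	/* 0.500 */
	double target = 0.25;		/* wanted hweight_inuse */

	/* the old approximation: scale inuse by the hweight ratio */
	double naive = a * target / old_hwi;	/* 50 */
	printf("naive: inuse=%.1f -> hweight=%.3f (target %.3f)\n",
	       naive, naive / (naive + b), target);

	/* the exact answer solves w / (w + b) = target */
	double exact = target * b / (1.0 - target);	/* 33.33 */
	printf("exact: inuse=%.2f -> hweight=%.3f\n",
	       exact, exact / (exact + b));
	return 0;
}

The naive adjustment lands at 33.3% instead of the intended 25%, so the donor keeps more than it meant to; with several nested donors these errors compound, which is what the method below eliminates.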
Andy Newell devised a method to calculate the needed ->inuse updates to achieve the target hweight_inuse's. The method is compatible with the proportional inuse adjustment propagation, which allows all hot path operations to be local to each iocg. To roughly summarize, Andy's method divides the tree into donating and non-donating parts, calculates the global donation rate which is used to determine the target hweight_inuse for each node, and then derives per-level proportions. There's a non-trivial amount of math involved. Please refer to the following pdfs for detailed descriptions.

https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo
https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE
https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN

This patch implements Andy's method in transfer_surpluses(). This makes the donation calculations accurate per cycle and enables further improvements in other parts of the donation logic.

Signed-off-by: Tejun Heo Cc: Andy Newell --- block/blk-iocost.c | 252 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 244 insertions(+), 8 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 61b008d0801f..ecc23b827e5d 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -491,9 +491,11 @@ struct ioc_gq { /* see __propagate_weights() and current_hweight() for details */ u64 child_active_sum; u64 child_inuse_sum; + u64 child_adjusted_sum; int hweight_gen; u32 hweight_active; u32 hweight_inuse; + u32 hweight_donating; u32 hweight_after_donation; struct list_head walk_list; @@ -1551,20 +1553,252 @@ static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, return usage; } +/* + * For work-conservation, an iocg which isn't using all of its share should + * donate the leftover to other iocgs. There are two ways to achieve this - 1. + * bumping up vrate accordingly 2. lowering the donating iocg's inuse weight. + * + * #1 is mathematically simpler but has the drawback of requiring synchronous + * global hweight_inuse updates when idle iocg's get activated or inuse weights + * change due to donation snapbacks as it has the possibility of grossly + * overshooting what's allowed by the model and vrate. + * + * #2 is inherently safe with local operations. The donating iocg can easily + * snap back to higher weights when needed without worrying about impacts on + * other nodes as the impacts will be inherently correct. This also makes idle + * iocg activations safe. The only effect activations have is decreasing + * hweight_inuse of others, the right solution to which is for those iocgs to + * snap back to higher weights. + * + * So, we go with #2. The challenge is calculating how each donating iocg's + * inuse should be adjusted to achieve the target donation amounts. This is done + * using Andy's method described in the following pdf. + * + * https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo + * + * Given the weights and target after-donation hweight_inuse values, Andy's + * method determines how the proportional distribution should look like at each + * sibling level to maintain the relative relationship between all non-donating + * pairs. To roughly summarize, it divides the tree into donating and + * non-donating parts, calculates global donation rate which is used to + * determine the target hweight_inuse for each node, and then derives per-level + * proportions.
+ * + * The following pdf shows that global distribution calculated this way can be + * achieved by scaling inuse weights of donating leaves and propagating the + * adjustments upwards proportionally. + * + * https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE + * + * Combining the above two, we can determine how each leaf iocg's inuse should + * be adjusted to achieve the target donation. + * + * https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN + * + * The inline comments use symbols from the last pdf. + * + * b is the sum of the absolute budgets in the subtree. 1 for the root node. + * f is the sum of the absolute budgets of non-donating nodes in the subtree. + * t is the sum of the absolute budgets of donating nodes in the subtree. + * w is the weight of the node. w = w_f + w_t + * w_f is the non-donating portion of w. w_f = w * f / b + * w_b is the donating portion of w. w_t = w * t / b + * s is the sum of all sibling weights. s = Sum(w) for siblings + * s_f and s_t are the non-donating and donating portions of s. + * + * Subscript p denotes the parent's counterpart and ' the adjusted value - e.g. + * w_pt is the donating portion of the parent's weight and w'_pt the same value + * after adjustments. Subscript r denotes the root node's values. + */ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now) { - struct ioc_gq *iocg; + LIST_HEAD(over_hwa); + LIST_HEAD(inner_walk); + struct ioc_gq *iocg, *tiocg, *root_iocg; + u32 after_sum, over_sum, over_target, gamma; + + /* + * It's pretty unlikely but possible for the total sum of + * hweight_after_donation's to be higher than WEIGHT_ONE, which will + * confuse the following calculations. If such condition is detected, + * scale down everyone over its full share equally to keep the sum below + * WEIGHT_ONE. + */ + after_sum = 0; + over_sum = 0; + list_for_each_entry(iocg, surpluses, surplus_list) { + u32 hwa; + + current_hweight(iocg, &hwa, NULL); + after_sum += iocg->hweight_after_donation; + + if (iocg->hweight_after_donation > hwa) { + over_sum += iocg->hweight_after_donation; + list_add(&iocg->walk_list, &over_hwa); + } + } + + if (after_sum >= WEIGHT_ONE) { + /* + * The delta should be deducted from the over_sum, calculate + * target over_sum value. + */ + u32 over_delta = after_sum - (WEIGHT_ONE - 1); + WARN_ON_ONCE(over_sum <= over_delta); + over_target = over_sum - over_delta; + } else { + over_target = 0; + } + + list_for_each_entry_safe(iocg, tiocg, &over_hwa, walk_list) { + if (over_target) + iocg->hweight_after_donation = + div_u64((u64)iocg->hweight_after_donation * + over_target, over_sum); + list_del_init(&iocg->walk_list); + } + + /* + * Build pre-order inner node walk list and prepare for donation + * adjustment calculations. + */ + list_for_each_entry(iocg, surpluses, surplus_list) { + iocg_build_inner_walk(iocg, &inner_walk); + } + + root_iocg = list_first_entry(&inner_walk, struct ioc_gq, walk_list); + WARN_ON_ONCE(root_iocg->level > 0); + list_for_each_entry(iocg, &inner_walk, walk_list) { + iocg->child_adjusted_sum = 0; + iocg->hweight_donating = 0; + iocg->hweight_after_donation = 0; + } + + /* + * Propagate the donating budget (b_t) and after donation budget (b'_t) + * up the hierarchy. 
+ */ list_for_each_entry(iocg, surpluses, surplus_list) { - u32 old_hwi, new_hwi, new_inuse; + struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; + + parent->hweight_donating += iocg->hweight_donating; + parent->hweight_after_donation += iocg->hweight_after_donation; + } - current_hweight(iocg, NULL, &old_hwi); - new_hwi = iocg->hweight_after_donation; + list_for_each_entry_reverse(iocg, &inner_walk, walk_list) { + if (iocg->level > 0) { + struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; - new_inuse = DIV64_U64_ROUND_UP((u64)iocg->inuse * new_hwi, - old_hwi); - __propagate_weights(iocg, iocg->weight, new_inuse); + parent->hweight_donating += iocg->hweight_donating; + parent->hweight_after_donation += iocg->hweight_after_donation; + } + } + + /* + * Calculate inner hwa's (b) and make sure the donation values are + * within the accepted ranges as we're doing low res calculations with + * roundups. + */ + list_for_each_entry(iocg, &inner_walk, walk_list) { + if (iocg->level) { + struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; + + iocg->hweight_active = DIV64_U64_ROUND_UP( + (u64)parent->hweight_active * iocg->active, + parent->child_active_sum); + + } + + iocg->hweight_donating = min(iocg->hweight_donating, + iocg->hweight_active); + iocg->hweight_after_donation = min(iocg->hweight_after_donation, + iocg->hweight_donating - 1); + if (WARN_ON_ONCE(iocg->hweight_active <= 1 || + iocg->hweight_donating <= 1 || + iocg->hweight_after_donation == 0)) { + pr_warn("iocg: invalid donation weights in "); + pr_cont_cgroup_path(iocg_to_blkg(iocg)->blkcg->css.cgroup); + pr_cont(": active=%u donating=%u after=%u\n", + iocg->hweight_active, iocg->hweight_donating, + iocg->hweight_after_donation); + } } + + /* + * Calculate the global donation rate (gamma) - the rate to adjust + * non-donating budgets by. No need to use 64bit multiplication here as + * the first operand is guaranteed to be smaller than WEIGHT_ONE + * (1<<16). + * + * gamma = (1 - t_r') / (1 - t_r) + */ + gamma = DIV_ROUND_UP( + (WEIGHT_ONE - root_iocg->hweight_after_donation) * WEIGHT_ONE, + WEIGHT_ONE - root_iocg->hweight_donating); + + /* + * Calculate adjusted hwi, child_adjusted_sum and inuse for the inner + * nodes. 
+ */ + list_for_each_entry(iocg, &inner_walk, walk_list) { + struct ioc_gq *parent; + u32 inuse, wpt, wptp; + u64 st, sf; + + if (iocg->level == 0) { + /* adjusted weight sum for 1st level: s' = s * b_pf / b'_pf */ + iocg->child_adjusted_sum = DIV64_U64_ROUND_UP( + iocg->child_active_sum * (WEIGHT_ONE - iocg->hweight_donating), + WEIGHT_ONE - iocg->hweight_after_donation); + continue; + } + + parent = iocg->ancestors[iocg->level - 1]; + + /* b' = gamma * b_f + b_t' */ + iocg->hweight_inuse = DIV64_U64_ROUND_UP( + (u64)gamma * (iocg->hweight_active - iocg->hweight_donating), + WEIGHT_ONE) + iocg->hweight_after_donation; + + /* w' = s' * b' / b'_p */ + inuse = DIV64_U64_ROUND_UP( + (u64)parent->child_adjusted_sum * iocg->hweight_inuse, + parent->hweight_inuse); + + /* adjusted weight sum for children: s' = s_f + s_t * w'_pt / w_pt */ + st = DIV64_U64_ROUND_UP( + iocg->child_active_sum * iocg->hweight_donating, + iocg->hweight_active); + sf = iocg->child_active_sum - st; + wpt = DIV64_U64_ROUND_UP( + (u64)iocg->active * iocg->hweight_donating, + iocg->hweight_active); + wptp = DIV64_U64_ROUND_UP( + (u64)inuse * iocg->hweight_after_donation, + iocg->hweight_inuse); + + iocg->child_adjusted_sum = sf + DIV64_U64_ROUND_UP(st * wptp, wpt); + } + + /* + * All inner nodes now have ->hweight_inuse and ->child_adjusted_sum and + * we can finally determine leaf adjustments. + */ + list_for_each_entry(iocg, surpluses, surplus_list) { + struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; + u32 inuse; + + /* w' = s' * b' / b'_p, note that b' == b'_t for donating leaves */ + inuse = DIV64_U64_ROUND_UP( + parent->child_adjusted_sum * iocg->hweight_after_donation, + parent->hweight_inuse); + __propagate_weights(iocg, iocg->active, inuse); + } + + /* walk list should be dissolved after use */ + list_for_each_entry_safe(iocg, tiocg, &inner_walk, walk_list) + list_del_init(&iocg->walk_list); } static void ioc_timer_fn(struct timer_list *timer) @@ -1705,16 +1939,18 @@ static void ioc_timer_fn(struct timer_list *timer) if (hw_inuse < hw_active || (!waitqueue_active(&iocg->waitq) && time_before64(vtime, now.vnow - ioc->margins.max))) { - u32 hwm, new_hwi; + u32 hwa, hwm, new_hwi; /* * Already donating or accumulated enough to start. * Determine the donation amount. 
*/ + current_hweight(iocg, &hwa, NULL); hwm = current_hweight_max(iocg); new_hwi = hweight_after_donation(iocg, hwm, usage, &now); if (new_hwi < hwm) { + iocg->hweight_donating = hwa; iocg->hweight_after_donation = new_hwi; list_add(&iocg->surplus_list, &surpluses); } else {
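[Editorial sketch] The global donation rate formula quoted in the patch's comments, gamma = (1 - t_r') / (1 - t_r), can be checked in isolation. This hedged standalone program redoes the WEIGHT_ONE fixed-point arithmetic with invented figures (the root's donating fraction is 40%, of which 10% remains in use after donation), giving a 1.5x scale-up for non-donating budgets:

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE (1u << 16)

static uint32_t div_round_up(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

int main(void)
{
	uint32_t t_r = WEIGHT_ONE * 40 / 100;	/* donating: 40% */
	uint32_t t_rp = WEIGHT_ONE * 10 / 100;	/* kept after donation: 10% */

	uint32_t gamma = div_round_up(
		(uint64_t)(WEIGHT_ONE - t_rp) * WEIGHT_ONE,
		WEIGHT_ONE - t_r);

	/* (1 - 0.1) / (1 - 0.4) = 1.5 -> 1.5 * WEIGHT_ONE = 98304 */
	printf("gamma = %u (%.3fx)\n", gamma, (double)gamma / WEIGHT_ONE);
	return 0;
}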
From patchwork Tue Sep 1 18:52:49 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 19/27] blk-iocost: revamp donation amount determination
Date: Tue, 1 Sep 2020 14:52:49 -0400
Message-Id: <20200901185257.645114-20-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

iocost has various safety nets to combat inuse adjustment calculation inaccuracies. With Andy's method implemented in transfer_surpluses(), inuse adjustment calculations are now accurate and we can make donation amount determinations accurate too.

* Stop keeping track of past usage history and using the maximum. Act on the immediate usage information.
* Remove donation constraints defined by SURPLUS_* constants. Donate whatever isn't used.
* Determine the donation amount so that the iocg will end up with MARGIN_TARGET_PCT budget at the end of the coming period assuming the same usage as the previous period. TARGET is set at 50% of period, which is the previous maximum. This provides smooth convergence for most repetitive IO patterns.
* Apply donation logic early at 20% budget. There's no risk in doing so as the calculation is based on the delta between the current budget and the target budget at the end of the coming period.
* Remove preemptive iocg activation for zero cost IOs. As donation can reach near zero now, the mere activation doesn't provide any protection anymore. In the unlikely case that this becomes a problem, the right solution is assigning appropriate costs for such IOs.

This significantly improves the donation determination logic while also simplifying it. Now all donations are immediate, exact and smooth.

Signed-off-by: Tejun Heo Cc: Andy Newell --- block/blk-iocost.c | 133 +++++++++++++++++---------------------------- 1 file changed, 51 insertions(+), 82 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index ecc23b827e5d..694f1487208a 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -217,12 +217,14 @@ enum { MAX_PERIOD = USEC_PER_SEC, /* - * A cgroup's vtime can run 50% behind the device vtime, which + * iocg->vtime is targeted at 50% behind the device vtime, which * serves as its IO credit buffer. Surplus weight adjustment is * immediately canceled if the vtime margin runs below 10%. */ MARGIN_MIN_PCT = 10, - MARGIN_MAX_PCT = 50, + MARGIN_LOW_PCT = 20, + MARGIN_TARGET_PCT = 50, + MARGIN_MAX_PCT = 100, /* Have some play in timer operations */ TIMER_SLACK_PCT = 1, @@ -234,17 +236,6 @@ enum { */ VTIME_VALID_DUR = 300 * USEC_PER_SEC, - /* - * Remember the past three non-zero usages and use the max for - * surplus calculation. Three slots guarantee that we remember one - * full period usage from the last active stretch even after - * partial deactivation and re-activation periods. Don't start - * giving away weight before collecting two data points to prevent - * hweight adjustments based on one partial activation period.
- */ - NR_USAGE_SLOTS = 3, - MIN_VALID_USAGES = 2, - /* 1/64k is granular enough and can easily be handled w/ u32 */ WEIGHT_ONE = 1 << 16, @@ -280,14 +271,6 @@ enum { /* don't let cmds which take a very long time pin lagging for too long */ MAX_LAGGING_PERIODS = 10, - /* - * If usage% * 1.25 + 2% is lower than hweight% by more than 3%, - * donate the surplus. - */ - SURPLUS_SCALE_PCT = 125, /* * 125% */ - SURPLUS_SCALE_ABS = WEIGHT_ONE / 50, /* + 2% */ - SURPLUS_MIN_ADJ_DELTA = WEIGHT_ONE / 33, /* 3% */ - /* switch iff the conditions are met for longer than this */ AUTOP_CYCLE_NSEC = 10LLU * NSEC_PER_SEC, @@ -376,6 +359,8 @@ struct ioc_params { struct ioc_margins { s64 min; + s64 low; + s64 target; s64 max; }; @@ -514,11 +499,7 @@ struct ioc_gq { struct iocg_stat desc_stat; struct iocg_stat last_stat; u64 last_stat_abs_vusage; - - /* usage is recorded as fractions of WEIGHT_ONE */ - u32 usage_delta_us; - int usage_idx; - u32 usages[NR_USAGE_SLOTS]; + u64 usage_delta_us; /* this iocg's depth in the hierarchy and ancestors including self */ int level; @@ -737,6 +718,8 @@ static void ioc_refresh_margins(struct ioc *ioc) u64 vrate = atomic64_read(&ioc->vtime_rate); margins->min = (period_us * MARGIN_MIN_PCT / 100) * vrate; + margins->low = (period_us * MARGIN_LOW_PCT / 100) * vrate; + margins->target = (period_us * MARGIN_TARGET_PCT / 100) * vrate; margins->max = (period_us * MARGIN_MAX_PCT / 100) * vrate; } @@ -1228,7 +1211,7 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now) return false; } if (!atomic_read(&blkg->use_delay) && - time_before_eq64(vtime, now->vnow + ioc->margins.max)) + time_before_eq64(vtime, now->vnow + ioc->margins.target)) return false; /* use delay */ @@ -1527,7 +1510,7 @@ static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, { struct ioc *ioc = iocg->ioc; u64 vtime = atomic64_read(&iocg->vtime); - s64 excess; + s64 excess, delta, target, new_hwi; /* see whether minimum margin requirement is met */ if (waitqueue_active(&iocg->waitq) || @@ -1542,15 +1525,28 @@ static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, vtime += excess; } - /* add margin */ - usage = DIV_ROUND_UP(usage * SURPLUS_SCALE_PCT, 100); - usage += SURPLUS_SCALE_ABS; - - /* don't bother if the surplus is too small */ - if (usage + SURPLUS_MIN_ADJ_DELTA > hwm) - return hwm; + /* + * Let's say the distance between iocg's and device's vtimes as a + * fraction of period duration is delta. Assuming that the iocg will + * consume the usage determined above, we want to determine new_hwi so + * that delta equals MARGIN_TARGET at the end of the next period. + * + * We need to execute usage worth of IOs while spending the sum of the + * new budget (1 - MARGIN_TARGET) and the leftover from the last period + * (delta): + * + * usage = (1 - MARGIN_TARGET + delta) * new_hwi + * + * Therefore, the new_hwi is: + * + * new_hwi = usage / (1 - MARGIN_TARGET + delta) + */ + delta = div64_s64(WEIGHT_ONE * (now->vnow - vtime), + now->vnow - ioc->period_at_vtime); + target = WEIGHT_ONE * MARGIN_TARGET_PCT / 100; + new_hwi = div64_s64(WEIGHT_ONE * usage, WEIGHT_ONE - target + delta); - return usage; + return clamp_t(s64, new_hwi, 1, hwm); } /* @@ -1812,7 +1808,7 @@ static void ioc_timer_fn(struct timer_list *timer) u32 ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM]; u32 missed_ppm[2], rq_wait_pct; u64 period_vtime; - int prev_busy_level, i; + int prev_busy_level; /* how were the latencies during the period? 
*/ ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct); @@ -1857,11 +1853,10 @@ static void ioc_timer_fn(struct timer_list *timer) } commit_weights(ioc); - /* calc usages and see whether some weights need to be moved around */ + /* calc usage and see whether some weights need to be moved around */ list_for_each_entry(iocg, &ioc->active_iocgs, active_list) { - u64 vdone, vtime, usage_us; - u32 hw_active, hw_inuse, usage; - int uidx, nr_valid; + u64 vdone, vtime, usage_us, usage_dur; + u32 usage, hw_active, hw_inuse; /* * Collect unused and wind vtime closer to vnow to prevent @@ -1886,15 +1881,11 @@ static void ioc_timer_fn(struct timer_list *timer) nr_lagging++; /* - * Determine absolute usage factoring in pending and in-flight - * IOs to avoid stalls and high-latency completions appearing as - * idle. + * Determine absolute usage factoring in in-flight IOs to avoid + * high-latency completions appearing as idle. */ usage_us = iocg->usage_delta_us; - if (waitqueue_active(&iocg->waitq) && time_before64(vtime, now.vnow)) - usage_us += DIV64_U64_ROUND_UP( - cost_to_abs_cost(now.vnow - vtime, hw_inuse), - now.vrate); + if (vdone != vtime) { u64 inflight_us = DIV64_U64_ROUND_UP( cost_to_abs_cost(vtime - vdone, hw_inuse), @@ -1902,43 +1893,22 @@ static void ioc_timer_fn(struct timer_list *timer) usage_us = max(usage_us, inflight_us); } - /* convert to hweight based usage ratio and record */ - uidx = (iocg->usage_idx + 1) % NR_USAGE_SLOTS; - - if (time_after64(vtime, now.vnow - ioc->margins.min)) { - iocg->usage_idx = uidx; - iocg->usages[uidx] = WEIGHT_ONE; - } else if (usage_us) { - u64 started_at, dur; - - if (time_after64(iocg->activated_at, ioc->period_at)) - started_at = iocg->activated_at; - else - started_at = ioc->period_at; - - dur = max_t(u64, now.now - started_at, 1); + /* convert to hweight based usage ratio */ + if (time_after64(iocg->activated_at, ioc->period_at)) + usage_dur = max_t(u64, now.now - iocg->activated_at, 1); + else + usage_dur = max_t(u64, now.now - ioc->period_at, 1); - iocg->usage_idx = uidx; - iocg->usages[uidx] = clamp_t(u32, - DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE, dur), + usage = clamp_t(u32, + DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE, + usage_dur), 1, WEIGHT_ONE); - } - - /* base the decision on max historical usage */ - for (i = 0, usage = 0, nr_valid = 0; i < NR_USAGE_SLOTS; i++) { - if (iocg->usages[i]) { - usage = max(usage, iocg->usages[i]); - nr_valid++; - } - } - if (nr_valid < MIN_VALID_USAGES) - usage = WEIGHT_ONE; /* see whether there's surplus vtime */ WARN_ON_ONCE(!list_empty(&iocg->surplus_list)); if (hw_inuse < hw_active || (!waitqueue_active(&iocg->waitq) && - time_before64(vtime, now.vnow - ioc->margins.max))) { + time_before64(vtime, now.vnow - ioc->margins.low))) { u32 hwa, hwm, new_hwi; /* @@ -2175,15 +2145,14 @@ static void ioc_timer_fn(struct timer_list *timer) if (!ioc->enabled || !iocg->level) return; - /* always activate so that even 0 cost IOs get protected to some level */ - if (!iocg_activate(iocg, &now)) - return; - /* calculate the absolute vtime cost */ abs_cost = calc_vtime_cost(bio, iocg, false); if (!abs_cost) return; + if (!iocg_activate(iocg, &now)) + return; + iocg->cursor = bio_end_sector(bio); vtime = atomic64_read(&iocg->vtime);
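[Editorial sketch] The new_hwi derivation in this patch, new_hwi = usage / (1 - MARGIN_TARGET + delta), is easy to sanity-check numerically. A hedged standalone sketch using doubles instead of WEIGHT_ONE fixed point, with invented usage and margin values:

#include <stdio.h>

int main(void)
{
	double usage = 0.40;	/* hweight fraction consumed last period */
	double target = 0.50;	/* MARGIN_TARGET_PCT / 100 */
	double delta = 0.30;	/* current budget lead, in period units */

	double new_hwi = usage / (1.0 - target + delta);

	/* spending `usage` out of (1 - target + delta) * new_hwi budget
	 * lands the iocg exactly at the target margin next period */
	printf("new_hwi = %.3f\n", new_hwi);		/* 0.500 */
	printf("check: %.3f == %.3f\n",
	       (1.0 - target + delta) * new_hwi, usage);
	return 0;
}

With these numbers the iocg keeps a 50% share: it consumes 0.4 out of the 0.8 period-equivalents of budget available to it and converges onto the 50% target margin.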
From patchwork Tue Sep 1 18:52:50 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 20/27] blk-iocost: revamp in-period donation snapbacks
Date: Tue, 1 Sep 2020 14:52:50 -0400
Message-Id: <20200901185257.645114-21-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

When the margin drops below the minimum on a donating iocg, donation is immediately canceled in full. There are a couple of shortcomings in the current behavior.

* It's abrupt. A small temporary budget deficit can lead to a wide swing in weight allocation and a large surplus.

* It's open-coded in the issue path but not implemented for the merge path. A series of merges at a low inuse can make the iocg incur debts and stall incorrectly.

This patch reimplements in-period donation snapbacks so that:

* inuse adjustment and cost calculations are factored into adjust_inuse_and_calc_cost(), which is called from both the issue and merge paths.

* Snapbacks are more gradual, occurring in quarter steps.

* A snapback triggers if the margin goes below the low threshold and is lower than the budget at the time of the last adjustment.

* For the above, __propagate_weights() stores the margin in iocg->saved_margin. The storing of iocg->last_inuse is moved into __propagate_weights() as well for consistency.

* Full snapback is guaranteed when there are waiters.

* With precise donation and gradual snapbacks, inuse adjustments are now a lot more effective and the value of scaling inuse on weight changes isn't clear. Inuse scaling is removed from weight_updated().

Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 133 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 96 insertions(+), 37 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 694f1487208a..d09b4011449c 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -226,6 +226,8 @@ enum { MARGIN_TARGET_PCT = 50, MARGIN_MAX_PCT = 100, + INUSE_ADJ_STEP_PCT = 25, + /* Have some play in timer operations */ TIMER_SLACK_PCT = 1,
@@ -443,12 +445,17 @@ struct ioc_gq { * * `last_inuse` remembers `inuse` while an iocg is idle to persist * surplus adjustments. + * + * `inuse` may be adjusted dynamically during period. `saved_*` are used + * to determine and track adjustments. */ u32 cfg_weight; u32 weight; u32 active; u32 inuse; + u32 last_inuse; + s64 saved_margin; sector_t cursor; /* to detect randio */
@@ -934,9 +941,11 @@ static void ioc_start_period(struct ioc *ioc, struct ioc_now *now) /* * Update @iocg's `active` and `inuse` to @active and @inuse, update level - weight sums and propagate upwards accordingly. + weight sums and propagate upwards accordingly. If @save, the current margin + is saved to be used as reference for later inuse in-period adjustments.
*/ -static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse) +static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse, + bool save, struct ioc_now *now) { struct ioc *ioc = iocg->ioc; int lvl; @@ -945,6 +954,10 @@ static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse) inuse = clamp_t(u32, inuse, 1, active); + iocg->last_inuse = iocg->inuse; + if (save) + iocg->saved_margin = now->vnow - atomic64_read(&iocg->vtime); + if (active == iocg->active && inuse == iocg->inuse) return; @@ -996,9 +1009,10 @@ static void commit_weights(struct ioc *ioc) } } -static void propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse) +static void propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse, + bool save, struct ioc_now *now) { - __propagate_weights(iocg, active, inuse); + __propagate_weights(iocg, active, inuse, save, now); commit_weights(iocg->ioc); } @@ -1082,7 +1096,7 @@ static u32 current_hweight_max(struct ioc_gq *iocg) return max_t(u32, hwm, 1); } -static void weight_updated(struct ioc_gq *iocg) +static void weight_updated(struct ioc_gq *iocg, struct ioc_now *now) { struct ioc *ioc = iocg->ioc; struct blkcg_gq *blkg = iocg_to_blkg(iocg); @@ -1093,9 +1107,7 @@ static void weight_updated(struct ioc_gq *iocg) weight = iocg->cfg_weight ?: iocc->dfl_weight; if (weight != iocg->weight && iocg->active) - propagate_weights(iocg, weight, - DIV64_U64_ROUND_UP((u64)iocg->inuse * weight, - iocg->weight)); + propagate_weights(iocg, weight, iocg->inuse, true, now); iocg->weight = weight; } @@ -1165,8 +1177,9 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now) */ iocg->hweight_gen = atomic_read(&ioc->hweight_gen) - 1; list_add(&iocg->active_list, &ioc->active_iocgs); + propagate_weights(iocg, iocg->weight, - iocg->last_inuse ?: iocg->weight); + iocg->last_inuse ?: iocg->weight, true, now); TRACE_IOCG_PATH(iocg_activate, iocg, now, last_period, cur_period, vtime); @@ -1789,7 +1802,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now) inuse = DIV64_U64_ROUND_UP( parent->child_adjusted_sum * iocg->hweight_after_donation, parent->hweight_inuse); - __propagate_weights(iocg, iocg->active, inuse); + __propagate_weights(iocg, iocg->active, inuse, true, now); } /* walk list should be dissolved after use */ @@ -1844,8 +1857,7 @@ static void ioc_timer_fn(struct timer_list *timer) iocg_kick_waitq(iocg, true, &now); } else if (iocg_is_idle(iocg)) { /* no waiter and idle, deactivate */ - iocg->last_inuse = iocg->inuse; - __propagate_weights(iocg, 0, 0); + __propagate_weights(iocg, 0, 0, false, &now); list_del_init(&iocg->active_list); } @@ -1925,7 +1937,7 @@ static void ioc_timer_fn(struct timer_list *timer) list_add(&iocg->surplus_list, &surpluses); } else { __propagate_weights(iocg, iocg->active, - iocg->active); + iocg->active, true, &now); nr_shortages++; } } else { @@ -2055,6 +2067,50 @@ static void ioc_timer_fn(struct timer_list *timer) spin_unlock_irq(&ioc->lock); } +static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime, + u64 abs_cost, struct ioc_now *now) +{ + struct ioc *ioc = iocg->ioc; + struct ioc_margins *margins = &ioc->margins; + u32 adj_step = DIV_ROUND_UP(iocg->active * INUSE_ADJ_STEP_PCT, 100); + u32 hwi; + s64 margin; + u64 cost, new_inuse; + + current_hweight(iocg, NULL, &hwi); + cost = abs_cost_to_cost(abs_cost, hwi); + margin = now->vnow - vtime - cost; + + /* + * We only increase inuse during period and do so iff the margin has + * deteriorated since the previous 
adjustment. + */ + if (margin >= iocg->saved_margin || margin >= margins->low || + iocg->inuse == iocg->active) + return cost; + + spin_lock_irq(&ioc->lock); + + /* we own inuse only when @iocg is in the normal active state */ + if (list_empty(&iocg->active_list)) { + spin_unlock_irq(&ioc->lock); + return cost; + } + + /* bump up inuse till @abs_cost fits in the existing budget */ + new_inuse = iocg->inuse; + do { + new_inuse = new_inuse + adj_step; + propagate_weights(iocg, iocg->active, new_inuse, true, now); + current_hweight(iocg, NULL, &hwi); + cost = abs_cost_to_cost(abs_cost, hwi); + } while (time_after64(vtime + cost, now->vnow) && + iocg->inuse != iocg->active); + + spin_unlock_irq(&ioc->lock); + return cost; +} + static void calc_vtime_cost_builtin(struct bio *bio, struct ioc_gq *iocg, bool is_merge, u64 *costp) { @@ -2136,7 +2192,6 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) struct ioc_gq *iocg = blkg_to_iocg(blkg); struct ioc_now now; struct iocg_wait wait; - u32 hw_active, hw_inuse; u64 abs_cost, cost, vtime; bool use_debt, ioc_locked; unsigned long flags; @@ -2154,21 +2209,8 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) return; iocg->cursor = bio_end_sector(bio); - vtime = atomic64_read(&iocg->vtime); - current_hweight(iocg, &hw_active, &hw_inuse); - - if (hw_inuse < hw_active && - time_after_eq64(vtime + ioc->margins.min, now.vnow)) { - TRACE_IOCG_PATH(inuse_reset, iocg, &now, - iocg->inuse, iocg->weight, hw_inuse, hw_active); - spin_lock_irq(&ioc->lock); - propagate_weights(iocg, iocg->weight, iocg->weight); - spin_unlock_irq(&ioc->lock); - current_hweight(iocg, &hw_active, &hw_inuse); - } - - cost = abs_cost_to_cost(abs_cost, hw_inuse); + cost = adjust_inuse_and_calc_cost(iocg, vtime, abs_cost, &now); /* * If no one's waiting and within budget, issue right away. The @@ -2190,7 +2232,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) */ use_debt = bio_issue_as_root_blkg(bio) || fatal_signal_pending(current); ioc_locked = use_debt || READ_ONCE(iocg->abs_vdebt); - +retry_lock: iocg_lock(iocg, ioc_locked, &flags); /* @@ -2232,6 +2274,17 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) return; } + /* guarantee that iocgs w/ waiters have maximum inuse */ + if (iocg->inuse != iocg->active) { + if (!ioc_locked) { + iocg_unlock(iocg, false, &flags); + ioc_locked = true; + goto retry_lock; + } + propagate_weights(iocg, iocg->active, iocg->active, true, + &now); + } + /* * Append self to the waitq and schedule the wakeup timer if we're * the first waiter. 
The timer duration is calculated based on the
@@ -2274,8 +2327,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq, struct ioc *ioc = iocg->ioc; sector_t bio_end = bio_end_sector(bio); struct ioc_now now; - u32 hw_inuse; - u64 abs_cost, cost; + u64 vtime, abs_cost, cost; unsigned long flags; /* bypass if disabled or for root cgroup */
@@ -2287,8 +2339,9 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq, return; ioc_now(ioc, &now); - current_hweight(iocg, NULL, &hw_inuse); - cost = abs_cost_to_cost(abs_cost, hw_inuse); + + vtime = atomic64_read(&iocg->vtime); + cost = adjust_inuse_and_calc_cost(iocg, vtime, abs_cost, &now); /* update cursor if backmerging into the request at the cursor */ if (blk_rq_pos(rq) < bio_end &&
@@ -2530,7 +2583,7 @@ static void ioc_pd_init(struct blkg_policy_data *pd) } spin_lock_irqsave(&ioc->lock, flags); - weight_updated(iocg); + weight_updated(iocg, &now); spin_unlock_irqrestore(&ioc->lock, flags); }
@@ -2544,7 +2597,10 @@ static void ioc_pd_free(struct blkg_policy_data *pd) spin_lock_irqsave(&ioc->lock, flags); if (!list_empty(&iocg->active_list)) { - propagate_weights(iocg, 0, 0); + struct ioc_now now; + + ioc_now(ioc, &now); + propagate_weights(iocg, 0, 0, false, &now); list_del_init(&iocg->active_list); }
@@ -2612,6 +2668,7 @@ static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf, struct blkcg *blkcg = css_to_blkcg(of_css(of)); struct ioc_cgrp *iocc = blkcg_to_iocc(blkcg); struct blkg_conf_ctx ctx; + struct ioc_now now; struct ioc_gq *iocg; u32 v; int ret;
@@ -2632,7 +2689,8 @@ static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf, if (iocg) { spin_lock_irq(&iocg->ioc->lock); - weight_updated(iocg); + ioc_now(iocg->ioc, &now); + weight_updated(iocg, &now); spin_unlock_irq(&iocg->ioc->lock); } }
@@ -2658,7 +2716,8 @@ static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf, spin_lock(&iocg->ioc->lock); iocg->cfg_weight = v * WEIGHT_ONE; - weight_updated(iocg); + ioc_now(iocg->ioc, &now); + weight_updated(iocg, &now); spin_unlock(&iocg->ioc->lock); blkg_conf_finish(&ctx);
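
The quarter-step snapback implemented above in adjust_inuse_and_calc_cost() can be modeled with a short standalone sketch. This is a userspace approximation, not the kernel code: hweight_inuse is taken to be inuse/active of a single level while the real current_hweight() walks the whole hierarchy, and the budget and cost numbers are invented.

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE		(1 << 16)
#define INUSE_ADJ_STEP_PCT	25

/* local vtime cost of an absolute cost at hweight hwi, cf. abs_cost_to_cost() */
static uint64_t cost_at(uint64_t abs_cost, uint32_t hwi)
{
	return abs_cost * WEIGHT_ONE / hwi;
}

/* single-level approximation of hweight_inuse */
static uint32_t hwi_of(uint32_t inuse, uint32_t active)
{
	return (uint64_t)WEIGHT_ONE * inuse / active;
}

int main(void)
{
	uint32_t active = 100, inuse = 10;	/* most of the share donated away */
	uint64_t abs_cost = 1000, budget = 2000;
	/* DIV_ROUND_UP(active * INUSE_ADJ_STEP_PCT, 100): a quarter of active */
	uint32_t adj_step = (active * INUSE_ADJ_STEP_PCT + 99) / 100;

	printf("inuse=%3u cost=%llu\n", inuse,
	       (unsigned long long)cost_at(abs_cost, hwi_of(inuse, active)));

	/* bump inuse one quarter step at a time until the cost fits the budget */
	while (cost_at(abs_cost, hwi_of(inuse, active)) > budget && inuse != active) {
		inuse = inuse + adj_step < active ? inuse + adj_step : active;
		printf("inuse=%3u cost=%llu\n", inuse,
		       (unsigned long long)cost_at(abs_cost, hwi_of(inuse, active)));
	}
	return 0;
}

Each pass raises inuse by a quarter of the active weight and recomputes the cost until it fits the budget or inuse reaches active, which is what makes the in-period adjustment gradual instead of an immediate full snapback.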

From patchwork Tue Sep 1 18:52:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749379
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 21/27] blk-iocost: revamp debt handling
Date: Tue, 1 Sep 2020 14:52:51 -0400
Message-Id: <20200901185257.645114-22-tj@kernel.org>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
MIME-Version: 1.0

Debt handling had several issues.

* How much inuse a debtor carries wasn't clearly defined. inuse would be driven down over time from not issuing IOs but it'd be better to clamp it to minimum immediately once in debt.

* How much can be paid off was determined by hweight_inuse. As inuse was driven down, the payment amount would fall together regardless of the debtor's active weight. This means that the debtors were punished harshly.

* ioc_rqos_merge() wasn't calling blkcg_schedule_throttle() after iocg_kick_delay().

This patch revamps debt handling so that:

* Debt handling owns inuse for iocgs in debt and keeps them at zero.

* Payment amount is determined by hweight_active, as illustrated in the sketch below. This is more deterministic and safer than hweight_inuse but still far from ideal in that it doesn't factor in possible donations from other iocgs for debt payments. This likely needs further improvements in the future.

* ioc_rqos_merge() now calls blkcg_schedule_throttle() as necessary.
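
To see why payments are now sized by hweight_active, consider a minimal userspace sketch. It reuses only the WEIGHT_ONE fixed-point scale and the cost_to_abs_cost() conversion from blk-iocost; the hweight values and budget are made up.

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE	(1 << 16)

/* device-wide cost of local vtime at hweight hw, cf. cost_to_abs_cost() */
static uint64_t cost_to_abs_cost(uint64_t cost, uint32_t hw)
{
	return cost * hw / WEIGHT_ONE;
}

int main(void)
{
	uint64_t vbudget = 100000;		/* vtime the debtor has to spend */
	uint32_t hw_active = WEIGHT_ONE / 4;	/* 25% share by configured weight */
	uint32_t hw_inuse = WEIGHT_ONE / 1024;	/* inuse collapsed while in debt */

	printf("payment sized by hweight_inuse:  %llu\n",
	       (unsigned long long)cost_to_abs_cost(vbudget, hw_inuse));
	printf("payment sized by hweight_active: %llu\n",
	       (unsigned long long)cost_to_abs_cost(vbudget, hw_active));
	return 0;
}

With inuse collapsed to roughly 0.1%, the old hweight_inuse sizing lets the debtor pay back almost nothing per period, while sizing by hweight_active keeps the payment proportional to the configured weight.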
Signed-off-by: Tejun Heo Cc: Andy Newell --- block/blk-iocost.c | 117 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 24 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index d09b4011449c..d2b69d87f3e7 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1206,13 +1206,13 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now) struct blkcg_gq *blkg = iocg_to_blkg(iocg); u64 vtime = atomic64_read(&iocg->vtime); u64 delta_ns, expires, oexpires; - u32 hw_inuse; + u32 hwa; lockdep_assert_held(&iocg->waitq.lock); /* debt-adjust vtime */ - current_hweight(iocg, NULL, &hw_inuse); - vtime += abs_cost_to_cost(iocg->abs_vdebt, hw_inuse); + current_hweight(iocg, &hwa, NULL); + vtime += abs_cost_to_cost(iocg->abs_vdebt, hwa); /* * Clear or maintain depending on the overage. Non-zero vdebt is what @@ -1258,6 +1258,47 @@ static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer) return HRTIMER_NORESTART; } +static void iocg_incur_debt(struct ioc_gq *iocg, u64 abs_cost, + struct ioc_now *now) +{ + struct iocg_pcpu_stat *gcs; + + lockdep_assert_held(&iocg->ioc->lock); + lockdep_assert_held(&iocg->waitq.lock); + WARN_ON_ONCE(list_empty(&iocg->active_list)); + + /* + * Once in debt, debt handling owns inuse. @iocg stays at the minimum + * inuse donating all of it share to others until its debt is paid off. + */ + if (!iocg->abs_vdebt && abs_cost) + propagate_weights(iocg, iocg->active, 0, false, now); + + iocg->abs_vdebt += abs_cost; + + gcs = get_cpu_ptr(iocg->pcpu_stat); + local64_add(abs_cost, &gcs->abs_vusage); + put_cpu_ptr(gcs); +} + +static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, + struct ioc_now *now) +{ + lockdep_assert_held(&iocg->ioc->lock); + lockdep_assert_held(&iocg->waitq.lock); + + /* make sure that nobody messed with @iocg */ + WARN_ON_ONCE(list_empty(&iocg->active_list)); + WARN_ON_ONCE(iocg->inuse > 1); + + iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); + + /* if debt is paid in full, restore inuse */ + if (!iocg->abs_vdebt) + propagate_weights(iocg, iocg->active, iocg->last_inuse, + false, now); +} + static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode, int flags, void *key) { @@ -1296,26 +1337,25 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt, struct iocg_wake_ctx ctx = { .iocg = iocg }; u64 vshortage, expires, oexpires; s64 vbudget; - u32 hw_inuse; + u32 hwa; lockdep_assert_held(&iocg->waitq.lock); - current_hweight(iocg, NULL, &hw_inuse); + current_hweight(iocg, &hwa, NULL); vbudget = now->vnow - atomic64_read(&iocg->vtime); /* pay off debt */ if (pay_debt && iocg->abs_vdebt && vbudget > 0) { - u64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hw_inuse); - u64 delta = min_t(u64, vbudget, vdebt); - u64 abs_delta = min(cost_to_abs_cost(delta, hw_inuse), - iocg->abs_vdebt); + u64 abs_vbudget = cost_to_abs_cost(vbudget, hwa); + u64 abs_vpay = min_t(u64, abs_vbudget, iocg->abs_vdebt); + u64 vpay = abs_cost_to_cost(abs_vpay, hwa); lockdep_assert_held(&ioc->lock); - atomic64_add(delta, &iocg->vtime); - atomic64_add(delta, &iocg->done_vtime); - iocg->abs_vdebt -= abs_delta; - vbudget -= vdebt; + atomic64_add(vpay, &iocg->vtime); + atomic64_add(vpay, &iocg->done_vtime); + iocg_pay_debt(iocg, abs_vpay, now); + vbudget -= vpay; iocg_kick_delay(iocg, now); } @@ -1327,17 +1367,20 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt, * not positive. 
*/ if (iocg->abs_vdebt) { - s64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hw_inuse); + s64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hwa); vbudget = min_t(s64, 0, vbudget - vdebt); } /* - * Wake up the ones which are due and see how much vtime we'll need - * for the next one. + * Wake up the ones which are due and see how much vtime we'll need for + * the next one. As paying off debt restores hw_inuse, it must be read + * after the above debt payment. */ - ctx.hw_inuse = hw_inuse; ctx.vbudget = vbudget; + current_hweight(iocg, NULL, &ctx.hw_inuse); + __wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx); + if (!waitqueue_active(&iocg->waitq)) return; if (WARN_ON_ONCE(ctx.vbudget >= 0)) @@ -1525,6 +1568,10 @@ static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, u64 vtime = atomic64_read(&iocg->vtime); s64 excess, delta, target, new_hwi; + /* debt handling owns inuse for debtors */ + if (iocg->abs_vdebt) + return 1; + /* see whether minimum margin requirement is met */ if (waitqueue_active(&iocg->waitq) || time_after64(vtime, now->vnow - ioc->margins.min)) @@ -1798,6 +1845,18 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now) struct ioc_gq *parent = iocg->ancestors[iocg->level - 1]; u32 inuse; + /* + * In-debt iocgs participated in the donation calculation with + * the minimum target hweight_inuse. Configuring inuse + * accordingly would work fine but debt handling expects + * @iocg->inuse stay at the minimum and we don't wanna + * interfere. + */ + if (iocg->abs_vdebt) { + WARN_ON_ONCE(iocg->inuse > 1); + continue; + } + /* w' = s' * b' / b'_p, note that b' == b'_t for donating leaves */ inuse = DIV64_U64_ROUND_UP( parent->child_adjusted_sum * iocg->hweight_after_donation, @@ -2081,6 +2140,10 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime, cost = abs_cost_to_cost(abs_cost, hwi); margin = now->vnow - vtime - cost; + /* debt handling owns inuse for debtors */ + if (iocg->abs_vdebt) + return cost; + /* * We only increase inuse during period and do so iff the margin has * deteriorated since the previous adjustment. @@ -2092,7 +2155,7 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime, spin_lock_irq(&ioc->lock); /* we own inuse only when @iocg is in the normal active state */ - if (list_empty(&iocg->active_list)) { + if (iocg->abs_vdebt || list_empty(&iocg->active_list)) { spin_unlock_irq(&ioc->lock); return cost; } @@ -2266,7 +2329,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) * penalizing the cgroup and its descendants. */ if (use_debt) { - iocg->abs_vdebt += abs_cost; + iocg_incur_debt(iocg, abs_cost, &now); if (iocg_kick_delay(iocg, &now)) blkcg_schedule_throttle(rqos->q, (bio->bi_opf & REQ_SWAP) == REQ_SWAP); @@ -2275,7 +2338,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio) } /* guarantee that iocgs w/ waiters have maximum inuse */ - if (iocg->inuse != iocg->active) { + if (!iocg->abs_vdebt && iocg->inuse != iocg->active) { if (!ioc_locked) { iocg_unlock(iocg, false, &flags); ioc_locked = true; @@ -2363,14 +2426,20 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq, * be for the vast majority of cases. See debt handling in * ioc_rqos_throttle() for details. 
*/ - spin_lock_irqsave(&iocg->waitq.lock, flags); + spin_lock_irqsave(&ioc->lock, flags); + spin_lock(&iocg->waitq.lock); + if (likely(!list_empty(&iocg->active_list))) { - iocg->abs_vdebt += abs_cost; - iocg_kick_delay(iocg, &now); + iocg_incur_debt(iocg, abs_cost, &now); + if (iocg_kick_delay(iocg, &now)) + blkcg_schedule_throttle(rqos->q, + (bio->bi_opf & REQ_SWAP) == REQ_SWAP); } else { iocg_commit_bio(iocg, bio, abs_cost, cost); } - spin_unlock_irqrestore(&iocg->waitq.lock, flags); + + spin_unlock(&iocg->waitq.lock); + spin_unlock_irqrestore(&ioc->lock, flags); } static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)

From patchwork Tue Sep 1 18:52:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749385
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo, Josef Bacik
Subject: [PATCH 22/27] blk-iocost: implement delay adjustment hysteresis
Date: Tue, 1 Sep 2020 14:52:52 -0400
Message-Id: <20200901185257.645114-23-tj@kernel.org>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
MIME-Version: 1.0

Currently, iocost syncs the delay duration to the outstanding debt amount, which seemed enough to protect the system from anon memory hogs. However, that was mostly because the delay calculation was using hweight_inuse, which quickly converges towards zero under debt, often punishing debtors overly harshly for longer than deserved.

The previous patch fixed the delay calculation and now the protection against anonymous memory hogs isn't enough because the effect of delay is indirect and non-linear and a huge amount of future debt can accumulate abruptly while unthrottled.

This patch implements delay hysteresis so that delay is decayed exponentially over time instead of getting cleared immediately as debt is paid off. While the overall behavior is similar to the blk-cgroup implementation used by blk-iolatency, a lot of the details are different and, due to the empirical nature of the mechanism, it's challenging to adapt it for one controller without negatively impacting the other.

As the delay is gradually decayed now, there's no point in running it from its own hrtimer. Periodic updates are now performed from ioc_timer_fn() and the dedicated hrtimer is removed.

Signed-off-by: Tejun Heo
Cc: Josef Bacik
---
 block/blk-cgroup.c | 23 ++++---
 block/blk-iocost.c | 119 ++++++++++++++++++++++++++------------------
 2 files changed, 86 insertions(+), 56 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index c195365c9817..d33dd6be1d9c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1613,16 +1613,24 @@ static void blkcg_scale_delay(struct blkcg_gq *blkg, u64 now) static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay) { unsigned long pflags; + bool clamp; u64 now = ktime_to_ns(ktime_get()); u64 exp; u64 delay_nsec = 0; int tok; while (blkg->parent) { - if (atomic_read(&blkg->use_delay)) { + int use_delay = atomic_read(&blkg->use_delay); + + if (use_delay) { + u64 this_delay; + blkcg_scale_delay(blkg, now); - delay_nsec = max_t(u64, delay_nsec, - atomic64_read(&blkg->delay_nsec)); + this_delay = atomic64_read(&blkg->delay_nsec); + if (this_delay > delay_nsec) { + delay_nsec = this_delay; + clamp = use_delay > 0; + } } blkg = blkg->parent; }
@@ -1634,10 +1642,13 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay) * Let's not sleep for all eternity if we've amassed a huge delay.
* Swapping or metadata IO can accumulate 10's of seconds worth of * delay, and we want userspace to be able to do _something_ so cap the - * delays at 1 second. If there's 10's of seconds worth of delay then - * the tasks will be delayed for 1 second for every syscall. + * delays at 0.25s. If there's 10's of seconds worth of delay then the + * tasks will be delayed for 0.25 second for every syscall. If + * blkcg_set_delay() was used as indicated by negative use_delay, the + * caller is responsible for regulating the range. */ - delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC); + if (clamp) + delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC); if (use_memdelay) psi_memstall_enter(&pflags); diff --git a/block/blk-iocost.c b/block/blk-iocost.c index d2b69d87f3e7..9cb8f29f01f5 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -270,6 +270,31 @@ enum { /* unbusy hysterisis */ UNBUSY_THR_PCT = 75, + /* + * The effect of delay is indirect and non-linear and a huge amount of + * future debt can accumulate abruptly while unthrottled. Linearly scale + * up delay as debt is going up and then let it decay exponentially. + * This gives us quick ramp ups while delay is accumulating and long + * tails which can help reducing the frequency of debt explosions on + * unthrottle. The parameters are experimentally determined. + * + * The delay mechanism provides adequate protection and behavior in many + * cases. However, this is far from ideal and falls shorts on both + * fronts. The debtors are often throttled too harshly costing a + * significant level of fairness and possibly total work while the + * protection against their impacts on the system can be choppy and + * unreliable. + * + * The shortcoming primarily stems from the fact that, unlike for page + * cache, the kernel doesn't have well-defined back-pressure propagation + * mechanism and policies for anonymous memory. Fully addressing this + * issue will likely require substantial improvements in the area. + */ + MIN_DELAY_THR_PCT = 500, + MAX_DELAY_THR_PCT = 25000, + MIN_DELAY = 250, + MAX_DELAY = 250 * USEC_PER_MSEC, + /* don't let cmds which take a very long time pin lagging for too long */ MAX_LAGGING_PERIODS = 10, @@ -473,6 +498,10 @@ struct ioc_gq { atomic64_t done_vtime; u64 abs_vdebt; + /* current delay in effect and when it started */ + u64 delay; + u64 delay_at; + /* * The period this iocg was last active in. Used for deactivation * and invalidating `vtime`. 
@@ -495,7 +524,6 @@ struct ioc_gq { struct wait_queue_head waitq; struct hrtimer waitq_timer; - struct hrtimer delay_timer; /* timestamp at the latest activation */ u64 activated_at; @@ -1204,58 +1232,50 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now) { struct ioc *ioc = iocg->ioc; struct blkcg_gq *blkg = iocg_to_blkg(iocg); - u64 vtime = atomic64_read(&iocg->vtime); - u64 delta_ns, expires, oexpires; + u64 tdelta, delay, new_delay; + s64 vover, vover_pct; u32 hwa; lockdep_assert_held(&iocg->waitq.lock); - /* debt-adjust vtime */ + /* calculate the current delay in effect - 1/2 every second */ + tdelta = now->now - iocg->delay_at; + if (iocg->delay) + delay = iocg->delay >> div64_u64(tdelta, USEC_PER_SEC); + else + delay = 0; + + /* calculate the new delay from the debt amount */ current_hweight(iocg, &hwa, NULL); - vtime += abs_cost_to_cost(iocg->abs_vdebt, hwa); + vover = atomic64_read(&iocg->vtime) + + abs_cost_to_cost(iocg->abs_vdebt, hwa) - now->vnow; + vover_pct = div64_s64(100 * vover, ioc->period_us * now->vrate); + + if (vover_pct <= MIN_DELAY_THR_PCT) + new_delay = 0; + else if (vover_pct >= MAX_DELAY_THR_PCT) + new_delay = MAX_DELAY; + else + new_delay = MIN_DELAY + + div_u64((MAX_DELAY - MIN_DELAY) * + (vover_pct - MIN_DELAY_THR_PCT), + MAX_DELAY_THR_PCT - MIN_DELAY_THR_PCT); - /* - * Clear or maintain depending on the overage. Non-zero vdebt is what - * guarantees that @iocg is online and future iocg_kick_delay() will - * clear use_delay. Don't leave it on when there's no vdebt. - */ - if (!iocg->abs_vdebt || time_before_eq64(vtime, now->vnow)) { - blkcg_clear_delay(blkg); - return false; + /* pick the higher one and apply */ + if (new_delay > delay) { + iocg->delay = new_delay; + iocg->delay_at = now->now; + delay = new_delay; } - if (!atomic_read(&blkg->use_delay) && - time_before_eq64(vtime, now->vnow + ioc->margins.target)) - return false; - - /* use delay */ - delta_ns = DIV64_U64_ROUND_UP(vtime - now->vnow, - now->vrate) * NSEC_PER_USEC; - blkcg_set_delay(blkg, delta_ns); - expires = now->now_ns + delta_ns; - /* if already active and close enough, don't bother */ - oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer)); - if (hrtimer_is_queued(&iocg->delay_timer) && - abs(oexpires - expires) <= ioc->timer_slack_ns) + if (delay >= MIN_DELAY) { + blkcg_set_delay(blkg, delay * NSEC_PER_USEC); return true; - - hrtimer_start_range_ns(&iocg->delay_timer, ns_to_ktime(expires), - ioc->timer_slack_ns, HRTIMER_MODE_ABS); - return true; -} - -static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer) -{ - struct ioc_gq *iocg = container_of(timer, struct ioc_gq, delay_timer); - struct ioc_now now; - unsigned long flags; - - spin_lock_irqsave(&iocg->waitq.lock, flags); - ioc_now(iocg->ioc, &now); - iocg_kick_delay(iocg, &now); - spin_unlock_irqrestore(&iocg->waitq.lock, flags); - - return HRTIMER_NORESTART; + } else { + iocg->delay = 0; + blkcg_clear_delay(blkg); + return false; + } } static void iocg_incur_debt(struct ioc_gq *iocg, u64 abs_cost, @@ -1356,9 +1376,10 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt, atomic64_add(vpay, &iocg->done_vtime); iocg_pay_debt(iocg, abs_vpay, now); vbudget -= vpay; + } + if (iocg->abs_vdebt || iocg->delay) iocg_kick_delay(iocg, now); - } /* * Debt can still be outstanding if we haven't paid all yet or the @@ -1906,12 +1927,13 @@ static void ioc_timer_fn(struct timer_list *timer) */ list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) { if 
(!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt && - !iocg_is_idle(iocg)) + !iocg->delay && !iocg_is_idle(iocg)) continue; spin_lock(&iocg->waitq.lock); - if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt) { + if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt || + iocg->delay) { /* might be oversleeping vtime / hweight changes, kick */ iocg_kick_waitq(iocg, true, &now); } else if (iocg_is_idle(iocg)) {
@@ -2641,8 +2663,6 @@ static void ioc_pd_init(struct blkg_policy_data *pd) init_waitqueue_head(&iocg->waitq); hrtimer_init(&iocg->waitq_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); iocg->waitq_timer.function = iocg_waitq_timer_fn; - hrtimer_init(&iocg->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); - iocg->delay_timer.function = iocg_delay_timer_fn; iocg->level = blkg->blkcg->css.cgroup->level;
@@ -2679,7 +2699,6 @@ static void ioc_pd_free(struct blkg_policy_data *pd) spin_unlock_irqrestore(&ioc->lock, flags); hrtimer_cancel(&iocg->waitq_timer); - hrtimer_cancel(&iocg->delay_timer); } free_percpu(iocg->pcpu_stat); kfree(iocg);
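
The decay and threshold mapping above reduce to the following standalone sketch. It is a userspace approximation that reuses the constants introduced by this patch and omits all of the surrounding locking and bookkeeping.

#include <stdio.h>
#include <stdint.h>

#define USEC_PER_SEC		1000000ULL
#define MIN_DELAY_THR_PCT	500
#define MAX_DELAY_THR_PCT	25000
#define MIN_DELAY		250ULL		/* usecs */
#define MAX_DELAY		250000ULL	/* 250ms in usecs */

/* decay: halve the delay for each full second since it was set */
static uint64_t decayed_delay(uint64_t delay, uint64_t tdelta_us)
{
	return delay >> (tdelta_us / USEC_PER_SEC);
}

/* map debt overage (in % of one period) to a delay, saturating both ends */
static uint64_t delay_from_overage(int64_t vover_pct)
{
	if (vover_pct <= MIN_DELAY_THR_PCT)
		return 0;
	if (vover_pct >= MAX_DELAY_THR_PCT)
		return MAX_DELAY;
	return MIN_DELAY + (MAX_DELAY - MIN_DELAY) *
	       (vover_pct - MIN_DELAY_THR_PCT) /
	       (MAX_DELAY_THR_PCT - MIN_DELAY_THR_PCT);
}

int main(void)
{
	/* a delay set to the max decays: 250ms, 125ms, 62.5ms, ... */
	for (uint64_t t = 0; t <= 3 * USEC_PER_SEC; t += USEC_PER_SEC)
		printf("t=%llus delay=%lluus\n",
		       (unsigned long long)(t / USEC_PER_SEC),
		       (unsigned long long)decayed_delay(MAX_DELAY, t));

	/* overage at 50x one period lands between MIN and MAX delay */
	printf("delay @500%% overage  = %lluus\n",
	       (unsigned long long)delay_from_overage(500));
	printf("delay @5000%% overage = %lluus\n",
	       (unsigned long long)delay_from_overage(5000));
	return 0;
}

The larger of the decayed current delay and the newly mapped delay wins, which gives quick ramp-ups while debt accumulates and a long tail after unthrottle.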

From patchwork Tue Sep 1 18:52:53 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749391
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 23/27] blk-iocost: halve debts if device stays idle
Date: Tue, 1 Sep 2020 14:52:53 -0400
Message-Id: <20200901185257.645114-24-tj@kernel.org>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
MIME-Version: 1.0

A low weight iocg can amass a large amount of debt, for example, when anonymous memory gets reclaimed aggressively. If the system has a lot of memory paired with a slow IO device, the debt can span multiple seconds or more. If there are no other subsequent IO issuers, the in-debt iocg may end up blocked paying its debt while the IO device is idle.

This patch implements a mechanism to protect against such pathological cases. If the device has been sufficiently idle for a substantial amount of time, the debts are halved. The criteria are on the conservative side as we want to resolve the rare extreme cases without impacting regular operation by forgiving debts too readily.

Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 49 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 9cb8f29f01f5..2a95a081cf44 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -295,6 +295,13 @@ enum { MIN_DELAY = 250, MAX_DELAY = 250 * USEC_PER_MSEC, + /* + * Halve debts if total usage keeps staying under 25% w/o any shortages + * for over 100ms.
+ */ + DEBT_BUSY_USAGE_PCT = 25, + DEBT_REDUCTION_IDLE_DUR = 100 * USEC_PER_MSEC, + /* don't let cmds which take a very long time pin lagging for too long */ MAX_LAGGING_PERIODS = 10, @@ -436,6 +443,9 @@ struct ioc { bool weights_updated; atomic_t hweight_gen; /* for lazy hweights */ + /* the last time debt cancel condition wasn't met */ + u64 debt_busy_at; + u64 autop_too_fast_at; u64 autop_too_slow_at; int autop_idx; @@ -1216,6 +1226,7 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now) if (ioc->running == IOC_IDLE) { ioc->running = IOC_RUNNING; + ioc->debt_busy_at = now->now; ioc_start_period(ioc, now); } @@ -1896,7 +1907,8 @@ static void ioc_timer_fn(struct timer_list *timer) struct ioc_gq *iocg, *tiocg; struct ioc_now now; LIST_HEAD(surpluses); - int nr_shortages = 0, nr_lagging = 0; + int nr_debtors = 0, nr_shortages = 0, nr_lagging = 0; + u64 usage_us_sum = 0; u32 ppm_rthr = MILLION - ioc->params.qos[QOS_RPPM]; u32 ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM]; u32 missed_ppm[2], rq_wait_pct; @@ -1936,6 +1948,8 @@ static void ioc_timer_fn(struct timer_list *timer) iocg->delay) { /* might be oversleeping vtime / hweight changes, kick */ iocg_kick_waitq(iocg, true, &now); + if (iocg->abs_vdebt) + nr_debtors++; } else if (iocg_is_idle(iocg)) { /* no waiter and idle, deactivate */ __propagate_weights(iocg, 0, 0, false, &now); @@ -1978,6 +1992,7 @@ static void ioc_timer_fn(struct timer_list *timer) * high-latency completions appearing as idle. */ usage_us = iocg->usage_delta_us; + usage_us_sum += usage_us; if (vdone != vtime) { u64 inflight_us = DIV64_U64_ROUND_UP( @@ -2036,6 +2051,38 @@ static void ioc_timer_fn(struct timer_list *timer) list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list) list_del_init(&iocg->surplus_list); + /* + * A low weight iocg can amass a large amount of debt, for example, when + * anonymous memory gets reclaimed aggressively. If the system has a lot + * of memory paired with a slow IO device, the debt can span multiple + * seconds or more. If there are no other subsequent IO issuers, the + * in-debt iocg may end up blocked paying its debt while the IO device + * is idle. + * + * The following protects against such pathological cases. If the device + * has been sufficiently idle for a substantial amount of time, the + * debts are halved. The criteria are on the conservative side as we + * want to resolve the rare extreme cases without impacting regular + * operation by forgiving debts too readily. + */ + if (nr_shortages || + div64_u64(100 * usage_us_sum, now.now - ioc->period_at) >= + DEBT_BUSY_USAGE_PCT) + ioc->debt_busy_at = now.now; + + if (nr_debtors && + now.now - ioc->debt_busy_at >= DEBT_REDUCTION_IDLE_DUR) { + list_for_each_entry(iocg, &ioc->active_iocgs, active_list) { + if (iocg->abs_vdebt) { + spin_lock(&iocg->waitq.lock); + iocg->abs_vdebt /= 2; + iocg_kick_waitq(iocg, true, &now); + spin_unlock(&iocg->waitq.lock); + } + } + ioc->debt_busy_at = now.now; + } + /* * If q is getting clogged or we're missing too much, we're issuing * too much IO and should lower vtime rate. 
If we're not missing

From patchwork Tue Sep 1 18:52:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11749383
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 24/27] blk-iocost: implement vtime loss compensation
Date: Tue, 1 Sep 2020 14:52:54 -0400
Message-Id: <20200901185257.645114-25-tj@kernel.org>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>
MIME-Version: 1.0

When an iocg accumulates too much vtime or gets deactivated, we throw away some vtime, which lowers the overall device utilization. As the exact amount which is being thrown away is known, we can compensate by accelerating the vrate accordingly so that the extra vtime generated in the current period matches what got lost. This significantly improves work conservation when high weight cgroups with intermittent and bursty IO patterns are involved.

Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 132 ++++++++++++++++++++++++++++++---------------
 1 file changed, 90 insertions(+), 42 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 2a95a081cf44..0270a504e6b5 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -224,20 +224,12 @@ enum { MARGIN_MIN_PCT = 10, MARGIN_LOW_PCT = 20, MARGIN_TARGET_PCT = 50, - MARGIN_MAX_PCT = 100, INUSE_ADJ_STEP_PCT = 25, /* Have some play in timer operations */ TIMER_SLACK_PCT = 1, - /* - * vtime can wrap well within a reasonable uptime when vrate is - * consistently raised. Don't trust recorded cgroup vtime if the - * period counter indicates that it's older than 5mins. - */ - VTIME_VALID_DUR = 300 * USEC_PER_SEC, - /* 1/64k is granular enough and can easily be handled w/ u32 */ WEIGHT_ONE = 1 << 16,
@@ -395,7 +387,6 @@ struct ioc_margins { s64 min; s64 low; s64 target; - s64 max; }; struct ioc_missed {
@@ -432,6 +423,8 @@ struct ioc { enum ioc_running running; atomic64_t vtime_rate; + u64 vtime_base_rate; + s64 vtime_err; seqcount_spinlock_t period_seqcount; u64 period_at; /* wallclock starttime */
@@ -760,12 +753,11 @@ static void ioc_refresh_margins(struct ioc *ioc) { struct ioc_margins *margins = &ioc->margins; u32 period_us = ioc->period_us; - u64 vrate = atomic64_read(&ioc->vtime_rate); + u64 vrate = ioc->vtime_base_rate; margins->min = (period_us * MARGIN_MIN_PCT / 100) * vrate; margins->low = (period_us * MARGIN_LOW_PCT / 100) * vrate; margins->target = (period_us * MARGIN_TARGET_PCT / 100) * vrate; - margins->max = (period_us * MARGIN_MAX_PCT / 100) * vrate; } /* latency Qos params changed, update period_us and all the dependent params */
@@ -831,8 +823,7 @@ static int ioc_autop_idx(struct ioc *ioc) return idx; /* step up/down based on the vrate */ - vrate_pct = div64_u64(atomic64_read(&ioc->vtime_rate) * 100, - VTIME_PER_USEC); + vrate_pct = div64_u64(ioc->vtime_base_rate * 100, VTIME_PER_USEC); now_ns = ktime_get_ns(); if (p->too_fast_vrate_pct && p->too_fast_vrate_pct <= vrate_pct) {
@@ -940,6 +931,43 @@ static bool ioc_refresh_params(struct ioc *ioc, bool force) return true; } +/* + * When an iocg accumulates too much vtime or gets deactivated, we throw away + * some vtime, which lowers the overall device utilization. As the exact amount + * which is being thrown away is known, we can compensate by accelerating the + * vrate accordingly so that the extra vtime generated in the current period + * matches what got lost.
+ */ +static void ioc_refresh_vrate(struct ioc *ioc, struct ioc_now *now) +{ + s64 pleft = ioc->period_at + ioc->period_us - now->now; + s64 vperiod = ioc->period_us * ioc->vtime_base_rate; + s64 vcomp, vcomp_min, vcomp_max; + + lockdep_assert_held(&ioc->lock); + + /* we need some time left in this period */ + if (pleft <= 0) + goto done; + + /* + * Calculate how much vrate should be adjusted to offset the error. + * Limit the amount of adjustment and deduct the adjusted amount from + * the error. + */ + vcomp = -div64_s64(ioc->vtime_err, pleft); + vcomp_min = -(ioc->vtime_base_rate >> 1); + vcomp_max = ioc->vtime_base_rate; + vcomp = clamp(vcomp, vcomp_min, vcomp_max); + + ioc->vtime_err += vcomp * pleft; + + atomic64_set(&ioc->vtime_rate, ioc->vtime_base_rate + vcomp); +done: + /* bound how much error can accumulate */ + ioc->vtime_err = clamp(ioc->vtime_err, -vperiod, vperiod); +} + /* take a snapshot of the current [v]time and vrate */ static void ioc_now(struct ioc *ioc, struct ioc_now *now) { @@ -1152,8 +1180,8 @@ static void weight_updated(struct ioc_gq *iocg, struct ioc_now *now) static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now) { struct ioc *ioc = iocg->ioc; - u64 last_period, cur_period, max_period_delta; - u64 vtime, vmin; + u64 last_period, cur_period; + u64 vtime, vtarget; int i; /* @@ -1192,21 +1220,15 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now) goto fail_unlock; /* - * vtime may wrap when vrate is raised substantially due to - * underestimated IO costs. Look at the period and ignore its - * vtime if the iocg has been idle for too long. Also, cap the - * budget it can start with to the margin. + * Always start with the target budget. On deactivation, we throw away + * anything above it. */ - max_period_delta = DIV64_U64_ROUND_UP(VTIME_VALID_DUR, ioc->period_us); + vtarget = now->vnow - ioc->margins.target; vtime = atomic64_read(&iocg->vtime); - vmin = now->vnow - ioc->margins.max; - if (last_period + max_period_delta < cur_period || - time_before64(vtime, vmin)) { - atomic64_add(vmin - vtime, &iocg->vtime); - atomic64_add(vmin - vtime, &iocg->done_vtime); - vtime = vmin; - } + atomic64_add(vtarget - vtime, &iocg->vtime); + atomic64_add(vtarget - vtime, &iocg->done_vtime); + vtime = vtarget; /* * Activate, propagate weight and start period timer if not @@ -1260,7 +1282,8 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now) current_hweight(iocg, &hwa, NULL); vover = atomic64_read(&iocg->vtime) + abs_cost_to_cost(iocg->abs_vdebt, hwa) - now->vnow; - vover_pct = div64_s64(100 * vover, ioc->period_us * now->vrate); + vover_pct = div64_s64(100 * vover, + ioc->period_us * ioc->vtime_base_rate); if (vover_pct <= MIN_DELAY_THR_PCT) new_delay = 0; @@ -1421,7 +1444,8 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt, /* determine next wakeup, add a timer margin to guarantee chunking */ vshortage = -ctx.vbudget; expires = now->now_ns + - DIV64_U64_ROUND_UP(vshortage, now->vrate) * NSEC_PER_USEC; + DIV64_U64_ROUND_UP(vshortage, ioc->vtime_base_rate) * + NSEC_PER_USEC; expires += ioc->timer_slack_ns; /* if already active and close enough, don't bother */ @@ -1536,6 +1560,7 @@ static void iocg_build_inner_walk(struct ioc_gq *iocg, /* collect per-cpu counters and propagate the deltas to the parent */ static void iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now) { + struct ioc *ioc = iocg->ioc; struct iocg_stat new_stat; u64 abs_vusage = 0; u64 vusage_delta; @@ -1551,7 +1576,7 @@ static void 
iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now) vusage_delta = abs_vusage - iocg->last_stat_abs_vusage; iocg->last_stat_abs_vusage = abs_vusage; - iocg->usage_delta_us = div64_u64(vusage_delta, now->vrate); + iocg->usage_delta_us = div64_u64(vusage_delta, ioc->vtime_base_rate); iocg->local_stat.usage_us += iocg->usage_delta_us; new_stat.usage_us = @@ -1593,8 +1618,8 @@ static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now) * capacity. @hwm is the upper bound and used to signal no donation. This * function also throws away @iocg's excess budget. */ -static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, - struct ioc_now *now) +static u32 hweight_after_donation(struct ioc_gq *iocg, u32 old_hwi, u32 hwm, + u32 usage, struct ioc_now *now) { struct ioc *ioc = iocg->ioc; u64 vtime = atomic64_read(&iocg->vtime); @@ -1609,12 +1634,13 @@ static u32 hweight_after_donation(struct ioc_gq *iocg, u32 hwm, u32 usage, time_after64(vtime, now->vnow - ioc->margins.min)) return hwm; - /* throw away excess above max */ - excess = now->vnow - vtime - ioc->margins.max; + /* throw away excess above target */ + excess = now->vnow - vtime - ioc->margins.target; if (excess > 0) { atomic64_add(excess, &iocg->vtime); atomic64_add(excess, &iocg->done_vtime); vtime += excess; + ioc->vtime_err -= div64_u64(excess * old_hwi, WEIGHT_ONE); } /* @@ -1952,6 +1978,24 @@ static void ioc_timer_fn(struct timer_list *timer) nr_debtors++; } else if (iocg_is_idle(iocg)) { /* no waiter and idle, deactivate */ + u64 vtime = atomic64_read(&iocg->vtime); + s64 excess; + + /* + * @iocg has been inactive for a full duration and will + * have a high budget. Account anything above target as + * error and throw away. On reactivation, it'll start + * with the target budget. + */ + excess = now.vnow - vtime - ioc->margins.target; + if (excess > 0) { + u32 old_hwi; + + current_hweight(iocg, NULL, &old_hwi); + ioc->vtime_err -= div64_u64(excess * old_hwi, + WEIGHT_ONE); + } + __propagate_weights(iocg, 0, 0, false, &now); list_del_init(&iocg->active_list); } @@ -1997,7 +2041,7 @@ static void ioc_timer_fn(struct timer_list *timer) if (vdone != vtime) { u64 inflight_us = DIV64_U64_ROUND_UP( cost_to_abs_cost(vtime - vdone, hw_inuse), - now.vrate); + ioc->vtime_base_rate); usage_us = max(usage_us, inflight_us); } @@ -2017,16 +2061,16 @@ static void ioc_timer_fn(struct timer_list *timer) if (hw_inuse < hw_active || (!waitqueue_active(&iocg->waitq) && time_before64(vtime, now.vnow - ioc->margins.low))) { - u32 hwa, hwm, new_hwi; + u32 hwa, old_hwi, hwm, new_hwi; /* * Already donating or accumulated enough to start. * Determine the donation amount. 
			 */
-			current_hweight(iocg, &hwa, NULL);
+			current_hweight(iocg, &hwa, &old_hwi);
 			hwm = current_hweight_max(iocg);
-			new_hwi = hweight_after_donation(iocg, hwm, usage,
-							 &now);
+			new_hwi = hweight_after_donation(iocg, old_hwi, hwm,
+							 usage, &now);
 			if (new_hwi < hwm) {
 				iocg->hweight_donating = hwa;
 				iocg->hweight_after_donation = new_hwi;
@@ -2130,7 +2174,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 	ioc->busy_level = clamp(ioc->busy_level, -1000, 1000);
 
 	if (ioc->busy_level > 0 || (ioc->busy_level < 0 && !nr_lagging)) {
-		u64 vrate = atomic64_read(&ioc->vtime_rate);
+		u64 vrate = ioc->vtime_base_rate;
 		u64 vrate_min = ioc->vrate_min, vrate_max = ioc->vrate_max;
 
 		/* rq_wait signal is always reliable, ignore user vrate_min */
@@ -2167,7 +2211,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 		trace_iocost_ioc_vrate_adj(ioc, vrate, missed_ppm, rq_wait_pct,
 					   nr_lagging, nr_shortages);
 
-		atomic64_set(&ioc->vtime_rate, vrate);
+		ioc->vtime_base_rate = vrate;
 		ioc_refresh_margins(ioc);
 	} else if (ioc->busy_level != prev_busy_level || nr_lagging) {
 		trace_iocost_ioc_vrate_adj(ioc, atomic64_read(&ioc->vtime_rate),
@@ -2188,8 +2232,11 @@ static void ioc_timer_fn(struct timer_list *timer)
 			ioc_start_period(ioc, &now);
 		} else {
 			ioc->busy_level = 0;
+			ioc->vtime_err = 0;
 			ioc->running = IOC_IDLE;
 		}
+
+		ioc_refresh_vrate(ioc, &now);
 	}
 
 	spin_unlock_irq(&ioc->lock);
@@ -2628,6 +2675,7 @@ static int blk_iocost_init(struct request_queue *q)
 	INIT_LIST_HEAD(&ioc->active_iocgs);
 
 	ioc->running = IOC_IDLE;
+	ioc->vtime_base_rate = VTIME_PER_USEC;
 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
 	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
 	ioc->period_at = ktime_to_us(ktime_get());
@@ -2762,7 +2810,7 @@ static size_t ioc_pd_stat(struct blkg_policy_data *pd, char *buf, size_t size)
 
 	if (iocg->level == 0) {
 		unsigned vp10k = DIV64_U64_ROUND_CLOSEST(
-			atomic64_read(&ioc->vtime_rate) * 10000,
+			ioc->vtime_base_rate * 10000,
 			VTIME_PER_USEC);
 		pos += scnprintf(buf + pos, size - pos, " cost.vrate=%u.%02u",
 				 vp10k / 100, vp10k % 100);
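The compensation step ioc_refresh_vrate() performs above can be hard to see
in diff form. The following stand-alone C sketch reproduces the same clamped
pay-back calculation in user space; all constants are hypothetical
illustration values, not kernel defaults:

#include <stdio.h>
#include <stdint.h>

static int64_t clamp64(int64_t v, int64_t lo, int64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

int main(void)
{
	int64_t base_rate = 1000;	/* hypothetical vtime per usec */
	int64_t period_us = 50000;	/* hypothetical period length */
	int64_t vtime_err = -20000000;	/* accumulated error to pay back */
	int64_t pleft = 40000;		/* time left in this period, usec */

	/* spread the error over the remaining period... */
	int64_t vcomp = -(vtime_err / pleft);

	/* ...but never slow below base/2 or speed past 2x base */
	vcomp = clamp64(vcomp, -(base_rate >> 1), base_rate);

	/* deduct what this period's adjustment will repay */
	vtime_err += vcomp * pleft;

	/* accumulated error is bounded to one period's worth of vtime */
	vtime_err = clamp64(vtime_err, -period_us * base_rate,
			    period_us * base_rate);

	printf("effective vrate=%lld remaining err=%lld\n",
	       (long long)(base_rate + vcomp), (long long)vtime_err);
	return 0;
}

With these numbers the full error fits under the clamp, so it is repaid in
one period and the effective vrate runs at 1500 until the error drains.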
From patchwork Tue Sep 1 18:52:55 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 25/27] blk-iocost: restore inuse update tracepoints
Date: Tue, 1 Sep 2020 14:52:55 -0400
Message-Id: <20200901185257.645114-26-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

Update and restore the inuse update tracepoints.
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c            | 16 ++++++++++++++++
 include/trace/events/iocost.h |  6 +++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 0270a504e6b5..9366527d8c12 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1919,6 +1919,12 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 		inuse = DIV64_U64_ROUND_UP(
 			parent->child_adjusted_sum * iocg->hweight_after_donation,
 			parent->hweight_inuse);
+
+		TRACE_IOCG_PATH(inuse_transfer, iocg, now,
+				iocg->inuse, inuse,
+				iocg->hweight_inuse,
+				iocg->hweight_after_donation);
+
 		__propagate_weights(iocg, iocg->active, inuse, true, now);
 	}
 
@@ -2076,6 +2082,10 @@ static void ioc_timer_fn(struct timer_list *timer)
 			iocg->hweight_after_donation = new_hwi;
 			list_add(&iocg->surplus_list, &surpluses);
 		} else {
+			TRACE_IOCG_PATH(inuse_shortage, iocg, &now,
+					iocg->inuse, iocg->active,
+					iocg->hweight_inuse, new_hwi);
+
 			__propagate_weights(iocg, iocg->active,
 					    iocg->active, true, &now);
 			nr_shortages++;
@@ -2248,11 +2258,13 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
 	struct ioc *ioc = iocg->ioc;
 	struct ioc_margins *margins = &ioc->margins;
 	u32 adj_step = DIV_ROUND_UP(iocg->active * INUSE_ADJ_STEP_PCT, 100);
+	u32 __maybe_unused old_inuse = iocg->inuse, __maybe_unused old_hwi;
 	u32 hwi;
 	s64 margin;
 	u64 cost, new_inuse;
 
 	current_hweight(iocg, NULL, &hwi);
+	old_hwi = hwi;
 	cost = abs_cost_to_cost(abs_cost, hwi);
 	margin = now->vnow - vtime - cost;
 
@@ -2287,6 +2299,10 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
 		     iocg->inuse != iocg->active);
 
 	spin_unlock_irq(&ioc->lock);
+
+	TRACE_IOCG_PATH(inuse_adjust, iocg, now,
+			old_inuse, iocg->inuse, old_hwi, hwi);
+
 	return cost;
 }
 
diff --git a/include/trace/events/iocost.h b/include/trace/events/iocost.h
index ee024fe8fef6..b350860d2e71 100644
--- a/include/trace/events/iocost.h
+++ b/include/trace/events/iocost.h
@@ -95,7 +95,7 @@ DECLARE_EVENT_CLASS(iocg_inuse_update,
 	)
 );
 
-DEFINE_EVENT(iocg_inuse_update, iocost_inuse_takeback,
+DEFINE_EVENT(iocg_inuse_update, iocost_inuse_shortage,
 
 	TP_PROTO(struct ioc_gq *iocg, const char *path, struct ioc_now *now,
 		u32 old_inuse, u32 new_inuse,
@@ -105,7 +105,7 @@ DEFINE_EVENT(iocg_inuse_update, iocost_inuse_takeback,
 		old_hw_inuse, new_hw_inuse)
 );
 
-DEFINE_EVENT(iocg_inuse_update, iocost_inuse_giveaway,
+DEFINE_EVENT(iocg_inuse_update, iocost_inuse_transfer,
 
 	TP_PROTO(struct ioc_gq *iocg, const char *path, struct ioc_now *now,
 		u32 old_inuse, u32 new_inuse,
@@ -115,7 +115,7 @@ DEFINE_EVENT(iocg_inuse_update, iocost_inuse_giveaway,
 		old_hw_inuse, new_hw_inuse)
 );
 
-DEFINE_EVENT(iocg_inuse_update, iocost_inuse_reset,
+DEFINE_EVENT(iocg_inuse_update, iocost_inuse_adjust,
 
 	TP_PROTO(struct ioc_gq *iocg, const char *path, struct ioc_now *now,
 		u32 old_inuse, u32 new_inuse,
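Once applied, the three renamed events behave like any other tracepoint and
can be enabled through tracefs. A tiny C helper, equivalent to echoing 1 into
the enable files, might look like the sketch below (standard tracefs layout
assumed; the mount point may also be /sys/kernel/debug/tracing, and root is
required):

#include <stdio.h>

/* write "1" to a tracefs enable file; returns 0 on success */
static int enable_event(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs("1", f);
	fclose(f);
	return 0;
}

int main(void)
{
	enable_event("/sys/kernel/tracing/events/iocost/iocost_inuse_shortage/enable");
	enable_event("/sys/kernel/tracing/events/iocost/iocost_inuse_transfer/enable");
	enable_event("/sys/kernel/tracing/events/iocost/iocost_inuse_adjust/enable");
	return 0;
}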
From patchwork Tue Sep 1 18:52:56 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 26/27] blk-iocost: add three debug stat - cost.wait, indebt and indelay
Date: Tue, 1 Sep 2020 14:52:56 -0400
Message-Id: <20200901185257.645114-27-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

These are really cheap to collect and can be useful in debugging iocost
behavior. Add them as debug stats for now.
Signed-off-by: Tejun Heo
---
 block/blk-iocost.c | 77 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 72 insertions(+), 5 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 9366527d8c12..fc897bb142bc 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -452,6 +452,9 @@ struct iocg_pcpu_stat {
 
 struct iocg_stat {
 	u64 usage_us;
+	u64 wait_us;
+	u64 indebt_us;
+	u64 indelay_us;
 };
 
 /* per device-cgroup pair */
@@ -538,6 +541,9 @@ struct ioc_gq {
 	struct iocg_stat last_stat;
 	u64 last_stat_abs_vusage;
 	u64 usage_delta_us;
+	u64 wait_since;
+	u64 indebt_since;
+	u64 indelay_since;
 
 	/* this iocg's depth in the hierarchy and ancestors including self */
 	int level;
@@ -1303,9 +1309,15 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
 	}
 
 	if (delay >= MIN_DELAY) {
+		if (!iocg->indelay_since)
+			iocg->indelay_since = now->now;
 		blkcg_set_delay(blkg, delay * NSEC_PER_USEC);
 		return true;
 	} else {
+		if (iocg->indelay_since) {
+			iocg->local_stat.indelay_us += now->now - iocg->indelay_since;
+			iocg->indelay_since = 0;
+		}
 		iocg->delay = 0;
 		blkcg_clear_delay(blkg);
 		return false;
@@ -1325,8 +1337,10 @@ static void iocg_incur_debt(struct ioc_gq *iocg, u64 abs_cost,
 	 * Once in debt, debt handling owns inuse. @iocg stays at the minimum
 	 * inuse donating all of it share to others until its debt is paid off.
 	 */
-	if (!iocg->abs_vdebt && abs_cost)
+	if (!iocg->abs_vdebt && abs_cost) {
+		iocg->indebt_since = now->now;
 		propagate_weights(iocg, iocg->active, 0, false, now);
+	}
 
 	iocg->abs_vdebt += abs_cost;
 
@@ -1348,9 +1362,13 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay,
 	iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt);
 
 	/* if debt is paid in full, restore inuse */
-	if (!iocg->abs_vdebt)
+	if (!iocg->abs_vdebt) {
+		iocg->local_stat.indebt_us += now->now - iocg->indebt_since;
+		iocg->indebt_since = 0;
+
 		propagate_weights(iocg, iocg->active, iocg->last_inuse,
 				  false, now);
+	}
 }
 
 static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode,
@@ -1436,8 +1454,17 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
 
 	__wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);
 
-	if (!waitqueue_active(&iocg->waitq))
+	if (!waitqueue_active(&iocg->waitq)) {
+		if (iocg->wait_since) {
+			iocg->local_stat.wait_us += now->now - iocg->wait_since;
+			iocg->wait_since = 0;
+		}
 		return;
+	}
+
+	if (!iocg->wait_since)
+		iocg->wait_since = now->now;
+
 	if (WARN_ON_ONCE(ctx.vbudget >= 0))
 		return;
 
@@ -1579,8 +1606,15 @@ static void iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now)
 	iocg->usage_delta_us = div64_u64(vusage_delta, ioc->vtime_base_rate);
 	iocg->local_stat.usage_us += iocg->usage_delta_us;
 
+	/* propagate upwards */
 	new_stat.usage_us =
 		iocg->local_stat.usage_us + iocg->desc_stat.usage_us;
+	new_stat.wait_us =
+		iocg->local_stat.wait_us + iocg->desc_stat.wait_us;
+	new_stat.indebt_us =
+		iocg->local_stat.indebt_us + iocg->desc_stat.indebt_us;
+	new_stat.indelay_us =
+		iocg->local_stat.indelay_us + iocg->desc_stat.indelay_us;
 
 	/* propagate the deltas to the parent */
 	if (iocg->level > 0) {
@@ -1589,6 +1623,12 @@ static void iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now)
 
 		parent_stat->usage_us +=
 			new_stat.usage_us - iocg->last_stat.usage_us;
+		parent_stat->wait_us +=
+			new_stat.wait_us - iocg->last_stat.wait_us;
+		parent_stat->indebt_us +=
+			new_stat.indebt_us - iocg->last_stat.indebt_us;
+		parent_stat->indelay_us +=
+			new_stat.indelay_us - iocg->last_stat.indelay_us;
 	}
 
 	iocg->last_stat = new_stat;
@@ -1961,8 +2001,6 @@ static void ioc_timer_fn(struct timer_list *timer)
 		return;
 	}
 
-	iocg_flush_stat(&ioc->active_iocgs, &now);
-
 	/*
 	 * Waiters determine the sleep durations based on the vrate they
 	 * saw at the time of sleep. If vrate has increased, some waiters
@@ -1976,6 +2014,22 @@ static void ioc_timer_fn(struct timer_list *timer)
 
 		spin_lock(&iocg->waitq.lock);
 
+		/* flush wait and indebt stat deltas */
+		if (iocg->wait_since) {
+			iocg->local_stat.wait_us += now.now - iocg->wait_since;
+			iocg->wait_since = now.now;
+		}
+		if (iocg->indebt_since) {
+			iocg->local_stat.indebt_us +=
+				now.now - iocg->indebt_since;
+			iocg->indebt_since = now.now;
+		}
+		if (iocg->indelay_since) {
+			iocg->local_stat.indelay_us +=
+				now.now - iocg->indelay_since;
+			iocg->indelay_since = now.now;
+		}
+
 		if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt ||
 		    iocg->delay) {
 			/* might be oversleeping vtime / hweight changes, kick */
@@ -2010,6 +2064,12 @@ static void ioc_timer_fn(struct timer_list *timer)
 	}
 	commit_weights(ioc);
 
+	/*
+	 * Wait and indebt stat are flushed above and the donation calculation
+	 * below needs updated usage stat. Let's bring stat up-to-date.
+	 */
+	iocg_flush_stat(&ioc->active_iocgs, &now);
+
 	/* calc usage and see whether some weights need to be moved around */
 	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
 		u64 vdone, vtime, usage_us, usage_dur;
@@ -2835,6 +2895,13 @@ static size_t ioc_pd_stat(struct blkg_policy_data *pd, char *buf, size_t size)
 	pos += scnprintf(buf + pos, size - pos, " cost.usage=%llu",
 			 iocg->last_stat.usage_us);
 
+	if (blkcg_debug_stats)
+		pos += scnprintf(buf + pos, size - pos,
+				 " cost.wait=%llu cost.indebt=%llu cost.indelay=%llu",
+				 iocg->last_stat.wait_us,
+				 iocg->last_stat.indebt_us,
+				 iocg->last_stat.indelay_us);
+
 	return pos;
 }
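All three stats use the same "since" timestamp pattern: entering a state
stamps the current time, leaving it accumulates the elapsed interval, and the
periodic timer flushes partial intervals so the counters never lag by more
than one period. A stand-alone C sketch of that pattern (not kernel code, all
timestamps hypothetical):

#include <stdint.h>
#include <stdio.h>

struct state_acct {
	uint64_t since;		/* 0 when not currently in the state */
	uint64_t total_us;	/* accumulated time spent in the state */
};

static void enter_state(struct state_acct *a, uint64_t now)
{
	if (!a->since)
		a->since = now;
}

static void leave_state(struct state_acct *a, uint64_t now)
{
	if (a->since) {
		a->total_us += now - a->since;
		a->since = 0;
	}
}

static void periodic_flush(struct state_acct *a, uint64_t now)
{
	if (a->since) {
		a->total_us += now - a->since;
		a->since = now;	/* still in the state, restart interval */
	}
}

int main(void)
{
	struct state_acct indebt = { 0, 0 };

	enter_state(&indebt, 1000);
	periodic_flush(&indebt, 3000);	/* timer fires while still in debt */
	leave_state(&indebt, 4500);
	printf("indebt_us=%llu\n", (unsigned long long)indebt.total_us);
	return 0;
}

The flush-then-restart step is why ioc_timer_fn() above resets each *_since
field to now.now rather than zero: the iocg is still in the state and the
next interval starts immediately.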
From patchwork Tue Sep 1 18:52:57 2020
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, newella@fb.com, Tejun Heo
Subject: [PATCH 27/27] blk-iocost: update iocost_monitor.py
Date: Tue, 1 Sep 2020 14:52:57 -0400
Message-Id: <20200901185257.645114-28-tj@kernel.org>
In-Reply-To: <20200901185257.645114-1-tj@kernel.org>
References: <20200901185257.645114-1-tj@kernel.org>

iocost went through significant internal changes. Update
iocost_monitor.py accordingly.
Signed-off-by: Tejun Heo
---
 tools/cgroup/iocost_monitor.py | 54 ++++++++++++----------------------
 1 file changed, 19 insertions(+), 35 deletions(-)

diff --git a/tools/cgroup/iocost_monitor.py b/tools/cgroup/iocost_monitor.py
index f4699f9b46ba..c4ff907c078b 100644
--- a/tools/cgroup/iocost_monitor.py
+++ b/tools/cgroup/iocost_monitor.py
@@ -45,8 +45,7 @@ args = parser.parse_args()
     err('The kernel does not have iocost enabled')
 
 IOC_RUNNING     = prog['IOC_RUNNING'].value_()
-NR_USAGE_SLOTS  = prog['NR_USAGE_SLOTS'].value_()
-HWEIGHT_WHOLE   = prog['HWEIGHT_WHOLE'].value_()
+WEIGHT_ONE      = prog['WEIGHT_ONE'].value_()
 VTIME_PER_SEC   = prog['VTIME_PER_SEC'].value_()
 VTIME_PER_USEC  = prog['VTIME_PER_USEC'].value_()
 AUTOP_SSD_FAST  = prog['AUTOP_SSD_FAST'].value_()
@@ -100,7 +99,7 @@ autop_names = {
         self.period_ms = ioc.period_us.value_() / 1_000
         self.period_at = ioc.period_at.value_() / 1_000_000
         self.vperiod_at = ioc.period_at_vtime.value_() / VTIME_PER_SEC
-        self.vrate_pct = ioc.vtime_rate.counter.value_() * 100 / VTIME_PER_USEC
+        self.vrate_pct = ioc.vtime_base_rate.value_() * 100 / VTIME_PER_USEC
         self.busy_level = ioc.busy_level.value_()
         self.autop_idx = ioc.autop_idx.value_()
         self.user_cost_model = ioc.user_cost_model.value_()
@@ -136,7 +135,7 @@ autop_names = {
 
     def table_header_str(self):
         return f'{"":25} active {"weight":>9} {"hweight%":>13} {"inflt%":>6} ' \
-               f'{"dbt":>3} {"delay":>6} {"usages%"}'
+               f'{"debt":>7} {"delay":>7} {"usage%"}'
 
 class IocgStat:
     def __init__(self, iocg):
@@ -144,11 +143,11 @@ autop_names = {
         blkg = iocg.pd.blkg
 
         self.is_active = not list_empty(iocg.active_list.address_of_())
-        self.weight = iocg.weight.value_()
-        self.active = iocg.active.value_()
-        self.inuse = iocg.inuse.value_()
-        self.hwa_pct = iocg.hweight_active.value_() * 100 / HWEIGHT_WHOLE
-        self.hwi_pct = iocg.hweight_inuse.value_() * 100 / HWEIGHT_WHOLE
+        self.weight = iocg.weight.value_() / WEIGHT_ONE
+        self.active = iocg.active.value_() / WEIGHT_ONE
+        self.inuse = iocg.inuse.value_() / WEIGHT_ONE
+        self.hwa_pct = iocg.hweight_active.value_() * 100 / WEIGHT_ONE
+        self.hwi_pct = iocg.hweight_inuse.value_() * 100 / WEIGHT_ONE
         self.address = iocg.value_()
 
         vdone = iocg.done_vtime.counter.value_()
@@ -160,23 +159,13 @@ autop_names = {
         else:
             self.inflight_pct = 0
 
-        # vdebt used to be an atomic64_t and is now u64, support both
-        try:
-            self.debt_ms = iocg.abs_vdebt.counter.value_() / VTIME_PER_USEC / 1000
-        except:
-            self.debt_ms = iocg.abs_vdebt.value_() / VTIME_PER_USEC / 1000
-
-        self.use_delay = blkg.use_delay.counter.value_()
-        self.delay_ms = blkg.delay_nsec.counter.value_() / 1_000_000
-
-        usage_idx = iocg.usage_idx.value_()
-        self.usages = []
-        self.usage = 0
-        for i in range(NR_USAGE_SLOTS):
-            usage = iocg.usages[(usage_idx + 1 + i) % NR_USAGE_SLOTS].value_()
-            upct = usage * 100 / HWEIGHT_WHOLE
-            self.usages.append(upct)
-            self.usage = max(self.usage, upct)
+        self.usage = (100 * iocg.usage_delta_us.value_() /
+                      ioc.period_us.value_()) if self.active else 0
+        self.debt_ms = iocg.abs_vdebt.value_() / VTIME_PER_USEC / 1000
+        if blkg.use_delay.counter.value_() != 0:
+            self.delay_ms = blkg.delay_nsec.counter.value_() / 1_000_000
+        else:
+            self.delay_ms = 0
 
     def dict(self, now, path):
         out = { 'cgroup' : path,
@@ -189,25 +178,20 @@ autop_names = {
                 'hweight_inuse_pct' : self.hwi_pct,
                 'inflight_pct' : self.inflight_pct,
                 'debt_ms' : self.debt_ms,
-                'use_delay' : self.use_delay,
                 'delay_ms' : self.delay_ms,
                 'usage_pct' : self.usage,
                 'address' : self.address }
-        for i in range(len(self.usages)):
-            out[f'usage_pct_{i}'] = str(self.usages[i])
         return out
 
     def table_row_str(self, path):
         out = f'{path[-28:]:28} ' \
               f'{"*" if self.is_active else " "} ' \
-              f'{self.inuse:5}/{self.active:5} ' \
+              f'{round(self.inuse):5}/{round(self.active):5} ' \
              f'{self.hwi_pct:6.2f}/{self.hwa_pct:6.2f} ' \
              f'{self.inflight_pct:6.2f} ' \
-              f'{min(math.ceil(self.debt_ms), 999):3} ' \
-              f'{min(self.use_delay, 99):2}*'\
-              f'{min(math.ceil(self.delay_ms), 999):03} '
-        for u in self.usages:
-            out += f'{min(round(u), 999):03d}:'
+              f'{self.debt_ms:7.2f} ' \
+              f'{self.delay_ms:7.2f} '\
+              f'{min(self.usage, 999):6.2f}'
        out = out.rstrip(':')
        return out
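The monitor's new columns boil down to two conversions: weights are stored in
fixed point scaled by WEIGHT_ONE (1 << 16 in upstream blk-iocost.c, an
assumption worth verifying against your tree), and usage% is the per-period
usage_delta_us over the period length. A minimal C sketch of both, with all
input values hypothetical:

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE	(1 << 16)	/* assumed to match blk-iocost.c */

int main(void)
{
	uint64_t hweight_inuse = WEIGHT_ONE / 4;	/* hypothetical */
	uint64_t usage_delta_us = 12500;		/* hypothetical */
	uint64_t period_us = 50000;			/* hypothetical */

	/* same math as IocgStat: fixed point -> percent */
	printf("hweight_inuse%%=%.2f usage%%=%.2f\n",
	       hweight_inuse * 100.0 / WEIGHT_ONE,
	       usage_delta_us * 100.0 / period_us);
	return 0;
}

As before, the script itself is a drgn tool run against the live kernel,
typically as root with the device name as its argument (e.g.
"iocost_monitor.py sdb"); the exact invocation is defined by the script's
argparse setup rather than this series.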