From patchwork Fri Jan 18 10:31:25 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10769677
From: Andrea Righi
To: Tejun Heo, Li Zefan, Johannes Weiner
Cc: Jens Axboe, Vivek Goyal, Josef Bacik, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Andrea Righi
Subject: [RFC PATCH 1/3] fsio-throttle: documentation
Date: Fri, 18 Jan 2019 11:31:25 +0100
Message-Id: <20190118103127.325-2-righi.andrea@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190118103127.325-1-righi.andrea@gmail.com>
References: <20190118103127.325-1-righi.andrea@gmail.com>
X-Mailing-List: linux-block@vger.kernel.org

Document the filesystem I/O controller: description, usage, design, etc.

Signed-off-by: Andrea Righi
---
 Documentation/cgroup-v1/fsio-throttle.txt | 142 ++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 Documentation/cgroup-v1/fsio-throttle.txt

diff --git a/Documentation/cgroup-v1/fsio-throttle.txt b/Documentation/cgroup-v1/fsio-throttle.txt
new file mode 100644
index 000000000000..4f33cae2adea
--- /dev/null
+++ b/Documentation/cgroup-v1/fsio-throttle.txt
@@ -0,0 +1,142 @@
+
+                  Filesystem I/O throttling controller
+
+----------------------------------------------------------------------
+1. OVERVIEW
+
+This controller allows limiting the filesystem I/O of mounted devices for
+specific process containers (cgroups [1]) by enforcing delays on processes
+that exceed the limits defined for their cgroup.
+
+The goal of the filesystem I/O controller is to improve performance
+predictability from the applications' point of view and to provide performance
+isolation between control groups sharing the same filesystems.
+
+----------------------------------------------------------------------
+2. DESIGN
+
+I/O activity generated by READs is evaluated at the block layer; WRITEs are
+evaluated when a page changes from clean to dirty (rewriting a page that is
+already dirty doesn't generate extra I/O activity).
+
+Throttling is always performed at the VFS layer.
+
+This solution has the advantage of always being able to determine the
+task/cgroup that originally generated the I/O request, and it prevents
+filesystem lock contention and potential priority inversion problems
+(example: throttled journal I/O that may slow down the entire system).
+
+The downside of this solution is that the controller is fuzzier (compared to
+the blkio controller) and it allows I/O bursts that may happen at the I/O
+scheduler layer.
+
+----------------------------------------------------------------------
+2.1. TOKEN BUCKET THROTTLING
+
+             Tokens (I/O rate) -
+                 o
+                 o
+                 o
+              ....... <--.
+              \     /    | Bucket size (burst limit)
+               \ooo/     |
+                ---   <--'
+                 |ooo
+    Incoming --->|---> Conforming
+    I/O          |oo   I/O
+    requests  -->|-->  requests
+                 |
+            ---->|
+
+Token bucket [2] throttling: tokens are added to the bucket
+every second; the bucket can hold at most tokens; I/O
+requests are accepted if there are available tokens in the bucket; when a
+request of N bytes arrives, N tokens are removed from the bucket; if fewer than
+N tokens are available in the bucket, the request is delayed until a sufficient
+amount of tokens is available again in the bucket.
+
+----------------------------------------------------------------------
+3. USER INTERFACE
+
+A new I/O limit (in MB/s) can be defined using the file:
+- fsio.max_mbs
+
+The syntax of a throttling policy is the following:
+
+": "
+
+Examples:
+
+- set a maximum I/O rate of 10MB/s on /dev/sda (8:0), bucket size = 10MB:
+
+  # echo "8:0 10 10" > /sys/fs/cgroup/cg1/fsio.max_mbs
+
+- remove the I/O limit defined for /dev/sda (8:0):
+
+  # echo "8:0 0 0" > /sys/fs/cgroup/cg1/fsio.max_mbs
+
+----------------------------------------------------------------------
+4. Additional parameters
+
+----------------------------------------------------------------------
+4.1. Sleep timeslice
+
+Sleep timeslice is a configurable parameter that sets the minimum amount of
+time a throttled task is put to sleep. Tasks will never be put to sleep
+for less than the sleep timeslice. Moreover, wakeup timers are always
+aligned to a multiple of the sleep timeslice.
+
+Increasing the sleep timeslice has the advantage of reducing the overhead of
+the controller: with more coarse-grained control, fewer timers are created to
+wake up tasks, which means less softirq pressure in the system and less
+overhead. However, a bigger sleep timeslice makes the controller fuzzier,
+since throttled tasks receive fewer throttling events with longer
+sleeps.
+
+The parameter can be changed via:
+/sys/module/fsio_throttle/parameters/throttle_timeslice_ms
+
+The default value is 250 ms.
+
+Example:
+  - set the sleep timeslice to 1s:
+
+  # echo 1000 > /sys/module/fsio_throttle/parameters/throttle_timeslice_ms
+
+----------------------------------------------------------------------
+4.2. Sleep timeframe
+
+This parameter defines the maximum time a throttled task can be put to sleep.
+
+The parameter can be changed via:
+/sys/module/fsio_throttle/parameters/throttle_timeframe_ms
+
+The default value is 2 sec.
+
+Example:
+  - set the sleep timeframe to 5s:
+
+  # echo 5000 > /sys/module/fsio_throttle/parameters/throttle_timeframe_ms
+
+4.3. Throttle kernel threads
+
+By default, kernel threads are never throttled or accounted for any I/O
+activity. This behavior can be changed by writing 1 to:
+
+/sys/module/fsio_throttle/parameters/throttle_kernel_threads
+
+It is strongly recommended not to change this setting unless you know what you
+are doing.
+
+----------------------------------------------------------------------
+5. TODO
+
+- Integration with the blkio controller
+- Provide distinct read/write limits, as well as MBs vs iops
+- Provide additional statistics in cgroupfs
+
+----------------------------------------------------------------------
+6. REFERENCES
+
+[1] Documentation/admin-guide/cgroup-v2.rst
+[2] http://en.wikipedia.org/wiki/Token_bucket

From patchwork Fri Jan 18 10:31:26 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10769675
From: Andrea Righi
To: Tejun Heo, Li Zefan, Johannes Weiner
Cc: Jens Axboe, Vivek Goyal, Josef Bacik, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Andrea Righi
Subject: [RFC PATCH 2/3] fsio-throttle: controller infrastructure
Date: Fri, 18 Jan 2019 11:31:26 +0100
Message-Id: <20190118103127.325-3-righi.andrea@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190118103127.325-1-righi.andrea@gmail.com>
References: <20190118103127.325-1-righi.andrea@gmail.com>
X-Mailing-List: linux-block@vger.kernel.org

This is the core of the fsio-throttle controller: it defines the interface
to the cgroup subsystem and implements the I/O measurement and throttling
logic.

Signed-off-by: Andrea Righi
---
 include/linux/cgroup_subsys.h |   4 +
 include/linux/fsio-throttle.h |  43 +++
 init/Kconfig                  |  11 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/fsio-throttle.c | 501 ++++++++++++++++++++++++++++++++++
 5 files changed, 560 insertions(+)
 create mode 100644 include/linux/fsio-throttle.h
 create mode 100644 kernel/cgroup/fsio-throttle.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..33beb70c0eca 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_FSIO_THROTTLE)
+SUBSYS(fsio)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/include/linux/fsio-throttle.h b/include/linux/fsio-throttle.h
new file mode 100644
index 000000000000..3a46df712475
--- /dev/null
+++ b/include/linux/fsio-throttle.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __FSIO_THROTTLE_H__
+#define __FSIO_THROTTLE_H__
+
+#include
+#include
+
+#ifdef CONFIG_BLOCK
+static inline dev_t bdev_to_dev(struct block_device *bdev)
+{
+	return bdev ? MKDEV(MAJOR(bdev->bd_inode->i_rdev),
+			    bdev->bd_disk->first_minor) : 0;
+}
+
+static inline struct block_device *as_to_bdev(struct address_space *mapping)
+{
+	return (mapping->host && mapping->host->i_sb->s_bdev) ?
+		mapping->host->i_sb->s_bdev : NULL;
+}
+#else /* CONFIG_BLOCK */
+static inline dev_t bdev_to_dev(struct block_device *bdev)
+{
+	return 0;
+}
+
+static inline struct block_device *as_to_bdev(struct address_space *mapping)
+{
+	return NULL;
+}
+#endif /* CONFIG_BLOCK */
+
+#ifdef CONFIG_CGROUP_FSIO_THROTTLE
+int fsio_throttle(dev_t dev, ssize_t bytes, int state);
+#else /* CONFIG_CGROUP_FSIO_THROTTLE */
+static inline int
+fsio_throttle(dev_t dev, ssize_t bytes, int state)
+{
+	return 0;
+}
+#endif /* CONFIG_CGROUP_FSIO_THROTTLE */
+
+#endif /* __FSIO_THROTTLE_H__ */
diff --git a/init/Kconfig b/init/Kconfig
index d47cb77a220e..95d7342801eb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -775,6 +775,17 @@ config CGROUP_WRITEBACK
 	depends on MEMCG && BLK_CGROUP
 	default y
 
+config CGROUP_FSIO_THROTTLE
+	bool "Filesystem I/O throttling controller"
+	default n
+	depends on BLOCK
+	help
+	  This option enables the filesystem I/O throttling infrastructure.
+
+	  This allows reads and writes to be throttled at the filesystem
+	  level without introducing I/O lock contention or priority
+	  inversion problems.
+
 menuconfig CGROUP_SCHED
 	bool "CPU controller"
 	default n
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index bfcdae896122..12de828b36cd 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -2,6 +2,7 @@
 obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o
 
 obj-$(CONFIG_CGROUP_FREEZER) += freezer.o
+obj-$(CONFIG_CGROUP_FSIO_THROTTLE) += fsio-throttle.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
diff --git a/kernel/cgroup/fsio-throttle.c b/kernel/cgroup/fsio-throttle.c
new file mode 100644
index 000000000000..46f3ffd4015b
--- /dev/null
+++ b/kernel/cgroup/fsio-throttle.c
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * fsio-throttle.c - I/O cgroup controller
+ *
+ * Copyright (C) 2019 Andrea Righi
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define KB(x)	((x) * 1024)
+#define MB(x)	(KB(KB(x)))
+#define GB(x)	(MB(KB(x)))
+
+static int throttle_kernel_threads __read_mostly;
+module_param(throttle_kernel_threads, int, 0644);
+MODULE_PARM_DESC(throttle_kernel_threads,
+		 "enable/disable I/O throttling for kernel threads");
+
+static int throttle_timeslice_ms __read_mostly = 250;
+module_param(throttle_timeslice_ms, int, 0644);
+MODULE_PARM_DESC(throttle_timeslice_ms,
+		 "throttling time slice (default 250ms)");
+
+static int throttle_timeframe_ms __read_mostly = 2000;
+module_param(throttle_timeframe_ms, int, 0644);
+MODULE_PARM_DESC(throttle_timeframe_ms,
+		 "maximum sleep time enforced (default 2000ms)");
+
+struct iothrottle {
+	struct cgroup_subsys_state css;
+	struct list_head list;
+	/* protect the list of iothrottle_node elements (list) */
+	struct mutex lock;
+	wait_queue_head_t wait;
+	struct timer_list timer;
+	bool timer_cancel;
+	/* protect the wait queue elements */
+	spinlock_t wait_lock;
+};
+
+struct iothrottle_limit {
+	unsigned long long usage;
+	unsigned long long bucket_size;
+	unsigned long long limit;
+	unsigned long long timestamp;
+	/* protect all of the above */
+	spinlock_t lock;
+};
+
+struct iothrottle_node {
+	struct list_head node;
+	struct rcu_head rcu;
+	struct iothrottle_limit bw;
+	dev_t dev;
+};
+
+static inline bool iothrottle_disabled(void)
+{
+	return !cgroup_subsys_enabled(fsio_cgrp_subsys);
+}
+
+static struct iothrottle *css_to_iothrottle(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct iothrottle, css) : NULL;
+}
+
+struct iothrottle *task_to_iothrottle(struct task_struct *p)
+{
+	if (unlikely(!p))
+		return NULL;
+	return css_to_iothrottle(task_css(p, fsio_cgrp_id));
+}
+
+static inline unsigned long long
+iothrottle_limit_delta_t(struct iothrottle_limit *res)
+{
+	return (long long)get_jiffies_64() - (long long)res->timestamp;
+}
+
+static void iothrottle_limit_init(struct iothrottle_limit *res,
+				  unsigned long long limit,
+				  unsigned long long bucket_size)
+{
+	spin_lock_init(&res->lock);
+	res->limit = limit;
+	res->usage = 0;
+	res->bucket_size = bucket_size;
+	res->timestamp = get_jiffies_64();
+}
+
+static unsigned long long
+iothrottle_limit_sleep(struct iothrottle_limit *res, unsigned long long size)
+{
+	unsigned long long delta;
+	long long tok;
+	unsigned long flags;
+
+	spin_lock_irqsave(&res->lock, flags);
+	res->usage -= size;
+	delta = jiffies_to_msecs(iothrottle_limit_delta_t(res));
+	res->timestamp = get_jiffies_64();
+	tok = (long long)res->usage * MSEC_PER_SEC;
+	if (delta) {
+		long long max = (long long)res->bucket_size * MSEC_PER_SEC;
+
+		tok += delta * res->limit;
+		tok = min_t(long long, tok, max);
+		res->usage = (unsigned long long)div_s64(tok, MSEC_PER_SEC);
+	}
+	spin_unlock_irqrestore(&res->lock, flags);
+
+	return (tok < 0) ? msecs_to_jiffies(div_u64(-tok, res->limit)) : 0;
+}
+
+static void iothrottle_limit_reset(struct iothrottle_limit *res)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&res->lock, flags);
+	res->usage = 0;
+	spin_unlock_irqrestore(&res->lock, flags);
+}
+
+static inline int iothrottle_node_size(void)
+{
+	return sizeof(struct iothrottle_node);
+}
+
+static struct iothrottle_node *iothrottle_node_alloc(gfp_t flags)
+{
+	struct iothrottle_node *n;
+	int size = iothrottle_node_size();
+
+	if (size < PAGE_SIZE)
+		n = kmalloc(size, flags);
+	else
+		n = vmalloc(size);
+	if (n)
+		memset(n, 0, size);
+	return n;
+}
+
+static void iothrottle_node_free(struct iothrottle_node *n)
+{
+	if (iothrottle_node_size() < PAGE_SIZE)
+		kfree(n);
+	else
+		vfree(n);
+}
+
+static struct iothrottle_node *
+iothrottle_node_search(const struct iothrottle *iot, dev_t dev)
+{
+	struct iothrottle_node *n;
+
+	list_for_each_entry_rcu(n, &iot->list, node)
+		if (n->dev == dev)
+			return n;
+	return NULL;
+}
+
+static void iothrottle_node_reclaim(struct rcu_head *rp)
+{
+	struct iothrottle_node *n;
+
+	n = container_of(rp, struct iothrottle_node, rcu);
+	iothrottle_node_free(n);
+}
+
+static int iothrottle_parse_args(char *buf, size_t nbytes,
+				 dev_t *dev,
+				 unsigned long long *io_limit,
+				 unsigned long long *bucket_size)
+{
+	struct gendisk *disk;
+	unsigned int major, minor;
+	unsigned long long limit, size;
+	int part, ret = 0;
+
+	if (sscanf(buf, "%u:%u %llu %llu", &major, &minor, &limit, &size) != 4)
+		return -EINVAL;
+	disk = get_gendisk(MKDEV(major, minor), &part);
+	if (!disk)
+		return -ENODEV;
+	if (part) {
+		ret = -ENODEV;
+		goto out;
+	}
+	*dev = MKDEV(major, minor);
+	*io_limit = MB(limit);
+	*bucket_size = MB(size);
+out:
+	put_disk_and_module(disk);
+
+	return ret;
+}
+
+static ssize_t iothrottle_write(struct kernfs_open_file *of,
+				char *buffer, size_t nbytes, loff_t off)
+{
+	struct iothrottle *iot;
+	struct iothrottle_node *n, *newn = NULL;
+	unsigned long long io_limit, bucket_size;
+	dev_t dev;
+	char *buf;
+	int ret;
+
+	/*
+	 * We need to allocate a new buffer here, because
+	 * iothrottle_parse_args() can modify it and the buffer provided by
+	 * write_string is supposed to be const.
+	 */
+	buf = kmalloc(nbytes + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	memcpy(buf, buffer, nbytes + 1);
+
+	ret = iothrottle_parse_args(buf, nbytes, &dev, &io_limit, &bucket_size);
+	if (ret)
+		goto out_free;
+
+	newn = iothrottle_node_alloc(GFP_KERNEL);
+	if (!newn) {
+		ret = -ENOMEM;
+		goto out_free;
+	}
+	newn->dev = dev;
+	iothrottle_limit_init(&newn->bw, io_limit, bucket_size);
+
+	iot = css_to_iothrottle(of_css(of));
+	if (unlikely(!iot)) {
+		WARN_ON_ONCE(1);
+		goto out_free;
+	}
+	mutex_lock(&iot->lock);
+	n = iothrottle_node_search(iot, dev);
+	if (!n) {
+		/* Insert new node */
+		if (io_limit) {
+			list_add_rcu(&newn->node, &iot->list);
+			newn = NULL;
+		}
+	} else if (!io_limit) {
+		/* Delete existing node */
+		list_del_rcu(&n->node);
+	} else {
+		/* Update existing node */
+		list_replace_rcu(&n->node, &newn->node);
+		newn = NULL;
+	}
+	mutex_unlock(&iot->lock);
+	if (n)
+		call_rcu(&n->rcu, iothrottle_node_reclaim);
+	ret = nbytes;
+out_free:
+	if (newn)
+		iothrottle_node_free(newn);
+	kfree(buf);
+	return ret;
+}
+
+static void iothrottle_show_limit(struct seq_file *m,
+				  dev_t dev, struct iothrottle_limit *res)
+{
+	seq_put_decimal_ull(m, "", MAJOR(dev));
+	seq_put_decimal_ull(m, ":", MINOR(dev));
+	seq_put_decimal_ull(m, " ", res->limit);
+	seq_put_decimal_ull(m, " ", res->usage);
+	seq_put_decimal_ull(m, " ", res->bucket_size);
+	seq_put_decimal_ull(m, " ",
+			    jiffies_to_clock_t(iothrottle_limit_delta_t(res)));
+	seq_putc(m, '\n');
+}
+
+static int iothrottle_read(struct seq_file *m, void *v)
+{
+	struct iothrottle *iot = css_to_iothrottle(seq_css(m));
+	struct iothrottle_node *n;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(n, &iot->list, node)
+		iothrottle_show_limit(m, n->dev, &n->bw);
+	rcu_read_unlock();
+
+	return 0;
+}
+
+static struct cftype iothrottle_files[] = {
+	{
+		.name = "max_mbs",
+		.seq_show = iothrottle_read,
+		.write = iothrottle_write,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+};
+
+static void iothrottle_wakeup(struct iothrottle *iot, bool timer_cancel)
+{
+	spin_lock_bh(&iot->wait_lock);
+	if (timer_cancel)
+		iot->timer_cancel = true;
+	wake_up_all(&iot->wait);
+	spin_unlock_bh(&iot->wait_lock);
+}
+
+static void iothrottle_timer_wakeup(struct timer_list *t)
+{
+	struct iothrottle *iot = from_timer(iot, t, timer);
+
+	iothrottle_wakeup(iot, false);
+}
+
+static struct cgroup_subsys_state *
+iothrottle_css_alloc(struct cgroup_subsys_state *parent)
+{
+	struct iothrottle *iot;
+
+	iot = kzalloc(sizeof(*iot), GFP_KERNEL);
+	if (!iot)
+		return ERR_PTR(-ENOMEM);
+	INIT_LIST_HEAD(&iot->list);
+	mutex_init(&iot->lock);
+	init_waitqueue_head(&iot->wait);
+	spin_lock_init(&iot->wait_lock);
+	iot->timer_cancel = false;
+	timer_setup(&iot->timer, iothrottle_timer_wakeup, 0);
+
+	return &iot->css;
+}
+
+static void iothrottle_css_offline(struct cgroup_subsys_state *css)
+{
+	struct iothrottle *iot = css_to_iothrottle(css);
+
+	spin_lock_bh(&iot->wait_lock);
+	iot->timer_cancel = true;
+	spin_unlock_bh(&iot->wait_lock);
+
+	iothrottle_wakeup(iot, true);
+}
+
+static void iothrottle_css_free(struct cgroup_subsys_state *css)
+{
+	struct iothrottle_node *n, *p;
+	struct iothrottle *iot = css_to_iothrottle(css);
+
+	del_timer_sync(&iot->timer);
+	/*
+	 * don't worry about locking here, at this point there's no reference
+	 * to the list.
+	 */
+	list_for_each_entry_safe(n, p, &iot->list, node)
+		iothrottle_node_free(n);
+	kfree(iot);
+}
+
+static inline bool is_kernel_thread(void)
+{
+	return !!(current->flags & (PF_KTHREAD | PF_KSWAPD));
+}
+
+static inline bool is_urgent_task(void)
+{
+	/* Never throttle tasks that are going to exit */
+	if (current->flags & PF_EXITING)
+		return true;
+	/* Throttle kernel threads only if throttle_kernel_threads is set */
+	return is_kernel_thread() && !throttle_kernel_threads;
+}
+
+static struct iothrottle *try_get_iothrottle_from_task(struct task_struct *p)
+{
+	struct iothrottle *iot = NULL;
+
+	rcu_read_lock();
+	if (!task_css_is_root(p, fsio_cgrp_id)) {
+		do {
+			iot = task_to_iothrottle(p);
+			if (unlikely(!iot))
+				break;
+		} while (!css_tryget_online(&iot->css));
+	}
+	rcu_read_unlock();
+
+	return iot;
+}
+
+static int iothrottle_evaluate_sleep(struct iothrottle *iot, dev_t dev,
+				     ssize_t bytes, int state)
+{
+	struct iothrottle_node *n;
+	unsigned long long sleep = 0;
+
+	rcu_read_lock();
+	n = iothrottle_node_search(iot, dev);
+	if (n) {
+		sleep = iothrottle_limit_sleep(&n->bw, bytes);
+		/*
+		 * state == 0 is used to do only I/O accounting without
+		 * enforcing sleeps.
+		 */
+		if (!state || sleep < msecs_to_jiffies(throttle_timeslice_ms))
+			sleep = 0;
+		if (sleep)
+			iothrottle_limit_reset(&n->bw);
+	}
+	rcu_read_unlock();
+
+	return sleep;
+}
+
+static noinline void iothrottle_force_sleep(struct iothrottle *iot,
+					    unsigned long long sleep,
+					    int state)
+{
+	unsigned long expire, now;
+
+	/*
+	 * Allow small IO bursts, by waking up the throttled task after a
+	 * maximum sleep of throttle_timeframe millisec.
+	 */
+	if (sleep > msecs_to_jiffies(throttle_timeframe_ms))
+		sleep = msecs_to_jiffies(throttle_timeframe_ms);
+
+	now = READ_ONCE(jiffies);
+	expire = now + sleep;
+
+	/*
+	 * Round up the time to sleep to a multiple of the sleep timeslice.
+	 *
+	 * In this way we can strongly reduce timer softirqs and
+	 * context switches in the system even when there are a lot of
+	 * different cgroups.
+	 */
+	expire = roundup(expire, msecs_to_jiffies(throttle_timeslice_ms));
+
+	/* Force sleep */
+	do {
+		DEFINE_WAIT(wait);
+
+		spin_lock_bh(&iot->wait_lock);
+		if (unlikely(iot->timer_cancel)) {
+			spin_unlock_bh(&iot->wait_lock);
+			break;
+		}
+		mod_timer(&iot->timer, expire);
+		spin_unlock_bh(&iot->wait_lock);
+
+		/*
+		 * Do not enforce interruptible sleep if there are pending
+		 * signals, otherwise we'll end up in a busy loop.
+		 */
+		if (signal_pending(current))
+			state = TASK_KILLABLE;
+
+		/* Send to sleep */
+		prepare_to_wait(&iot->wait, &wait, state);
+		schedule();
+		finish_wait(&iot->wait, &wait);
+	} while (!fatal_signal_pending(current) &&
+		 time_is_after_jiffies(expire));
+}
+
+int fsio_throttle(dev_t dev, ssize_t bytes, int state)
+{
+	struct iothrottle *iot;
+	unsigned long long sleep = 0;
+
+	if (iothrottle_disabled() || is_urgent_task())
+		return 0;
+	if (!dev)
+		return 0;
+	iot = try_get_iothrottle_from_task(current);
+	if (!iot)
+		return 0;
+	sleep = iothrottle_evaluate_sleep(iot, dev, bytes, state);
+	if (unlikely(sleep))
+		iothrottle_force_sleep(iot, sleep, state);
+	css_put(&iot->css);
+
+	return sleep;
+}
+
+struct cgroup_subsys fsio_cgrp_subsys = {
+	.css_alloc = iothrottle_css_alloc,
+	.css_free = iothrottle_css_free,
+	.css_offline = iothrottle_css_offline,
+	.dfl_cftypes = iothrottle_files,
+};

From patchwork Fri Jan 18 10:31:27 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10769673
From: Andrea Righi
To: Tejun Heo, Li Zefan, Johannes Weiner
Cc: Jens Axboe, Vivek Goyal, Josef Bacik, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Andrea Righi
Subject: [RFC PATCH 3/3] fsio-throttle: instrumentation
Date: Fri, 18 Jan 2019 11:31:27 +0100
Message-Id: <20190118103127.325-4-righi.andrea@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190118103127.325-1-righi.andrea@gmail.com>
References: <20190118103127.325-1-righi.andrea@gmail.com>
X-Mailing-List: linux-block@vger.kernel.org

Apply the fsio controller to the appropriate kernel functions to evaluate
and throttle filesystem I/O.
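To make the interaction between these hooks easier to review, here is a
user-space sketch (not part of this series) of the token-bucket arithmetic
they drive. It mirrors iothrottle_limit_sleep() and iothrottle_evaluate_sleep()
from patch 2/3 under simplifying assumptions: time is an explicit millisecond
counter instead of jiffies, and the throttle_timeframe_ms cap and the timer
rounding are left out. The names sim_bucket, sim_charge() and sim_evaluate()
are hypothetical.

```c
/*
 * Hypothetical user-space model of the fsio-throttle token bucket.
 * Time is passed explicitly in milliseconds instead of jiffies.
 */
struct sim_bucket {
	long long usage;	/* available tokens in bytes (negative = debt) */
	long long bucket_size;	/* burst limit in bytes */
	long long limit;	/* refill rate in bytes per second */
	long long last_ms;	/* time of the previous update */
};

/*
 * Charge 'bytes' at time 'now_ms' and return the delay in milliseconds
 * needed to pay back any debt (0 if there is none). Follows the same
 * steps as iothrottle_limit_sleep().
 */
long long sim_charge(struct sim_bucket *b, long long bytes, long long now_ms)
{
	long long delta = now_ms - b->last_ms;
	long long tok;

	b->usage -= bytes;
	b->last_ms = now_ms;
	tok = b->usage * 1000;		/* tokens scaled by MSEC_PER_SEC */
	if (delta) {
		long long max = b->bucket_size * 1000;

		tok += delta * b->limit;	/* refill for the elapsed time */
		if (tok > max)
			tok = max;		/* never exceed the bucket size */
		b->usage = tok / 1000;
	}
	return tok < 0 ? -tok / b->limit : 0;
}

/*
 * Model of iothrottle_evaluate_sleep(): state == 0 only accounts the I/O,
 * and delays shorter than the timeslice are not enforced.
 */
long long sim_evaluate(struct sim_bucket *b, long long bytes,
		       long long now_ms, int state, long long timeslice_ms)
{
	long long sleep = sim_charge(b, bytes, now_ms);

	if (!state || sleep < timeslice_ms)
		sleep = 0;
	if (sleep)
		b->usage = 0;	/* like iothrottle_limit_reset() */
	return sleep;
}
```

For example, with the "8:0 10 10" policy from the documentation (10 MB/s and
a 10 MB bucket, starting empty as iothrottle_limit_init() does), a 4 MB READ
accounted by generic_make_request_checks() at t=0 with state=0 returns 0 and
leaves 4 MB of debt; a VFS hook calling with bytes=0 at t=100 ms then gets a
300 ms delay (400 ms of debt minus 100 ms of refill), which exceeds the default
250 ms timeslice and is therefore enforced.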
Signed-off-by: Andrea Righi
---
 block/blk-core.c          | 10 ++++++++++
 include/linux/writeback.h |  7 ++++++-
 mm/filemap.c              | 20 +++++++++++++++++++-
 mm/page-writeback.c       | 14 ++++++++++++--
 4 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3c5f61ceeb67..4b4717f64ac1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -956,6 +957,15 @@ generic_make_request_checks(struct bio *bio)
 	 */
 	create_io_context(GFP_ATOMIC, q->node);
 
+	/*
+	 * Account only READs at this layer (WRITEs are accounted and throttled
+	 * in balance_dirty_pages()) and don't enforce sleeps (state=0): in this
+	 * way we can prevent potential lock contentions and priority inversion
+	 * problems at the filesystem layer.
+	 */
+	if (bio_op(bio) == REQ_OP_READ)
+		fsio_throttle(bio_dev(bio), bio->bi_iter.bi_size, 0);
+
 	if (!blkcg_bio_issue_check(q, bio))
 		return false;
 
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 738a0c24874f..1e161c7969e5 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -356,7 +356,12 @@ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
 unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
 
 void wb_update_bandwidth(struct bdi_writeback *wb, unsigned long start_time);
-void balance_dirty_pages_ratelimited(struct address_space *mapping);
+
+#define balance_dirty_pages_ratelimited(__mapping) \
+	__balance_dirty_pages_ratelimited(__mapping, false)
+void __balance_dirty_pages_ratelimited(struct address_space *mapping,
+				       bool redirty);
+
 bool wb_over_bg_thresh(struct bdi_writeback *wb);
 
 typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/filemap.c b/mm/filemap.c
index 9f5e323e883e..5cc0959274d6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2040,6 +2041,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 {
 	struct file *filp = iocb->ki_filp;
 	struct address_space *mapping = filp->f_mapping;
+	struct block_device *bdev = as_to_bdev(mapping);
 	struct inode *inode = mapping->host;
 	struct file_ra_state *ra = &filp->f_ra;
 	loff_t *ppos = &iocb->ki_pos;
@@ -2068,6 +2070,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 		cond_resched();
 find_page:
+		fsio_throttle(bdev_to_dev(bdev), 0, TASK_INTERRUPTIBLE);
 		if (fatal_signal_pending(current)) {
 			error = -EINTR;
 			goto out;
@@ -2308,11 +2311,17 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		struct file *file = iocb->ki_filp;
 		struct address_space *mapping = file->f_mapping;
+		struct block_device *bdev = as_to_bdev(mapping);
 		struct inode *inode = mapping->host;
 		loff_t size;
 
 		size = i_size_read(inode);
 		if (iocb->ki_flags & IOCB_NOWAIT) {
+			unsigned long long sleep;
+
+			sleep = fsio_throttle(bdev_to_dev(bdev), 0, 0);
+			if (sleep)
+				return -EAGAIN;
 			if (filemap_range_has_page(mapping, iocb->ki_pos,
 						   iocb->ki_pos + count - 1))
 				return -EAGAIN;
@@ -2322,6 +2331,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 					        iocb->ki_pos + count - 1);
 			if (retval < 0)
 				goto out;
+			fsio_throttle(bdev_to_dev(bdev), 0, TASK_INTERRUPTIBLE);
 		}
 
 		file_accessed(file);
@@ -2366,9 +2376,11 @@ EXPORT_SYMBOL(generic_file_read_iter);
 static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct block_device *bdev = as_to_bdev(mapping);
 	struct page *page;
 	int ret;
 
+	fsio_throttle(bdev_to_dev(bdev), 0, TASK_INTERRUPTIBLE);
 	do {
 		page = __page_cache_alloc(gfp_mask);
 		if (!page)
@@ -2498,11 +2510,15 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	page = find_get_page(mapping, offset);
 	if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) {
+		struct block_device *bdev = as_to_bdev(mapping);
 		/*
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
 		do_async_mmap_readahead(vmf->vma, ra, file, page, offset);
+		if (unlikely(!PageUptodate(page)))
+			fsio_throttle(bdev_to_dev(bdev), 0,
+				      TASK_INTERRUPTIBLE);
 	} else if (!page) {
 		/* No page in the page cache at all */
 		do_sync_mmap_readahead(vmf->vma, ra, file, offset);
@@ -3172,6 +3188,7 @@ ssize_t generic_perform_write(struct file *file,
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = 0;
+	unsigned int dirty;
 
 	do {
 		struct page *page;
@@ -3216,6 +3233,7 @@ ssize_t generic_perform_write(struct file *file,
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 		flush_dcache_page(page);
 
+		dirty = PageDirty(page);
 		status = a_ops->write_end(file, mapping, pos, bytes, copied,
 						page, fsdata);
 		if (unlikely(status < 0))
@@ -3241,7 +3259,7 @@ ssize_t generic_perform_write(struct file *file,
 		pos += copied;
 		written += copied;
 
-		balance_dirty_pages_ratelimited(mapping);
+		__balance_dirty_pages_ratelimited(mapping, dirty);
 	} while (iov_iter_count(i));
 
 	return written ? written : status;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7d1010453fb9..694ede8783f3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1858,10 +1859,12 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
  * limit we decrease the ratelimiting by a lot, to prevent individual processes
  * from overshooting the limit by (ratelimit_pages) each.
  */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+void __balance_dirty_pages_ratelimited(struct address_space *mapping,
+				       bool redirty)
 {
 	struct inode *inode = mapping->host;
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
+	struct block_device *bdev = as_to_bdev(mapping);
 	struct bdi_writeback *wb = NULL;
 	int ratelimit;
 	int *p;
@@ -1878,6 +1881,13 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	if (wb->dirty_exceeded)
 		ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10));
 
+	/*
+	 * Throttle filesystem I/O only if page was initially clean: re-writing
+	 * a dirty page doesn't generate additional I/O.
+	 */
+	if (!redirty)
+		fsio_throttle(bdev_to_dev(bdev), PAGE_SIZE, TASK_KILLABLE);
+
 	preempt_disable();
 	/*
 	 * This prevents one CPU to accumulate too many dirtied pages without
@@ -1911,7 +1921,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 
 	wb_put(wb);
 }
-EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
+EXPORT_SYMBOL(__balance_dirty_pages_ratelimited);
 
 /**
  * wb_over_bg_thresh - does @wb need to be written back?
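The hunks above use fsio_throttle() in a few distinct patterns: READ bios are charged at the block layer without sleeping (state=0), the read paths call it with 0 bytes as a sleepable checkpoint, the IOCB_NOWAIT direct-read path turns a nonzero return into -EAGAIN, and buffered writes charge PAGE_SIZE only when the page was clean before the copy. The user-space sketch below models just those call patterns. It is not the fsio-throttle implementation (that lives in an earlier patch of this series and is not shown here): the fixed-rate byte budget, the 4 KiB page size, and every name in it (fsio_model, model_throttle() and friends) are hypothetical, and the "nonzero return means a sleep would be needed" convention is inferred from how the IOCB_NOWAIT hunk treats the return value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PAGE_SIZE; 4 KiB is assumed here. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical per-device I/O budget. This is NOT the fsio-throttle
 * implementation from this series, only a plain fixed-rate model.
 */
struct fsio_model {
	uint64_t rate_bps;	/* allowed bytes per second (assumed > 0) */
	uint64_t usage;		/* bytes charged since start_ms */
	uint64_t start_ms;	/* start of the accounting window */
};

/*
 * Charge @bytes at time @now_ms (assumed >= start_ms) and return how many
 * milliseconds the caller would have to sleep to stay under rate_bps.
 * bytes == 0 only checks the budget without charging anything. Whether to
 * actually sleep is left to the caller, as with the "state" argument.
 */
static uint64_t model_throttle(struct fsio_model *m, uint64_t bytes,
			       uint64_t now_ms)
{
	uint64_t budget_ms, elapsed_ms;

	m->usage += bytes;
	/* Time needed to transfer everything charged so far at rate_bps. */
	budget_ms = m->usage * 1000 / m->rate_bps;
	elapsed_ms = now_ms - m->start_ms;
	return budget_ms > elapsed_ms ? budget_ms - elapsed_ms : 0;
}

/* READ bio at the block layer: charge the bytes, never sleep (state=0). */
static void model_account_read_bio(struct fsio_model *m, uint64_t bytes,
				   uint64_t now_ms)
{
	(void)model_throttle(m, bytes, now_ms);
}

/* IOCB_NOWAIT direct read: fail with -EAGAIN instead of sleeping. */
static int model_read_nowait(struct fsio_model *m, uint64_t now_ms)
{
	return model_throttle(m, 0, now_ms) ? -EAGAIN : 0;
}

/*
 * Buffered write: charge one page only if it was clean before the copy;
 * re-dirtying an already dirty page adds no writeback I/O.
 */
static uint64_t model_dirty_page(struct fsio_model *m, bool redirty,
				 uint64_t now_ms)
{
	if (redirty)
		return 0;
	return model_throttle(m, MODEL_PAGE_SIZE, now_ms);
}
```

With rate_bps = 4096000 (4096 bytes per millisecond), charging a 40960-byte READ bio at t=0 leaves a 10 ms debt, so a NOWAIT read at t=0 gets -EAGAIN while one at t=10 proceeds; re-dirtying a page charges nothing, and dirtying a clean page at t=10 asks for 1 ms. Keeping the block-layer charge non-sleeping means the debt from READ bios is only paid at the zero-byte checkpoints (find_page:, page_cache_read(), filemap_fault()), which matches the patch's stated goal of not sleeping inside generic_make_request_checks().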