From patchwork Tue Feb 19 15:27:10 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10820031
From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] blkcg: prevent priority inversion problem during sync()
Date: Tue, 19 Feb 2019 16:27:10 +0100
Message-Id: <20190219152712.9855-2-righi.andrea@gmail.com>
In-Reply-To: <20190219152712.9855-1-righi.andrea@gmail.com>
References: <20190219152712.9855-1-righi.andrea@gmail.com>

Prevent a priority inversion problem that occurs when a high-priority blkcg issues a sync() and is forced to wait for the completion of all the writeback I/O generated by any other low-priority blkcg, causing massive latencies for processes that shouldn't be I/O-throttled at all.

The idea is to keep a list of the blkcgs that are waiting for writeback: every time a sync() is executed, the current blkcg is added to the list. Then, when I/O is throttled, if a blkcg other than the current one is waiting for writeback, no throttling is applied (we can probably refine this logic later; e.g., a better policy could be to adjust the throttling I/O rate using the blkcg with the highest speed from the list of waiters, a form of priority inheritance).
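The waiter-list policy described above can be modeled in userspace to make the throttling decision explicit. This is a minimal sketch, not the kernel code: it assumes cgroups are identified by small integer IDs and replaces the RCU-protected bdi->cgwb_waiters list with a per-cgroup counter array. All names (struct model_bdi, start_wb_wait(), stop_wb_wait(), wb_waiters_on_bdi(), may_dispatch(), MAX_CGROUPS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CGROUPS 8	/* hypothetical upper bound on cgroup IDs */

/* Hypothetical stand-in for a block device: sync() waiters per cgroup ID. */
struct model_bdi {
	int waiters[MAX_CGROUPS];
};

/* Cgroup @id starts a sync() on @bdi (cf. blkcg_start_wb_wait_on_bdi()). */
void start_wb_wait(struct model_bdi *bdi, int id)
{
	assert(id >= 0 && id < MAX_CGROUPS);
	bdi->waiters[id]++;
}

/* Cgroup @id finishes its sync() on @bdi (cf. blkcg_stop_wb_wait_on_bdi()). */
void stop_wb_wait(struct model_bdi *bdi, int id)
{
	assert(id >= 0 && id < MAX_CGROUPS);
	if (bdi->waiters[id] > 0)
		bdi->waiters[id]--;
}

/* True if any cgroup other than @id is waiting for writeback on @bdi. */
bool wb_waiters_on_bdi(const struct model_bdi *bdi, int id)
{
	for (int i = 0; i < MAX_CGROUPS; i++)
		if (i != id && bdi->waiters[i] > 0)
			return true;
	return false;
}

/*
 * Throttling decision for I/O issued by cgroup @id: dispatch right away if
 * the group has no bps/iops limit (@limited is false) or if another cgroup
 * is blocked in sync() on the same device.
 */
bool may_dispatch(const struct model_bdi *bdi, int id, bool limited)
{
	return !limited || wb_waiters_on_bdi(bdi, id);
}
```

Using a counter instead of one list node per blkcg is a choice of this sketch: it keeps nested or concurrent sync() calls from the same cgroup balanced. As in the patch, a cgroup's own sync() never lifts its own throttling; only a waiter from a different cgroup does.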
Signed-off-by: Andrea Righi
---
 block/blk-cgroup.c               | 73 ++++++++++++++++++++++++++++++++
 block/blk-throttle.c             | 11 +++--
 fs/fs-writeback.c                |  5 +++
 fs/sync.c                        |  8 +++-
 include/linux/backing-dev-defs.h |  2 +
 include/linux/blk-cgroup.h       | 23 ++++++++++
 mm/backing-dev.c                 |  2 +
 7 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2bed5725aa03..fb3c39eadf92 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1351,6 +1351,79 @@ struct cgroup_subsys io_cgrp_subsys = {
 };
 EXPORT_SYMBOL_GPL(io_cgrp_subsys);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+/**
+ * blkcg_wb_waiters_on_bdi - check for writeback waiters on a block device
+ * @blkcg: current blkcg cgroup
+ * @bdi: block device to check
+ *
+ * Return true if any other blkcg is waiting for writeback on the target block
+ * device, false otherwise.
+ */
+bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
+{
+	struct blkcg *wait_blkcg;
+	bool ret = false;
+
+	if (unlikely(!bdi))
+		return false;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(wait_blkcg, &bdi->cgwb_waiters, cgwb_wait_node)
+		if (wait_blkcg != blkcg) {
+			ret = true;
+			break;
+		}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/**
+ * blkcg_start_wb_wait_on_bdi - add current blkcg to writeback waiters list
+ * @bdi: target block device
+ *
+ * Add current blkcg to the list of writeback waiters on target block device.
+ */
+void blkcg_start_wb_wait_on_bdi(struct backing_dev_info *bdi)
+{
+	struct blkcg *blkcg;
+
+	rcu_read_lock();
+	blkcg = blkcg_from_current();
+	if (blkcg) {
+		css_get(&blkcg->css);
+		spin_lock(&bdi->cgwb_waiters_lock);
+		list_add_rcu(&blkcg->cgwb_wait_node, &bdi->cgwb_waiters);
+		spin_unlock(&bdi->cgwb_waiters_lock);
+	}
+	rcu_read_unlock();
+}
+
+/**
+ * blkcg_stop_wb_wait_on_bdi - remove current blkcg from writeback waiters list
+ * @bdi: target block device
+ *
+ * Remove current blkcg from the list of writeback waiters on target block
+ * device.
+ */
+void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi)
+{
+	struct blkcg *blkcg;
+
+	rcu_read_lock();
+	blkcg = blkcg_from_current();
+	if (blkcg) {
+		spin_lock(&bdi->cgwb_waiters_lock);
+		list_del_rcu(&blkcg->cgwb_wait_node);
+		spin_unlock(&bdi->cgwb_waiters_lock);
+		css_put(&blkcg->css);
+	}
+	rcu_read_unlock();
+	synchronize_rcu();
+}
+#endif
+
 /**
  * blkcg_activate_policy - activate a blkcg policy on a request_queue
  * @q: request_queue of interest
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 1b97a73d2fb1..da817896cded 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -970,9 +970,13 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 {
 	bool rw = bio_data_dir(bio);
 	unsigned long bps_wait = 0, iops_wait = 0, max_wait = 0;
+	struct throtl_data *td = tg->td;
+	struct request_queue *q = td->queue;
+	struct backing_dev_info *bdi = q->backing_dev_info;
+	struct blkcg_gq *blkg = tg_to_blkg(tg);
 
 	/*
-	 * Currently whole state machine of group depends on first bio 
+	 * Currently whole state machine of group depends on first bio
 	 * queued in the group bio list. So one should not be calling
 	 * this function with a different bio if there are other bios
 	 * queued.
@@ -981,8 +985,9 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
 
 	/* If tg->bps = -1, then BW is unlimited */
-	if (tg_bps_limit(tg, rw) == U64_MAX &&
-	    tg_iops_limit(tg, rw) == UINT_MAX) {
+	if (blkcg_wb_waiters_on_bdi(blkg->blkcg, bdi) ||
+	    (tg_bps_limit(tg, rw) == U64_MAX &&
+	    tg_iops_limit(tg, rw) == UINT_MAX)) {
 		if (wait)
 			*wait = 0;
 		return true;
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 36855c1f8daf..77c039a0ec25 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 /*
@@ -2446,6 +2447,8 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
+	blkcg_start_wb_wait_on_bdi(bdi);
+
 	/* protect against inode wb switch, see inode_switch_wbs_work_fn() */
 	bdi_down_write_wb_switch_rwsem(bdi);
 	bdi_split_work_to_wbs(bdi, &work, false);
@@ -2453,6 +2456,8 @@ void sync_inodes_sb(struct super_block *sb)
 	bdi_up_write_wb_switch_rwsem(bdi);
 
 	wait_sb_inodes(sb);
+
+	blkcg_stop_wb_wait_on_bdi(bdi);
 }
 EXPORT_SYMBOL(sync_inodes_sb);
 
diff --git a/fs/sync.c b/fs/sync.c
index b54e0541ad89..3958b8f98b85 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -76,8 +77,13 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
 
 static void sync_fs_one_sb(struct super_block *sb, void *arg)
 {
-	if (!sb_rdonly(sb) && sb->s_op->sync_fs)
+	struct backing_dev_info *bdi = sb->s_bdi;
+
+	if (!sb_rdonly(sb) && sb->s_op->sync_fs) {
+		blkcg_start_wb_wait_on_bdi(bdi);
 		sb->s_op->sync_fs(sb, *(int *)arg);
+		blkcg_stop_wb_wait_on_bdi(bdi);
+	}
 }
 
 static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 07e02d6df5ad..095e4dd0427b 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -191,6 +191,8 @@ struct backing_dev_info {
 	struct rb_root cgwb_congested_tree; /* their congested states */
 	struct mutex cgwb_release_mutex;  /* protect shutdown of wb structs */
 	struct rw_semaphore wb_switch_rwsem; /* no cgwb switch while syncing */
+	struct list_head cgwb_waiters; /* list of all waiters for writeback */
+	spinlock_t cgwb_waiters_lock; /* protect cgwb_waiters list */
 #else
 	struct bdi_writeback_congested *wb_congested;
 #endif
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 76c61318fda5..0f7dcb70e922 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -56,6 +56,7 @@ struct blkcg {
 
 	struct list_head		all_blkcgs_node;
 #ifdef CONFIG_CGROUP_WRITEBACK
+	struct list_head		cgwb_wait_node;
 	struct list_head		cgwb_list;
 	refcount_t			cgwb_refcnt;
 #endif
@@ -252,6 +253,12 @@ static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
 	return css ? container_of(css, struct blkcg, css) : NULL;
 }
 
+static inline struct blkcg *blkcg_from_current(void)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return css_to_blkcg(blkcg_css());
+}
+
 /**
  * __bio_blkcg - internal, inconsistent version to get blkcg
  *
@@ -454,6 +461,10 @@ static inline void blkcg_cgwb_put(struct blkcg *blkcg)
 		blkcg_destroy_blkgs(blkcg);
 }
 
+bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi);
+void blkcg_start_wb_wait_on_bdi(struct backing_dev_info *bdi);
+void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi);
+
 #else
 
 static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }
@@ -464,6 +475,14 @@ static inline void blkcg_cgwb_put(struct blkcg *blkcg)
 	blkcg_destroy_blkgs(blkcg);
 }
 
+static inline bool
+blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
+{
+	return false;
+}
+static inline void blkcg_start_wb_wait_on_bdi(struct backing_dev_info *bdi) { }
+static inline void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi) { }
+
 #endif
 
 /**
@@ -772,6 +791,7 @@ static inline void blkcg_bio_issue_init(struct bio *bio)
 static inline bool blkcg_bio_issue_check(struct request_queue *q,
 					 struct bio *bio)
 {
+	struct backing_dev_info *bdi = q->backing_dev_info;
 	struct blkcg_gq *blkg;
 	bool throtl = false;
 
@@ -788,6 +808,9 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 
 	blkg = bio->bi_blkg;
 
+	if (blkcg_wb_waiters_on_bdi(blkg->blkcg, bdi))
+		bio_set_flag(bio, BIO_THROTTLED);
+
 	throtl = blk_throtl_bio(q, blkg, bio);
 
 	if (!throtl) {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 72e6d0c55cfa..8848d26e8bf6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -686,10 +686,12 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 {
 	int ret;
 
+	INIT_LIST_HEAD(&bdi->cgwb_waiters);
 	INIT_RADIX_TREE(&bdi->cgwb_tree, GFP_ATOMIC);
 	bdi->cgwb_congested_tree = RB_ROOT;
 	mutex_init(&bdi->cgwb_release_mutex);
 	init_rwsem(&bdi->wb_switch_rwsem);
+	spin_lock_init(&bdi->cgwb_waiters_lock);
 
 	ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
 	if (!ret) {

From patchwork Tue Feb 19 15:27:11 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10820033
From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/3] blkcg: introduce io.sync_isolation
Date: Tue, 19 Feb 2019 16:27:11 +0100
Message-Id: <20190219152712.9855-3-righi.andrea@gmail.com>
In-Reply-To: <20190219152712.9855-1-righi.andrea@gmail.com>
References: <20190219152712.9855-1-righi.andrea@gmail.com>

Add a flag to blkcg cgroups that restricts sync()'ers in a cgroup to writing out only the pages that have been dirtied by the cgroup itself.

The flag is disabled by default, so the previous behavior is unchanged unless it is explicitly enabled. When the flag is enabled, a cgroup can write out only the dirty pages that belong to it (except for the root cgroup, which can still write out all pages globally).

Signed-off-by: Andrea Righi
---
 Documentation/admin-guide/cgroup-v2.rst |  9 ++++++
 block/blk-throttle.c                    | 37 +++++++++++++++++++++++++
 include/linux/blk-cgroup.h              |  7 +++++
 3 files changed, 53 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7bf3f129c68b..f98027fc2398 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1432,6 +1432,15 @@ IO Interface Files
 	Shows pressure stall information for IO. See
 	Documentation/accounting/psi.txt for details.
 
+  io.sync_isolation
+	A flag (0|1) that determines whether a cgroup is allowed to write out
+	only pages that have been dirtied by the cgroup itself. This option is
+	set to false (0) by default, meaning that any cgroup would try to write
+	out dirty pages globally, even those that have been dirtied by other
+	cgroups.
+
+	Setting this option to true (1) provides a better isolation across
+	cgroups that are doing an intense write I/O activity.
 
 Writeback
 ~~~~~~~~~
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index da817896cded..4bc3b40a4d93 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1704,6 +1704,35 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+static int sync_isolation_show(struct seq_file *sf, void *v)
+{
+	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+
+	seq_printf(sf, "%d\n", test_bit(BLKCG_SYNC_ISOLATION, &blkcg->flags));
+	return 0;
+}
+
+static ssize_t sync_isolation_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct blkcg *blkcg = css_to_blkcg(of_css(of));
+	unsigned long val;
+	int err;
+
+	buf = strstrip(buf);
+	err = kstrtoul(buf, 0, &val);
+	if (err)
+		return err;
+	if (val)
+		set_bit(BLKCG_SYNC_ISOLATION, &blkcg->flags);
+	else
+		clear_bit(BLKCG_SYNC_ISOLATION, &blkcg->flags);
+
+	return nbytes;
+}
+#endif
+
 static struct cftype throtl_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	{
@@ -1721,6 +1750,14 @@ static struct cftype throtl_files[] = {
 		.write = tg_set_limit,
 		.private = LIMIT_MAX,
 	},
+#ifdef CONFIG_CGROUP_WRITEBACK
+	{
+		.name = "sync_isolation",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = sync_isolation_show,
+		.write = sync_isolation_write,
+	},
+#endif
 	{ }	/* terminate */
 };
 
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 0f7dcb70e922..6ac5aa049334 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -44,6 +44,12 @@ enum blkg_rwstat_type {
 
 struct blkcg_gq;
 
+/* blkcg->flags */
+enum {
+	/* sync()'ers allowed to write out pages dirtied by the blkcg */
+	BLKCG_SYNC_ISOLATION,
+};
+
 struct blkcg {
 	struct cgroup_subsys_state	css;
 	spinlock_t			lock;
@@ -55,6 +61,7 @@ struct blkcg {
 	struct blkcg_policy_data	*cpd[BLKCG_MAX_POLS];
 
 	struct list_head		all_blkcgs_node;
+	unsigned long			flags;
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct list_head		cgwb_wait_node;
 	struct list_head		cgwb_list;

From patchwork Tue Feb 19 15:27:12 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10820035
From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 3/3] blkcg: implement sync() isolation
Date: Tue, 19 Feb 2019 16:27:12 +0100
Message-Id: <20190219152712.9855-4-righi.andrea@gmail.com>
In-Reply-To: <20190219152712.9855-1-righi.andrea@gmail.com>
References: <20190219152712.9855-1-righi.andrea@gmail.com>

Keep track of the inodes dirtied by each blkcg cgroup, so that a blkcg
issuing sync() triggers writeback of, and waits for, only the pages that
belong to the cgroup itself.

This behavior is enabled only when io.sync_isolation is enabled in the
cgroup; otherwise the old behavior applies and sync() writes back every
dirty page.
Signed-off-by: Andrea Righi
---
 block/blk-cgroup.c         | 47 ++++++++++++++++++++++++++++++++++
 fs/fs-writeback.c          | 52 +++++++++++++++++++++++++++++++++++---
 fs/inode.c                 |  1 +
 include/linux/blk-cgroup.h | 22 ++++++++++++++++
 include/linux/fs.h         |  4 +++
 mm/page-writeback.c        |  1 +
 6 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index fb3c39eadf92..c6ddf9eeab37 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1422,6 +1422,53 @@ void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi)
 	rcu_read_unlock();
 	synchronize_rcu();
 }
+
+/**
+ * blkcg_set_mapping_dirty - set owner of a dirty mapping
+ * @mapping: target address space
+ *
+ * Set the current blkcg as the owner of the address space @mapping (the first
+ * blkcg that dirties @mapping becomes the owner).
+ */
+void blkcg_set_mapping_dirty(struct address_space *mapping)
+{
+	struct blkcg *curr_blkcg, *blkcg;
+
+	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+	    mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return;
+
+	rcu_read_lock();
+	curr_blkcg = blkcg_from_current();
+	blkcg = blkcg_from_mapping(mapping);
+	if (curr_blkcg != blkcg) {
+		if (blkcg)
+			css_put(&blkcg->css);
+		css_get(&curr_blkcg->css);
+		rcu_assign_pointer(mapping->i_blkcg, curr_blkcg);
+	}
+	rcu_read_unlock();
+}
+
+/**
+ * blkcg_set_mapping_clean - clear the owner of a mapping
+ * @mapping: target address space
+ *
+ * Unset the owner of @mapping when it becomes clean.
+ */
+
+void blkcg_set_mapping_clean(struct address_space *mapping)
+{
+	struct blkcg *blkcg;
+
+	rcu_read_lock();
+	blkcg = rcu_dereference(mapping->i_blkcg);
+	if (blkcg) {
+		css_put(&blkcg->css);
+		RCU_INIT_POINTER(mapping->i_blkcg, NULL);
+	}
+	rcu_read_unlock();
+}
 #endif
 
 /**
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 77c039a0ec25..d003d0593f41 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -58,6 +58,9 @@ struct wb_writeback_work {
 
 	struct list_head list;		/* pending work list */
 	struct wb_completion *done;	/* set if the caller waits */
+#ifdef CONFIG_CGROUP_WRITEBACK
+	struct blkcg *blkcg;
+#endif
 };
 
 /*
@@ -916,6 +919,29 @@ static int __init cgroup_writeback_init(void)
 }
 fs_initcall(cgroup_writeback_init);
 
+static void blkcg_set_sync_domain(struct wb_writeback_work *work)
+{
+	rcu_read_lock();
+	work->blkcg = blkcg_from_current();
+	rcu_read_unlock();
+}
+
+static bool blkcg_same_sync_domain(struct wb_writeback_work *work,
+				   struct address_space *mapping)
+{
+	struct blkcg *blkcg;
+
+	if (!work->blkcg || work->blkcg == &blkcg_root)
+		return true;
+	if (!test_bit(BLKCG_SYNC_ISOLATION, &work->blkcg->flags))
+		return true;
+	rcu_read_lock();
+	blkcg = blkcg_from_mapping(mapping);
+	rcu_read_unlock();
+
+	return blkcg == work->blkcg;
+}
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static void bdi_down_write_wb_switch_rwsem(struct backing_dev_info *bdi) { }
@@ -959,6 +985,15 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 	}
 }
 
+static void blkcg_set_sync_domain(struct wb_writeback_work *work)
+{
+}
+
+static bool blkcg_same_sync_domain(struct wb_writeback_work *work,
+				   struct address_space *mapping)
+{
+	return true;
+}
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 /*
@@ -1131,7 +1166,7 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 	LIST_HEAD(tmp);
 	struct list_head *pos, *node;
 	struct super_block *sb = NULL;
-	struct inode *inode;
+	struct inode *inode, *next;
 	int do_sb_sort = 0;
 	int moved = 0;
 
@@ -1141,11 +1176,12 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 		expire_time = jiffies - (dirtytime_expire_interval * HZ);
 		older_than_this = &expire_time;
 	}
-	while (!list_empty(delaying_queue)) {
-		inode = wb_inode(delaying_queue->prev);
+	list_for_each_entry_safe_reverse(inode, next, delaying_queue, i_io_list) {
 		if (older_than_this &&
 		    inode_dirtied_after(inode, *older_than_this))
 			break;
+		if (!blkcg_same_sync_domain(work, inode->i_mapping))
+			continue;
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
 		if (flags & EXPIRE_DIRTY_ATIME)
@@ -1560,6 +1596,15 @@ static long writeback_sb_inodes(struct super_block *sb,
 			break;
 		}
 
+		/*
+		 * Only write out inodes that belong to the blkcg that issued
+		 * the sync().
+		 */
+		if (!blkcg_same_sync_domain(work, inode->i_mapping)) {
+			redirty_tail(inode, wb);
+			continue;
+		}
+
 		/*
 		 * Don't bother with new inodes or inodes being freed, first
 		 * kind does not need periodic writeout yet, and for the latter
@@ -2447,6 +2492,7 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
+	blkcg_set_sync_domain(&work);
 	blkcg_start_wb_wait_on_bdi(bdi);
 
 	/* protect against inode wb switch, see inode_switch_wbs_work_fn() */
diff --git a/fs/inode.c b/fs/inode.c
index 73432e64f874..d60a2042d39a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -564,6 +564,7 @@ static void evict(struct inode *inode)
 		bd_forget(inode);
 	if (S_ISCHR(inode->i_mode) && inode->i_cdev)
 		cd_forget(inode);
+	blkcg_set_mapping_clean(&inode->i_data);
 
 	remove_inode_hash(inode);
 
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 6ac5aa049334..a2bcc83c8c3e 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -441,6 +441,15 @@ extern void blkcg_destroy_blkgs(struct blkcg *blkcg);
 
 #ifdef CONFIG_CGROUP_WRITEBACK
 
+static inline struct blkcg *blkcg_from_mapping(struct address_space *mapping)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return rcu_dereference(mapping->i_blkcg);
+}
+
+void blkcg_set_mapping_dirty(struct address_space *mapping);
+void blkcg_set_mapping_clean(struct address_space *mapping);
+
 /**
  * blkcg_cgwb_get - get a reference for blkcg->cgwb_list
  * @blkcg: blkcg of interest
@@ -474,6 +483,19 @@ void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi);
 
 #else
 
+static inline struct blkcg *blkcg_from_mapping(struct address_space *mapping)
+{
+	return NULL;
+}
+
+static inline void blkcg_set_mapping_dirty(struct address_space *mapping)
+{
+}
+
+static inline void blkcg_set_mapping_clean(struct address_space *mapping)
+{
+}
+
 static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }
 
 static inline void blkcg_cgwb_put(struct blkcg *blkcg)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 29d8e2cfed0e..502a2b94f183 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -414,6 +414,7 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
  * @nrpages: Number of page entries, protected by the i_pages lock.
  * @nrexceptional: Shadow or DAX entries, protected by the i_pages lock.
  * @writeback_index: Writeback starts here.
+ * @i_blkcg: blkcg that owns the address_space (the first one to dirty it)
  * @a_ops: Methods.
  * @flags: Error bits and flags (AS_*).
  * @wb_err: The most recent error which has occurred.
@@ -432,6 +433,9 @@ struct address_space {
 	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
+#ifdef CONFIG_CGROUP_WRITEBACK
+	struct blkcg __rcu	*i_blkcg;
+#endif
 	unsigned long		flags;
 	errseq_t		wb_err;
 	spinlock_t		private_lock;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7d1010453fb9..a58071ee5f1c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2410,6 +2410,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		inode_attach_wb(inode, page);
 		wb = inode_to_wb(inode);
 
+		blkcg_set_mapping_dirty(mapping);
 		__inc_lruvec_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);