From patchwork Mon Apr  6 20:04:20 2015
From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, jack@suse.cz, hch@infradead.org,
    hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org, vgoyal@redhat.com,
    lizefan@huawei.com, cgroups@vger.kernel.org, linux-mm@kvack.org,
    mhocko@suse.cz, clm@fb.com, fengguang.wu@intel.com,
    david@fromorbit.com, gthelen@google.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 05/19] writeback: move global_dirty_limit into wb_domain
Date: Mon,  6 Apr 2015 16:04:20 -0400
Message-Id: <1428350674-8303-6-git-send-email-tj@kernel.org>
In-Reply-To: <1428350674-8303-1-git-send-email-tj@kernel.org>
References: <1428350674-8303-1-git-send-email-tj@kernel.org>

This patch is a part of the series to define wb_domain which
represents a domain that wb's (bdi_writeback's) belong to and are
measured against each other in.  This will enable IO backpressure
propagation for cgroup writeback.
global_dirty_limit exists to regulate the global dirty threshold which
is a property of the wb_domain.  This patch moves global_dirty_limit,
dirty_lock, and update_time into wb_domain.

This is pure reorganization and doesn't introduce any behavioral
changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
---
 fs/fs-writeback.c                |  2 +-
 include/linux/writeback.h        | 17 ++++++++++++++-
 include/trace/events/writeback.h |  7 +++---
 mm/page-writeback.c              | 46 ++++++++++++++++++++--------------------
 4 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 85daae2..5b842bd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -868,7 +868,7 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 		pages = LONG_MAX;
 	else {
 		pages = min(wb->avg_write_bandwidth / 2,
-			    global_dirty_limit / DIRTY_SCOPE);
+			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 		pages = min(pages, work->nr_pages);
 		pages = round_down(pages + MIN_WRITEBACK_PAGES,
 				   MIN_WRITEBACK_PAGES);

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 4972dcf..fe0924a 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -95,6 +95,8 @@ struct writeback_control {
  * dirtyable memory accordingly.
  */
 struct wb_domain {
+	spinlock_t lock;
+
 	/*
 	 * Scale the writeback cache size proportional to the relative
 	 * writeout speed.
@@ -115,6 +117,19 @@ struct wb_domain {
 	struct fprop_global completions;
 	struct timer_list period_timer;	/* timer for aging of completions */
 	unsigned long period_time;
+
+	/*
+	 * The dirtyable memory and dirty threshold could be suddenly
+	 * knocked down by a large amount (eg. on the startup of KVM in a
+	 * swapless system). This may throw the system into deep dirty
+	 * exceeded state and throttle heavy/light dirtiers alike. To
+	 * retain good responsiveness, maintain global_dirty_limit for
+	 * tracking slowly down to the knocked down dirty threshold.
+	 *
+	 * Both fields are protected by ->lock.
+	 */
+	unsigned long dirty_limit_tstamp;
+	unsigned long dirty_limit;
 };

 /*
@@ -153,7 +168,7 @@ void throttle_vm_writeout(gfp_t gfp_mask);
 bool zone_dirty_ok(struct zone *zone);
 int wb_domain_init(struct wb_domain *dom, gfp_t gfp);

-extern unsigned long global_dirty_limit;
+extern struct wb_domain global_wb_domain;

 /* These are exported to sysctl. */
 extern int dirty_background_ratio;

diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 5c9a68c..d5ac3dd 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -344,7 +344,7 @@ TRACE_EVENT(global_dirty_state,
 		__entry->nr_written	= global_page_state(NR_WRITTEN);
 		__entry->background_thresh = background_thresh;
 		__entry->dirty_thresh	= dirty_thresh;
-		__entry->dirty_limit	= global_dirty_limit;
+		__entry->dirty_limit	= global_wb_domain.dirty_limit;
 	),

 	TP_printk("dirty=%lu writeback=%lu unstable=%lu "
@@ -446,8 +446,9 @@ TRACE_EVENT(balance_dirty_pages,
 		unsigned long freerun = (thresh + bg_thresh) / 2;
 		strlcpy(__entry->bdi, dev_name(bdi->dev), 32);

-		__entry->limit		= global_dirty_limit;
-		__entry->setpoint	= (global_dirty_limit + freerun) / 2;
+		__entry->limit		= global_wb_domain.dirty_limit;
+		__entry->setpoint	= (global_wb_domain.dirty_limit +
+					   freerun) / 2;
 		__entry->dirty		= dirty;
 		__entry->bdi_setpoint	= __entry->setpoint *
 						bdi_thresh / (thresh + 1);

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 43380dc..c141533 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -122,9 +122,7 @@ EXPORT_SYMBOL(laptop_mode);

 /* End of sysctl-exported parameters */

-unsigned long global_dirty_limit;
-
-static struct wb_domain global_wb_domain;
+struct wb_domain global_wb_domain;

 /*
  * Length of period for aging writeout fractions of bdis. This is an
@@ -470,9 +468,15 @@ static void writeout_period(unsigned long t)
 int wb_domain_init(struct wb_domain *dom, gfp_t gfp)
 {
 	memset(dom, 0, sizeof(*dom));
+
+	spin_lock_init(&dom->lock);
+
 	init_timer_deferrable(&dom->period_timer);
 	dom->period_timer.function = writeout_period;
 	dom->period_timer.data = (unsigned long)dom;
+
+	dom->dirty_limit_tstamp = jiffies;
+
 	return fprop_global_init(&dom->completions, gfp);
 }

@@ -532,7 +536,9 @@ static unsigned long dirty_freerun_ceiling(unsigned long thresh,

 static unsigned long hard_dirty_limit(unsigned long thresh)
 {
-	return max(thresh, global_dirty_limit);
+	struct wb_domain *dom = &global_wb_domain;
+
+	return max(thresh, dom->dirty_limit);
 }

 /**
@@ -916,17 +922,10 @@ out:
 	wb->avg_write_bandwidth = avg;
 }

-/*
- * The global dirtyable memory and dirty threshold could be suddenly knocked
- * down by a large amount (eg. on the startup of KVM in a swapless system).
- * This may throw the system into deep dirty exceeded state and throttle
- * heavy/light dirtiers alike. To retain good responsiveness, maintain
- * global_dirty_limit for tracking slowly down to the knocked down dirty
- * threshold.
- */
 static void update_dirty_limit(unsigned long thresh, unsigned long dirty)
 {
-	unsigned long limit = global_dirty_limit;
+	struct wb_domain *dom = &global_wb_domain;
+	unsigned long limit = dom->dirty_limit;

 	/*
 	 * Follow up in one step.
@@ -939,7 +938,7 @@ static void update_dirty_limit(unsigned long thresh, unsigned long dirty)
 	/*
 	 * Follow down slowly. Use the higher one as the target, because thresh
 	 * may drop below dirty. This is exactly the reason to introduce
-	 * global_dirty_limit which is guaranteed to lie above the dirty pages.
+	 * dom->dirty_limit which is guaranteed to lie above the dirty pages.
 	 */
 	thresh = max(thresh, dirty);
 	if (limit > thresh) {
@@ -948,28 +947,27 @@ static void update_dirty_limit(unsigned long thresh, unsigned long dirty)
 	}
 	return;
 update:
-	global_dirty_limit = limit;
+	dom->dirty_limit = limit;
 }

 static void global_update_bandwidth(unsigned long thresh,
 				    unsigned long dirty,
 				    unsigned long now)
 {
-	static DEFINE_SPINLOCK(dirty_lock);
-	static unsigned long update_time = INITIAL_JIFFIES;
+	struct wb_domain *dom = &global_wb_domain;

 	/*
 	 * check locklessly first to optimize away locking for the most time
 	 */
-	if (time_before(now, update_time + BANDWIDTH_INTERVAL))
+	if (time_before(now, dom->dirty_limit_tstamp + BANDWIDTH_INTERVAL))
 		return;

-	spin_lock(&dirty_lock);
-	if (time_after_eq(now, update_time + BANDWIDTH_INTERVAL)) {
+	spin_lock(&dom->lock);
+	if (time_after_eq(now, dom->dirty_limit_tstamp + BANDWIDTH_INTERVAL)) {
 		update_dirty_limit(thresh, dirty);
-		update_time = now;
+		dom->dirty_limit_tstamp = now;
 	}
-	spin_unlock(&dirty_lock);
+	spin_unlock(&dom->lock);
 }

 /*
@@ -1761,10 +1759,12 @@ void laptop_sync_completion(void)

 void writeback_set_ratelimit(void)
 {
+	struct wb_domain *dom = &global_wb_domain;
 	unsigned long background_thresh;
 	unsigned long dirty_thresh;
+
 	global_dirty_limits(&background_thresh, &dirty_thresh);
-	global_dirty_limit = dirty_thresh;
+	dom->dirty_limit = dirty_thresh;
 	ratelimit_pages = dirty_thresh / (num_online_cpus() * 32);
 	if (ratelimit_pages < 16)
 		ratelimit_pages = 16;
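
For readers following along outside the tree, the behavior this patch
relocates is easy to model in isolation.  Below is a minimal userspace
reduction of update_dirty_limit(), not kernel code: demo_domain,
demo_update_dirty_limit(), and the numbers in main() are invented for
illustration, and the 1/32 decay step mirrors the elided lines of the
real function.  It shows the asymmetric tracking the new wb_domain
comment describes: the limit jumps up to a raised threshold in one
step, but glides down toward a lowered one a fraction at a time.

#include <stdio.h>

struct demo_domain {
	unsigned long dirty_limit;	/* tracks the threshold slowly downward */
};

/* Userspace reduction of update_dirty_limit(); same control flow. */
static void demo_update_dirty_limit(struct demo_domain *dom,
				    unsigned long thresh,
				    unsigned long dirty)
{
	unsigned long limit = dom->dirty_limit;

	/* Follow up in one step. */
	if (limit < thresh) {
		limit = thresh;
		goto update;
	}

	/*
	 * Follow down slowly.  Use the higher of thresh and dirty as the
	 * target so the limit stays above the dirty pages.
	 */
	if (dirty > thresh)
		thresh = dirty;
	if (limit > thresh) {
		limit -= (limit - thresh) >> 5;	/* shed 1/32 of the gap */
		goto update;
	}
	return;
update:
	dom->dirty_limit = limit;
}

int main(void)
{
	struct demo_domain dom = { .dirty_limit = 100000 };
	int i;

	/* Threshold suddenly knocked down from 100000 to 20000 pages. */
	for (i = 0; i < 5; i++) {
		demo_update_dirty_limit(&dom, 20000, 15000);
		printf("step %d: dirty_limit = %lu\n", i, dom.dirty_limit);
	}
	return 0;
}

Each call shaves roughly 3% off the remaining gap, so after a sudden
threshold collapse the limit descends over many BANDWIDTH_INTERVAL
periods instead of cliff-dropping and throttling every dirtier at once.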
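
The global_update_bandwidth() hunk also preserves a pattern worth
noting: an unlocked timestamp comparison filters out almost every
caller, and the comparison is repeated under dom->lock so that exactly
one caller per BANDWIDTH_INTERVAL performs the update.  Here is a rough
userspace analogue using pthreads; demo_maybe_update(),
DEMO_INTERVAL_NS, and the other demo_* names are made up for the
sketch.  The kernel reads the timestamp locklessly and tolerates the
benign race; portable userspace code would want C11 atomics for the
unlocked read.

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define DEMO_INTERVAL_NS (200LL * 1000 * 1000)	/* stand-in for BANDWIDTH_INTERVAL */

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static long long demo_last_update_ns;	/* plays dom->dirty_limit_tstamp */

static long long demo_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns true for the one caller per interval that does the update. */
static bool demo_maybe_update(void)
{
	long long now = demo_now_ns();
	bool won = false;

	/* Check locklessly first so the common case never takes the lock. */
	if (now < demo_last_update_ns + DEMO_INTERVAL_NS)
		return false;

	pthread_mutex_lock(&demo_lock);
	/* Recheck: another thread may have updated while we raced here. */
	if (now >= demo_last_update_ns + DEMO_INTERVAL_NS) {
		/* ...update_dirty_limit() would run here... */
		demo_last_update_ns = now;
		won = true;
	}
	pthread_mutex_unlock(&demo_lock);
	return won;
}

Of any burst of concurrent callers arriving after the interval expires,
one sees the stale timestamp under the lock and refreshes it; the rest
bail out on the cheap unlocked check, which is what keeps
update_dirty_limit() off the writeback hot path.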