From patchwork Sat Mar 7 05:34:08 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Theodore Ts'o X-Patchwork-Id: 5958621 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id E3CE89F380 for ; Sat, 7 Mar 2015 05:34:18 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D037920304 for ; Sat, 7 Mar 2015 05:34:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 82BC820263 for ; Sat, 7 Mar 2015 05:34:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752675AbbCGFeP (ORCPT ); Sat, 7 Mar 2015 00:34:15 -0500 Received: from imap.thunk.org ([74.207.234.97]:55046 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750744AbbCGFeO (ORCPT ); Sat, 7 Mar 2015 00:34:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=thunk.org; s=ef5046eb; h=In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date; bh=j7Q5RBPtwv/m+dA1qsWhYIrIlUW2/5Gv/08ObY1myz8=; b=iT1YXd9V89HWVX1/6RTL8YQxGGoaaMKH53iCrUOY5J+GrTVw53rjXjm1QlmtaVcTDH6Ap3RTQue1vWdKBeNm8kcuS3IMgK/UoLDhj/8G5TIjjdITO4EV0zY0pJitcGfHd3eZi3uWWhP5pL4tptY+4pSkLOKQbMOBP5Kcudeqetk=; Received: from root (helo=closure.thunk.org) by imap.thunk.org with local-esmtp (Exim 4.80) (envelope-from ) id 1YU7NJ-0000lN-Mx; Sat, 07 Mar 2015 05:34:09 +0000 Received: by closure.thunk.org (Postfix, from userid 15806) id 00579580236; Sat, 7 Mar 2015 00:34:08 -0500 (EST) Date: Sat, 7 Mar 2015 00:34:08 -0500 From: Theodore Ts'o To: Jan Kara Cc: Linus Torvalds , linux-fsdevel , Alexander Viro Subject: [PATCH] fs: make sure the timestamps for lazytime inodes eventually get written Message-ID: <20150307053408.GA18002@thunk.org> References: <20150224183103.GA13024@quack.suse.cz> <20150225145258.GB11217@thunk.org> <20150225162506.GE22736@quack.suse.cz> <20150226043304.GC11217@thunk.org> <20150226083455.GA10894@quack.suse.cz> <20150226134553.GF11217@thunk.org> <20150226143828.GA21485@quack.suse.cz> <20150226192735.GB17174@thunk.org> <20150302082907.GA4199@quack.suse.cz> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20150302082907.GA4199@quack.suse.cz> User-Agent: Mutt/1.5.23 (2014-03-12) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: tytso@thunk.org X-SA-Exim-Scanned: No (on imap.thunk.org); SAEximRunCond expanded to false Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Jan, I believe the following should address all of the issues that you raised. Could you take a look and let me know what you think? Thanks!! - Ted commit e42f32d0ed65059e5cb689cc0ac28099a729ba9a Author: Theodore Ts'o Date: Sat Mar 7 00:22:37 2015 -0500 fs: make sure the timestamps for lazytime inodes eventually get written Jan Kara pointed out that if there is an inode which is constantly getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp will never be written since inode->dirtied_when is constantly getting updated. We fix this by adding an extra field to the inode, dirtied_time_when, so inodes with a stale dirtytime can get detected and handled. Also add a sysctl tunable, dirtytime_expire_seconds so we can properly debug this code and make sure it all works. Finally, if we have a dirtytime inode caused by an atime update, and there is no write activity on the file system, we need to have a secondary system to make sure these inodes get written out. We do this by setting up a second delayed work structure which wakes up the CPU much more rarely compared to writeback_expire_centisecs. Signed-off-by: Theodore Ts'o --- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index e907052..260d7e7 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -53,6 +53,9 @@ struct wb_writeback_work { struct completion *done; /* set if the caller waits */ }; +/* 12 hours in seconds */ +unsigned int dirtytime_expire_interval = 12 * 60 * 60; + /** * writeback_in_progress - determine whether there is writeback in progress * @bdi: the device's backing_dev_info structure. @@ -275,8 +278,8 @@ static int move_expired_inodes(struct list_head *delaying_queue, if ((flags & EXPIRE_DIRTY_ATIME) == 0) older_than_this = work->older_than_this; - else if ((work->reason == WB_REASON_SYNC) == 0) { - expire_time = jiffies - (HZ * 86400); + else if (!work->for_sync) { + expire_time = jiffies - (dirtytime_expire_interval * HZ); older_than_this = &expire_time; } while (!list_empty(delaying_queue)) { @@ -458,6 +461,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb, */ redirty_tail(inode, wb); } else if (inode->i_state & I_DIRTY_TIME) { + inode->dirtied_when = jiffies; list_move(&inode->i_wb_list, &wb->b_dirty_time); } else { /* The inode is clean. Remove from writeback lists. */ @@ -741,6 +745,13 @@ static long writeback_sb_inodes(struct super_block *sb, spin_lock(&inode->i_lock); if (!(inode->i_state & I_DIRTY_ALL)) wrote++; + if ((inode->i_state & I_DIRTY_TIME) && + ((start_time - inode->dirtied_time_when) > + (dirtytime_expire_interval * HZ))) { + inode->i_state &= ~I_DIRTY_TIME; + inode->i_state |= I_DIRTY_SYNC; + trace_writeback_lazytime(inode); + } requeue_inode(inode, wb, &wbc); inode_sync_complete(inode); spin_unlock(&inode->i_lock); @@ -1131,6 +1142,56 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) rcu_read_unlock(); } +/* + * Wake up bdi's periodically to make sure dirtytime inodes gets + * written back periodically. We deliberately do *not* check the + * b_dirtytime list in wb_has_dirty_io(), since this would cause the + * kernel to be constantly waking up once there are any dirtytime + * inodes on the system. So instead we define a separate delayed work + * function which gets called much more rarely. + * + * If there is any other write activity going on in the file system, + * this function won't be necessary. But if the only thing that has + * happened on the file system is a dirtytime inode caused by an atime + * update, we need this infrastructure below to make sure that inode + * eventually gets pushed out to disk. + */ +static void wakeup_dirtytime_writeback(struct work_struct *w); +static DECLARE_DELAYED_WORK(dirtytime_work, wakeup_dirtytime_writeback); + +static void wakeup_dirtytime_writeback(struct work_struct *w) +{ + struct backing_dev_info *bdi; + + rcu_read_lock(); + list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { + if (list_empty(&bdi->wb.b_dirty_time)) + continue; + bdi_wakeup_thread(bdi); + } + rcu_read_unlock(); + schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); +} + +static int __init start_dirtytime_writeback(void) +{ + schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); + return 0; +} +__initcall(start_dirtytime_writeback); + +int dirtytime_interval_handler(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + int ret; + + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + if (ret == 0 && write) + mod_delayed_work(system_wq, &dirtytime_work, 0); + return ret; +} + + static noinline void block_dump___mark_inode_dirty(struct inode *inode) { if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) { @@ -1269,6 +1330,10 @@ void __mark_inode_dirty(struct inode *inode, int flags) } inode->dirtied_when = jiffies; + if (dirtytime) + inode->dirtied_time_when = jiffies; + if (flags & I_DIRTY_PAGES) + dirtytime = 0; list_move(&inode->i_wb_list, dirtytime ? &bdi->wb.b_dirty_time : &bdi->wb.b_dirty); spin_unlock(&bdi->wb.list_lock); diff --git a/include/linux/fs.h b/include/linux/fs.h index b4d71b5..f4131e8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -604,6 +604,7 @@ struct inode { struct mutex i_mutex; unsigned long dirtied_when; /* jiffies of first dirtying */ + unsigned long dirtied_time_when; struct hlist_node i_hash; struct list_head i_wb_list; /* backing dev IO list */ diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 0004833..b2dd371e 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -130,6 +130,7 @@ extern int vm_dirty_ratio; extern unsigned long vm_dirty_bytes; extern unsigned int dirty_writeback_interval; extern unsigned int dirty_expire_interval; +extern unsigned int dirtytime_expire_interval; extern int vm_highmem_is_dirtyable; extern int block_dump; extern int laptop_mode; @@ -146,6 +147,8 @@ extern int dirty_ratio_handler(struct ctl_table *table, int write, extern int dirty_bytes_handler(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +int dirtytime_interval_handler(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos); struct ctl_table; int dirty_writeback_centisecs_handler(struct ctl_table *, int, diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 88ea2d6..ce410bb 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1228,6 +1228,14 @@ static struct ctl_table vm_table[] = { .extra1 = &zero, }, { + .procname = "dirtytime_expire_seconds", + .data = &dirtytime_expire_interval, + .maxlen = sizeof(dirty_expire_interval), + .mode = 0644, + .proc_handler = dirtytime_interval_handler, + .extra1 = &zero, + }, + { .procname = "nr_pdflush_threads", .mode = 0444 /* read-only */, .proc_handler = pdflush_proc_obsolete,