From patchwork Fri Jun 19 22:32:23 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Hansen X-Patchwork-Id: 6649521 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id D209AC05AC for ; Fri, 19 Jun 2015 22:32:33 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 1CBD7208D8 for ; Fri, 19 Jun 2015 22:32:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0C7E32073A for ; Fri, 19 Jun 2015 22:32:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756100AbbFSWc3 (ORCPT ); Fri, 19 Jun 2015 18:32:29 -0400 Received: from mga02.intel.com ([134.134.136.20]:52325 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754054AbbFSWc1 (ORCPT ); Fri, 19 Jun 2015 18:32:27 -0400 Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga101.jf.intel.com with ESMTP; 19 Jun 2015 15:32:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,646,1427785200"; d="scan'208";a="749873022" Received: from viggo.jf.intel.com (HELO localhost.localdomain) ([10.54.39.19]) by orsmga002.jf.intel.com with ESMTP; 19 Jun 2015 15:32:24 -0700 Subject: [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() To: dave@sr71.net Cc: jack@suse.cz, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, tim.c.chen@linux.intel.com, ak@linux.intel.com From: Dave Hansen Date: Fri, 19 Jun 2015 15:32:23 -0700 Message-Id: <20150619223223.10B658AD@viggo.jf.intel.com> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently, __sb_start_write() and freeze_super() can race with each other. __sb_start_write() uses a smp_mb() to ensure that freeze_super() can see its write to sb->s_writers.counter and that it can see freeze_super()'s update to sb->s_writers.frozen. This all seems to work fine. But, this smp_mb() makes __sb_start_write() the single hottest function in the kernel if I sit in a loop and do tiny write()s to tmpfs over and over. This is on a very small 2-core system, so it will only get worse on larger systems. This _seems_ like an ideal case for RCU. __sb_start_write() is the RCU read-side and is in a very fast, performance-sensitive path. freeze_super() is the RCU writer and is in an extremely rare non-performance-sensitive path. Instead of doing and smp_wmb() in __sb_start_write(), we do rcu_read_lock(). This ensures that a CPU doing freeze_super() can not proceed past its synchronize_rcu() until the grace period has ended and the 's_writers.frozen = SB_FREEZE_WRITE' is visible to __sb_start_write(). One question here: Does the work that __sb_start_write() does in a previous grace period becomes visible to freeze_super() after its call to synchronize_rcu()? It _seems_ like it should, but it seems backwards to me since __sb_start_write() is the "reader" in this case. This patch increases the number of writes/second that I can do by 10.4%. Does anybody see any holes with this? Cc: Jan Kara Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Paul E. McKenney Cc: Tim Chen Cc: Andi Kleen --- b/fs/super.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff -puN fs/super.c~rcu-__sb_start_write fs/super.c --- a/fs/super.c~rcu-__sb_start_write 2015-06-19 14:50:53.081869092 -0700 +++ b/fs/super.c 2015-06-19 15:19:03.000473047 -0700 @@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s */ int __sb_start_write(struct super_block *sb, int level, bool wait) { -retry: - if (unlikely(sb->s_writers.frozen >= level)) { + /* + * RCU keeps freeze_super() from proceeding + * while we are in here. + */ + rcu_read_lock(); + while (unlikely(sb->s_writers.frozen >= level)) { + rcu_read_unlock(); if (!wait) - return 0; + return 0; wait_event(sb->s_writers.wait_unfrozen, sb->s_writers.frozen < level); + rcu_read_lock(); } #ifdef CONFIG_LOCKDEP acquire_freeze_lock(sb, level, !wait, _RET_IP_); #endif percpu_counter_inc(&sb->s_writers.counter[level-1]); - /* - * Make sure counter is updated before we check for frozen. - * freeze_super() first sets frozen and then checks the counter. - */ - smp_mb(); - if (unlikely(sb->s_writers.frozen >= level)) { - __sb_end_write(sb, level); - goto retry; - } + rcu_read_unlock(); return 1; } EXPORT_SYMBOL(__sb_start_write); @@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb) /* From now on, no new normal writers can start */ sb->s_writers.frozen = SB_FREEZE_WRITE; - smp_wmb(); + /* + * After we synchronize_rcu(), we have ensured that everyone + * who reads sb->s_writers.frozen under rcu_read_lock() can + * now see our update. This pretty much means that + * __sb_start_write() will not allow any new writers. + */ + synchronize_rcu(); /* Release s_umount to preserve sb_start_write -> s_umount ordering */ up_write(&sb->s_umount); @@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb) /* Now we go and block page faults... */ down_write(&sb->s_umount); sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; - smp_wmb(); + synchronize_rcu(); sb_wait_write(sb, SB_FREEZE_PAGEFAULT); @@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb) /* Now wait for internal filesystem counter */ sb->s_writers.frozen = SB_FREEZE_FS; - smp_wmb(); + synchronize_rcu(); sb_wait_write(sb, SB_FREEZE_FS); if (sb->s_op->freeze_fs) { @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + synchronize_rcu(); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); return ret; @@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb) out: sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + synchronize_rcu(); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb);